Stargate is evidence that training systems will scale more slowly than they could. The rumored reason for starting the project is that Microsoft isn't building giant frontier training systems fast enough, probably because it doesn't see the case for building them faster. Other hyperscalers might think similarly, and they are the best positioned to build these systems, so this attitude may be indicative of how frontier training systems get built overall: notably slower than technically feasible.
The $80bn Microsoft capex is not relevant here if it goes to many smaller systems[1], which is only natural: there are millions of datacenter GPUs but only a few 100K-GPU frontier training systems, so frontier training is a tiny fraction of total compute, most of which serves inference and smaller research training runs. The $500bn figure is not relevant either, as for now it's only a vague plan. But Microsoft declining to build training systems on OpenAI's schedule is some evidence.
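To make the "tiny fraction" claim concrete, here is a back-of-envelope sketch. The GPU counts are illustrative assumptions consistent with the text ("millions of datacenter GPUs", "only a few 100K GPU frontier training systems"), not sourced figures.

```python
# Illustrative assumptions, not sourced figures:
total_datacenter_gpus = 5_000_000   # "millions of datacenter GPUs" (assumed)
frontier_systems = 3                # "only a few" frontier training systems (assumed)
gpus_per_frontier_system = 100_000  # "100K GPU" systems, per the text

frontier_gpus = frontier_systems * gpus_per_frontier_system
fraction = frontier_gpus / total_datacenter_gpus
print(f"Frontier training share of all datacenter GPUs: {fraction:.1%}")
```

Under these assumptions frontier training accounts for only a few percent of datacenter GPUs; the rest is inference and smaller or research training compute.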
OpenAI would want to get out from under Microsoft's thumb anyway[2], and doing so gets ever more difficult over time, since frontier training systems get ever more expensive; the sooner they try, the more likely they are to succeed. But even this consideration is some evidence of slowdown: it motivates saying you want to build frontier training systems even faster, but doesn't in itself motivate actually going through with it, beyond building a competitive training system that secures independence.
So the clues that support the prospect of scaling to 1 GW in 2025 and to 5 GW in 2027 could be misleading: they run contrary to hyperscaler attitudes and don't align even with OpenAI's immediate incentives.
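For a sense of what the 1 GW and 5 GW figures imply in hardware terms, a rough conversion follows. The all-in power per GPU is an assumption (accelerator plus networking, cooling, and other datacenter overhead), chosen as a plausible round number rather than a measured value.

```python
# Assumed all-in power per GPU (accelerator + overhead), in kW:
kw_per_gpu_all_in = 1.5

for gw in (1, 5):
    kw_total = gw * 1_000_000        # 1 GW = 1,000,000 kW
    gpus = kw_total / kw_per_gpu_all_in
    print(f"{gw} GW supports roughly {gpus:,.0f} GPUs at {kw_per_gpu_all_in} kW each")
```

On these assumptions, 1 GW corresponds to on the order of 700K GPUs and 5 GW to a few million, i.e. training systems far larger than today's 100K-GPU clusters.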
I previously took the $80bn as evidence that Microsoft is building a large training system this year, but it now seems that the money is going to inference capacity instead.
As Satya Nadella said, “If OpenAI disappeared tomorrow… we have all the IP rights and all the capability. We have the people, we have the compute, we have the data, we have everything. We are below them, above them, around them.”