I don’t think there is a delay specific to NVL72; it just takes this long normally, and with all the external customers Nvidia needs to announce things a bit earlier than, say, Google. This is why I expect Rubin Ultra NVL576 (the next check on TPU dominance after 2026’s NVL72) to also take similarly long. It’s announced for 2027, but 2028 will probably only see completion of a fraction of the eventual buildout, and only in 2029 will the bulk of the buildout be completed (though maybe late 2028 will be made possible for NVL576 specifically, given the urgency and the time to prepare). This would enable companies like OpenAI (without access to TPUs at gigawatt scale) to serve flagship models at the next level of scale (what 2026 pretraining compute asks for) for all their users, catching up to where Google and Anthropic were in 2026-2027 thanks to Ironwood. Unless Google decides to give yet another of its competitors this crucial resource and allows OpenAI to build gigawatts of TPUs earlier than 2028-2029.
Do you know why it takes such a long time to deploy a new rack system at scale? In my mind you slap on the new Rubin chips, add more HBM, and you are good to go. (In your linked comment you mention “reliability issues”; is that where the bulk of the time comes from? I did not read the linked SemiAnalysis article.) Or does everything, including e.g. cooling and interconnects, have to be redesigned from scratch for each new rack system, so you can’t reuse any of the older proven/reliable components?
That things other than the chips need to be redesigned wouldn’t argue either way, because in that hypothetical everything could simply come together at once, the other components arriving the same way as the chips themselves. The issue is the capacity of factories and labor for all the components, integration, and construction. You can’t produce everything all at once; instead you need to produce each kind of thing that goes into the finished datacenters over the course of at least months, maybe as long as 2 years for sufficiently similar variants of a system that can share many steps of the process (as with H100/H200/B200 previously, and now GB200/GB300 NVL72).
How elaborate the production process needs to be also doesn’t matter; it just shifts the arrival of the finished systems in time (even if substantially), with the first systems still getting ready earlier than the bulk of them. And so the first 20% of everything (at a given stage of production) will be ready partway into the volume production period (in a broad sense that also includes construction of datacenter buildings or burn-in of racks), significantly earlier than most of it.
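To make the timing claim concrete, here is a minimal toy sketch (not based on any real Nvidia or TSMC figures; the ~24-month volume production period and the linear ramp are illustrative assumptions I'm adding): when monthly output ramps up over the period, cumulative output grows slowly at first, so the first 20% of systems are finished well before the bulk of them.

```python
# Toy model of a production ramp, with made-up numbers: assume volume
# production of one rack generation spans ~24 months and monthly output
# grows linearly from near zero to a peak (a rough stand-in for fabs,
# HBM, rack integration, and datacenter construction scaling together).

RAMP_MONTHS = 24  # assumed length of the volume production period

monthly_output = [m + 1 for m in range(RAMP_MONTHS)]  # linear ramp: 1, 2, ..., 24 units
total = sum(monthly_output)

def month_when_fraction_done(fraction: float) -> int:
    """First month by which the given fraction of total output is complete."""
    cumulative = 0
    for month, out in enumerate(monthly_output, start=1):
        cumulative += out
        if cumulative >= fraction * total:
            return month
    return RAMP_MONTHS

print(month_when_fraction_done(0.2))  # 11: first 20% done less than halfway in
print(month_when_fraction_done(0.8))  # 22: the bulk only lands near the end
```

Under these assumed numbers, the first fifth of the buildout is done around month 11 of 24, while 80% completion only arrives around month 22, which is the "first systems early, bulk much later" shape described above.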