Non-Google models of late 2027 use Nvidia Rubin, but not yet Rubin Ultra. Rubin NVL144 racks have the same number of compute dies and chips as Blackwell NVL72 racks (the change in the name is purely marketing; Nvidia now counts dies instead of chips). The compute dies are already almost reticle-sized and can’t get bigger, but Rubin uses 3nm (~180M Tr/mm2) while Blackwell is 4nm (~130M Tr/mm2). So the number of transistors per rack goes up with the transistor density from 4nm to 3nm, by about 1.4x, and better energy efficiency enables higher clock speeds, maybe another 1.4x, for a total of about 2x in performance. The GTC 2025 announcement claimed a 3.3x improvement for dense FP8, but by the above argument it should still be only about 2x for the more transistor-hungry BF16 (comparing Blackwell and Rubin racks).
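Spelled out as a quick back-of-the-envelope (a minimal sketch; the 1.4x clock factor is an assumed figure from the energy-efficiency argument, not a spec):

```python
# Back-of-the-envelope: Rubin vs Blackwell per-rack BF16 performance.
# Racks have the same number of compute dies, so gains come only from
# transistor density and clock speed.
blackwell_density = 130e6  # transistors per mm^2, 4nm-class node
rubin_density = 180e6      # transistors per mm^2, 3nm-class node

density_gain = rubin_density / blackwell_density  # ~1.38x
clock_gain = 1.4  # assumed: higher clocks enabled by better energy efficiency

rack_gain = density_gain * clock_gain  # ~1.9x, i.e. roughly 2x
print(f"density {density_gain:.2f}x * clock {clock_gain:.1f}x "
      f"= {rack_gain:.1f}x per rack (BF16)")
```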
The Abilene site of Stargate[1] will probably have 400K-500K Blackwell chips in 2026, about 1 GW. Nvidia’s roadmap puts Rubin (VR200 NVL144) 1.5-2 years after Blackwell (GB200 NVL72), which is not yet in widespread use but will get there soon. So the first models will start being trained on Rubin no earlier than late 2026, much more likely only in 2027, possibly even the second half of 2027. Before that, it’s all Blackwell, and if it’s only 1 GW Blackwell training systems[2] in 2026 for one AI company, shortly before the 2x better Rubin comes out, then that’s the scale where Blackwell stops, awaiting Rubin and 2027. And Rubin will only be built at scale a bit later still, similar to how in 2025 there are only 100K chips in GB200 NVL72 racks for what might be intended as a single training system, not yet 500K chips.
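As a sanity check on the chips-to-power conversion (a sketch; the per-chip overhead split is an assumption, not a published breakdown):

```python
# Implied all-in power per chip at the Abilene scale.
site_power_w = 1e9   # ~1 GW site
chips = 450e3        # midpoint of the 400K-500K Blackwell chip estimate

per_chip_w = site_power_w / chips  # ~2.2 kW all-in per chip
print(f"~{per_chip_w / 1e3:.1f} kW all-in per chip")
# Roughly a ~1 kW-class chip plus about as much again for CPUs,
# networking, and cooling (assumed split).
```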
This predicts models of at most 1e28 BF16 FLOPs (2e28 FP8 FLOPs) in late 2026 (trained on 2 GW of GB200/GB300 NVL72), and very unlikely more than 1e28-4e28 BF16 FLOPs in late 2027 (1-4 GW Rubin datacenters in late 2026 to early 2027), though that’s alternatively 3e28-1e29 FP8 FLOPs given the change in the FP8:BF16 performance ratio I’m expecting with Rubin. Rubin Ultra is another big step ~1 year after Rubin, with 2x more compute dies per chip and 2x more chips per rack, so there’s a reason to pace the scaling a bit rather than rushing it in 2026-2027. Such plans will make rushing it more difficult if there is suddenly a reason to do so, and 4 GW with non-Ultra Rubin seems a bit sudden.
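Here’s how those headline figures follow from the per-GW baseline in footnote [2] (a sketch; the ~2x-per-GW Rubin factor assumes roughly unchanged power per rack, which is not confirmed):

```python
# Scaling from footnote [2]: ~4e27 BF16 FLOPs per GW of Blackwell.
blackwell_per_gw = 4e27

late_2026 = 2 * blackwell_per_gw     # 2 GW GB200/GB300 -> ~8e27, round to ~1e28
rubin_per_gw = 2 * blackwell_per_gw  # assumed ~2x per GW, per the rack argument
late_2027_low, late_2027_high = 1 * rubin_per_gw, 4 * rubin_per_gw  # 1-4 GW Rubin

print(f"late 2026: ~{late_2026:.0e} BF16 FLOPs")                    # ~8e+27
print(f"late 2027: ~{late_2027_low:.0e} to ~{late_2027_high:.0e}")  # ~8e+27 to ~3e+28
# Rounding gives the 1e28 and 1e28-4e28 BF16 figures above; the FP8 numbers
# are ~2x these for Blackwell and ~3x for Rubin (claimed 3.3x dense FP8).
```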
So this is pretty similar to Agent-2 and Agent-4 at some points, keeping to the highest estimates, but with less compute than the plot suggests for the months while the next generation of datacenters is being constructed (during the late 2026 to early 2027 Blackwell-Rubin gap).
It wasn’t confirmed that all of it goes to Stargate, only that Crusoe is building it on the same site where it built the first buildings, which do go to Stargate.
500K chips, 1M compute dies, 1.25M H100-equivalents, ~4e27 FLOPs for a model in BF16.
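Spelling out this chain (a sketch; utilization and training time are assumed round numbers, not known figures):

```python
# Footnote [2] spelled out: 500K GB200 chips -> ~4e27 BF16 FLOPs for one model.
chips = 500e3
dies_per_chip = 2        # -> 1M compute dies
h100e_per_die = 1.25     # -> 1.25M H100-equivalents
h100_bf16 = 1e15         # ~1 PFLOP/s dense BF16 per H100

peak_flops = chips * dies_per_chip * h100e_per_die * h100_bf16  # 1.25e21 FLOP/s
utilization = 0.33  # assumed compute utilization during training
seconds = 1e7       # ~4 months of training, assumed

print(f"~{peak_flops * utilization * seconds:.0e} BF16 FLOPs")  # ~4e+27
```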
Thanks for the comment, Vladimir!
I never got around to updating based on the GTC 2025 announcement, but I do have the Blackwell-to-Rubin efficiency gain down as ~2.0x adjusted by die size, so it looks like we are in agreement there (though I attributed it a little differently, based on the information I could find at the time).
> So the first models will start being trained on Rubin no earlier than late 2026, much more likely only in 2027 [...]

Agreed! I have them coming into use in early 2027 in this chart.
> This predicts models of at most 1e28 BF16 FLOPs (2e28 FP8 FLOPs) in late 2026 [...]

Agreed! As you noted, we have the early version of Agent-2 at 1e28 fp16 in late 2026.
> Rubin Ultra is another big step ~1 year after Rubin, with 2x more compute dies per chip and 2x more chips per rack, so there’s a reason to pace the scaling a bit rather than rushing it in 2026-2027. Such plans will make rushing it more difficult if there is suddenly a reason to do so, and 4 GW with non-Ultra Rubin seems a bit sudden.

Agree! I wrote this before knowing about the Rubin Ultra roadmap, but this part of the forecast starts to be affected somewhat by the intelligence explosion, specifically an urgent demand for research-experiment compute and for inference-specialised chips to run automated researchers.