There likely won’t be an 8-rack scale-up Nvidia system ready for the 2027 buildout after all (contrary to what I speculated), and even for the 2028 buildout it won’t be offered in significant quantities, suggests the SemiAnalysis report [1] posted after GTC 2026. If this is the case, then large scale-up worlds in the Nvidia buildout will follow the GTC 2025 timeline, with the first major change compared to GB200/GB300 Oberon racks (14/20 TB of HBM3E) being the Rubin Ultra Kyber racks (147 TB of HBM4E), with the full-scale buildout in 2028.
Google TPUs will keep their advantage in hypothetical models with tens of trillions of parameters until 2028, and we might soon observe from Anthropic’s rumored larger-than-Opus Claude 5 model whether that’s likely to become an important class of models before 2028.
[1] Specifically, they say it’s 8x Oberon racks, each with 72 Rubin Ultra packages, 4 compute dies per package, shipped starting in 2027 (rather than 72 non-Ultra Rubin packages, 2 compute dies per package, shipped starting in 2026). And that the multi-rack scale-up will be uncomfortably expensive, so it’s unlikely to ship in volume. Since Rubin Ultra Kyber racks should be out at about the same time, it doesn’t seem crucial that this system is offered at all, other than as an early peek at the Feynman 8x Kyber (NVL1152) systems of 2028-2029.
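For concreteness, here's the back-of-envelope arithmetic behind the per-rack HBM figures above. The package counts per rack and the rounded 14/20 TB and 147 TB totals come from the post; the per-package HBM capacities and the 144-package Kyber count are my own rough assumptions, not vendor-confirmed specs, so treat this as a sketch.

```python
# Back-of-envelope HBM-per-rack arithmetic. Package counts per rack and the
# rounded totals (14/20 TB, 147 TB) come from the text above; the per-package
# HBM capacities below are my own assumptions, not SemiAnalysis numbers.

racks = {
    # name: (packages per rack, assumed HBM per package in GB)
    "GB200 Oberon (NVL72)":       (72, 192),    # ~192 GB HBM3E per Blackwell package (assumed)
    "GB300 Oberon (NVL72)":       (72, 288),    # ~288 GB HBM3E per Blackwell Ultra package (assumed)
    "Rubin Ultra Kyber (NVL576)": (144, 1024),  # ~1 TB HBM4E per 4-die Rubin Ultra package (assumed)
}

for name, (packages, hbm_gb) in racks.items():
    total_tb = packages * hbm_gb / 1000  # decimal TB, to compare with the rounded figures in the post
    print(f"{name}: {packages} packages x {hbm_gb} GB ≈ {total_tb:.1f} TB of HBM per rack")
```

Under these assumptions the totals land at roughly 13.8, 20.7, and 147.5 TB per rack, matching the rounded figures quoted above.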
Google TPUs will keep their advantage in hypothetical models with tens of trillions of parameters until 2028, and we might soon observe from Anthropic’s rumored larger-than-Opus Claude 5 model whether that’s likely to become an important class of models before 2028.
Do you think this is likely to be trained using TPUs or Trainium?
Anthropic plausibly didn’t have enough TPUv7 yet, and still don’t. But the model is probably not tens of trillions of parameters, just notably bigger than Opus 4. And Opus 4 is sized for efficient serving with Trainium 2 racks, so maybe 3T params (Opus 5 probably won’t change that, since Trainium 2 remains an important part of the fleet). Thus 10T params would already qualify as a larger-than-Opus weight class. The potential observation from this model I’m referring to is whether further scaling above Opus results in meaningful improvement (Opus itself already demonstrated that scaling above Sonnet works), thus motivating feasible scaling to continue beyond that point, towards the tens of trillions of parameters that TPUv7/TPUv8 (and then Rubin Ultra Kyber racks) should be able to handle well enough.
There is no hard constraint that a model must fit in a scale-up world; it’s possible to serve a model across dozens of scale-up worlds. But if large parts of it do fit in one scale-up world, or better yet it fits entirely with room for KV-cache to spare, that does wonders for efficiency. So a 10T param model on Trainium 2 is not a disaster compared to serving (or RLVRing) it on H100s, but it’s going to be even better with TPUv7. And for a 1T param model there might be no difference between Trainium 2 and TPUv7 (for reasonable levels of interactivity, beyond what the specs of the underlying chips suggest).
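To put rough numbers on the “fits in one scale-up world” point, here's a minimal fit-check sketch. The hardware figures are my assumptions, not established specs: roughly 96 GB of HBM per Trainium 2 chip with 64 chips per Trn2 UltraServer, roughly 192 GB per TPUv7 chip with a 256-chip scale-up domain, and weights stored at 1 byte per parameter with ~30% extra headroom for KV-cache.

```python
# Rough fit check for the serving point above: do the weights (plus some
# KV-cache headroom) fit inside a single scale-up world? All hardware figures
# here are my own assumptions, not from the comment: ~96 GB HBM per Trainium 2
# chip with 64 chips per Trn2 UltraServer, ~192 GB per TPUv7 chip with a
# 256-chip scale-up domain. Weights assumed stored at 1 byte per parameter.

def fits(params_t, chips, hbm_gb_per_chip, kv_headroom=1.3):
    """Return (fits?, weight TB, capacity TB) for a params_t-trillion-param model."""
    weights_tb = params_t * 1.0               # 1 byte/param: params in trillions == weights in TB
    capacity_tb = chips * hbm_gb_per_chip / 1000
    return weights_tb * kv_headroom <= capacity_tb, weights_tb, capacity_tb

worlds = {
    "Trainium 2 UltraServer (64 chips)": (64, 96),
    "TPUv7 scale-up domain (256 chips)": (256, 192),
}

for params_t in (1, 3, 10):
    for name, (chips, hbm_gb) in worlds.items():
        ok, w, cap = fits(params_t, chips, hbm_gb)
        verdict = "fits with KV-cache headroom" if ok else "needs several scale-up worlds"
        print(f"{params_t:>2}T params on {name}: ~{w:.0f} TB weights vs ~{cap:.1f} TB HBM -> {verdict}")
```

Under these (assumed) numbers, a ~3T-param model fits in one Trainium 2 UltraServer with room for KV-cache, while a ~10T-param model has to be sharded across several of them but sits comfortably inside a TPUv7-sized scale-up world, which is the gap being described.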