Eight-rack Oberon scale-up worlds for Rubin might be in the works, potentially making models with tens of trillions of parameters efficient to serve one year earlier than Rubin Ultra's Kyber racks would have, in 2027-2028 rather than in 2028-2029.
There were some unclear communications from GTC 2026 about a two-layer all-to-all NVLink scale-up world called “NVL576”. This seems to be a system comprising 8 non-Ultra Rubin Oberon (as in NVL72) racks (each with 144 compute dies in 72 2-die packages), so 576 packages (1152 compute dies) across 8 racks. It’s confusingly announced as “Vera Rubin Ultra NVL576 will combine eight … racks, each with 72 Rubin Ultra GPUs”. (Another slight confusion when searching about it is that in 2025 “NVL576” referred to a single Rubin Ultra Kyber rack with 576 compute dies in 144 4-die packages, but that’s clearly a different system from the “NVL576” announced at GTC 2026.)
An 8-rack Rubin Oberon NVL576 system would have 165 TB of HBM4, so inferencing (and RLVRing) models with tens of trillions of parameters won’t be significantly less efficient than it is for models with trillions of parameters. This was TPUv7’s advantage (full buildout in 2026), and last year Nvidia only announced plans to close the gap with the Kyber rack for Rubin Ultra (576 compute dies in 144 4-die packages in one rack, 147 TB of HBM4E), which is due to come out in 2027, so that full-scale buildout would only conclude in 2028 (maybe early 2029). But Vera Rubin Oberon systems are already in production, and full-scale buildout will happen in 2027 (maybe early 2028, for some larger datacenter sites getting fully online).
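The HBM totals above follow from per-package capacities. A minimal arithmetic sketch, assuming 288 GB of HBM4 per 2-die Rubin package and 1024 GB of HBM4E per 4-die Rubin Ultra package (figures from Nvidia's announcements, treated here as assumptions):

```python
# Back-of-envelope check of the HBM totals quoted above.
# Assumed per-package capacities (from Nvidia announcements):
#   Rubin (2-die package):       288 GB HBM4
#   Rubin Ultra (4-die package): 1024 GB HBM4E

def total_hbm_tb(packages: int, gb_per_package: int) -> float:
    """Total HBM across a scale-up world, in decimal TB."""
    return packages * gb_per_package / 1000

# 8-rack Rubin Oberon NVL576: 8 racks x 72 packages
nvl576_oberon = total_hbm_tb(8 * 72, 288)
# Single Rubin Ultra Kyber rack: 144 packages
kyber_ultra = total_hbm_tb(144, 1024)

print(f"Rubin Oberon NVL576: {nvl576_oberon:.1f} TB")  # ~165.9 TB, quoted as 165 TB
print(f"Rubin Ultra Kyber:   {kyber_ultra:.1f} TB")    # ~147.5 TB, quoted as 147 TB
```

Both quoted figures match within rounding, which supports reading "NVL576" as 576 packages of the non-Ultra Rubin kind.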
So if these two-layer scale-up worlds for Oberon are available from the start, the constraint of HBM per scale-up world gets lifted a year earlier, which might translate into models with tens of trillions of parameters getting RLVRed and becoming available a year earlier on non-TPU systems. This might be especially crucial for OpenAI (if models this large can be made more capable than 10x smaller models in the relevant timeframe), since they are mostly working with Nvidia hardware, but even for Anthropic this might make a difference (they are getting 1 GW of TPUv7 in 2026, but it’s unclear if they’ll be able to get meaningfully more TPUs in 2027-2028).
(Having enough hardware to efficiently serve inference for a model of some shape is necessary to deploy it as a flagship model. If instead most of the available hardware is only good at serving smaller models, then even if the larger model can be trained, it can’t be served to most users as cheaply as the better hardware would allow. This makes it less likely to get trained in the first place, so the capabilities of the most popular hardware indirectly shape the models that get trained in practice, even when it’s possible in principle to train larger models and serve them somewhat slower and more expensively on older hardware.)
I’m starting to suspect that if 2026-2027 AGI happens through automation of routine AI R&D (automating acquisition of deep skills via RLVR), it doesn’t obviously accelerate ASI timelines all that much. Automated task and RL environment construction fixes some of the jaggedness, but LLMs are not currently particularly superhuman, and advancing their capabilities plausibly needs skills that aren’t easy for LLMs to automatically RLVR into themselves (as evidenced by humans not having made too much progress in RLVRing such skills).
This creates a strange future with broadly capable AGI that’s perhaps even somewhat capable of frontier AI R&D (not just routine AI R&D), but that doesn’t accelerate further development much beyond picking the low-hanging algorithmic fruit unlocked by a given level of compute faster than humans would (months instead of years, but bounded by what current compute makes straightforward). If this low-hanging algorithmic fruit doesn’t by itself lead to crucial breakthroughs, AGIs won’t turn broadly or wildly superhuman before there’s much more compute, or before the few years in which human researchers would’ve made similar progress. And compute might remain gated by ASML EUV tools at 100-200 GW of new compute per year (3.5 tools occupied per GW of compute each year; maybe 250-300 EUV tools exist now, 50-100 will be produced per year, about 700 will exist in 2030).
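As a sanity check, the EUV figures above hang together arithmetically. A quick sketch using only the post's own numbers (the 4-year horizon from now to 2030 is my assumption):

```python
# Rough consistency check of the EUV constraint sketched above;
# all inputs are the post's own figures, treated here as assumptions.

tools_per_gw_per_year = 3.5  # EUV tools occupied per GW of new compute, per year
tools_2030 = 700             # approximate EUV fleet in 2030

# New compute per year the 2030 fleet could support if fully occupied:
gw_per_year_2030 = tools_2030 / tools_per_gw_per_year
print(f"{tools_2030} tools / {tools_per_gw_per_year} tools-per-GW = "
      f"{gw_per_year_2030:.0f} GW of new compute per year")
# -> 200 GW/year, the upper end of the 100-200 GW range

# Fleet growth to 2030, assuming ~4 more years of production:
fleet_2030_low = 250 + 4 * 50     # 450 tools
fleet_2030_high = 300 + 4 * 100   # 700 tools
```

The ~700-tool fleet sits at the top of the growth range, so the 100-200 GW/year ceiling is consistent with the stated production rates only toward their optimistic end.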