So if I understand correctly, you’re saying it would not be feasible to scale up training compute by 100x in a matter of months, because you’d need to build out the infrastructure first?
Judging by Colossus and Stargate Abilene, it takes about 9 months to construct the buildings/substations/cooling, and 2-3 months to install the compute hardware. Power might in principle be solved with gas generators, and the global compute hardware supply is significantly greater than what individual frontier AI training systems are using, but less than 100x greater.
Stargate Abilene will be 1.2 GW, and a hypothetical frontier AI training system of 2027-2029 might be about 5 GW. Scaling 100x from that within months would mean on the order of 500 GW, quite a sight. Also, for pretraining there won't be enough text data to go much further anyway, though with enough compute, training on video might prove useful.
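The back-of-envelope arithmetic behind this can be sketched out explicitly. All figures here come from the discussion above (Abilene at ~1.2 GW, a hypothetical 2027-2029 frontier system at ~5 GW, ~9 months of construction plus ~2-3 months of hardware install); nothing is a precise forecast.

```python
# Rough sanity check on the 100x scaling claim. Figures are the
# illustrative ones from the discussion, not measured data.

abilene_gw = 1.2          # Stargate Abilene planned power
frontier_gw = 5.0         # hypothetical 2027-2029 frontier training system
scaleup = 100             # the proposed 100x compute jump

target_gw = frontier_gw * scaleup
print(f"100x of a {frontier_gw} GW system needs ~{target_gw:.0f} GW of power")

# For comparison, total US generating capacity is on the order of 1,200 GW,
# so a ~500 GW training system is not something gas generators or a
# months-scale build-out can cover.

build_months = 9 + 3      # ~9 mo construction + ~2-3 mo hardware install
print(f"Single-site build-out alone takes ~{build_months} months")
```

Even granting the optimistic single-site timeline, the power requirement alone puts a months-scale 100x jump out of reach.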
So a more likely story is about figuring out how to use all of the existing global compute to make a single AI smarter, even when it's not all in one place and not connected at very high bandwidth. RLVR is already pointing in that direction, but it hasn't yet been shown to be useful, or even to work at all, at the relevant scale. Reaching AGI will plausibly result in AGIs quickly finding a way to make use of all this compute, at which point that compute will be more valuable in the hands of AGIs than at whatever it was doing before, so that's where it will end up (unless the world wakes up at the last possible moment and decides otherwise).