I am imagining a scenario like:
A company spends $10 billion training an AI.
The AI has fully human-level capabilities.
The company thinks, wow this is amazing, we can justify spending way more than $10 billion on something like this.
They don’t bother with any algorithmic improvements or anything, they just run the same training but with $1 trillion instead. (Maybe they get a big loan.)
The $1 trillion AI is superintelligent.
The $1 trillion AI kills everyone.
Thus there is no period of recursive self-improvement; you just go from human-level to dead in a single step.
This scenario depends on some assumptions that seem kinda unlikely to me, but not crazy unlikely. I want to hear other people’s thoughts.
The largest of the present-day models (GPT-4.5, Opus 4) could, in some strange sense, be said to cost about $500M for the final pretraining run. The true figure is much higher, even with the same strange way of counting, once you include the research experiments necessary to make the final pretraining run a success, and the subsequent post-training that elicits the model’s capabilities in a useful form. (Long reasoning training, or RLVR, is still mostly in elicitation mode, but threatens to start creating new capabilities at a more substantial training cost within 1-2 years.)
This is not the real cost, in the sense that there is no market where you can pay that amount of money and get the ability to do such a training run; instead you need to build a giant training system yourself. The servers and networking cost about 10x more than the minimal break-even price of the 3-4 months of their time, considering that price-performance of compute advances quickly, and that the hardware itself doesn’t last more than a few years when always in use. (Cloud providers would charge substantially more than that, and won’t give you nearly enough compute for a frontier training run.)
Since frontier training systems are currently very different from older ones (larger, and with much higher power and cooling requirements per rack), it’s also necessary to pay for the buildings, power, and cooling infrastructure at the same time as you are buying the very expensive compute hardware. This adds about 50% on top of the compute hardware itself. So in total you are paying about 15x more to build a frontier training system than the pretend on-paper “cost of a training run”. The $500M training runs of today are done on what are probably $7bn training systems ($4-5bn in compute hardware, $1-3bn in buildings/power/cooling). The company needs to actually raise the $7bn, not the $500M.
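As a sanity check, the multipliers above can be strung together in a quick back-of-envelope calculation (the dollar figures are the rough estimates from this comment, not public numbers):

```python
# Back-of-envelope check of the training-system cost multipliers above.
# All dollar figures are rough estimates from this thread, not public numbers.

run_cost = 0.5e9        # pretend on-paper "cost of a training run" (~$500M)
hardware_multiple = 10  # servers/networking vs. break-even price of 3-4 months of run time
infra_multiple = 1.5    # buildings/power/cooling add ~50% on top of the hardware

hardware_cost = run_cost * hardware_multiple  # ~$5bn in compute hardware
system_cost = hardware_cost * infra_multiple  # ~$7.5bn total training system
total_multiple = system_cost / run_cost       # ~15x the on-paper run cost

print(f"hardware ~${hardware_cost / 1e9:.1f}bn, system ~${system_cost / 1e9:.1f}bn, "
      f"{total_multiple:.0f}x the run cost")
# → hardware ~$5.0bn, system ~$7.5bn, 15x the run cost
```

This lands at the high end of the $7bn estimate, which is fine for figures this rough.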
The largest training system currently being built that’s somewhat documented is Stargate Abilene, to be completed around summer 2026. It might cost maybe $40-45bn to build ($15bn through Crusoe on buildings/power/cooling, maybe around $27bn on compute racks and networking through Oracle), and will host 400K chips in GB200 NVL72 racks, which is 10x more FLOP/s for pretraining than probably went into GPT-4.5 or Opus 4, and 150x more than went into GPT-4.
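A small consistency check on the quoted compute ratios: if Abilene is 10x the GPT-4.5/Opus 4-class systems and 150x GPT-4, that implies roughly a 15x pretraining-compute jump from GPT-4 to the GPT-4.5 generation:

```python
# Consistency check of the compute ratios quoted above (both from this comment).
abilene_vs_gpt45 = 10   # Stargate Abilene vs. GPT-4.5/Opus 4-scale pretraining FLOP/s
abilene_vs_gpt4 = 150   # Stargate Abilene vs. GPT-4-scale pretraining FLOP/s

# Implied generational jump from GPT-4 to the GPT-4.5 generation:
gpt45_vs_gpt4 = abilene_vs_gpt4 / abilene_vs_gpt45
print(f"implied GPT-4.5 vs GPT-4 pretraining compute: ~{gpt45_vs_gpt4:.0f}x")  # ~15x
```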
Now the pretend “cost of time” on the ~$40-45bn system for a 3-4 month final pretraining run of a giant model that might come out in 2027 could be said to be “about $3bn”, but that’s a somewhat meaningless figure: they still needed to manage to finance the ~$40-45bn buildout to get there, and they’ll spend more than $3bn on the experiments needed to make that training run work.
This year, Amazon is spending $100bn on things like building its datacenters around the world, and that’s a $2-3 trillion market cap company. Even if the giant datacenters are each a 2-year project, we are already close to the limit of what a non-AGI AI company might be able to finance, and shortly after that we’d be running into the constraints of industrial capacity. So without AGI, the scaling of giant frontier AI training systems should stop around 2027-2029, at which point it regresses to the pace of Moore’s law (of price-performance), which is about 3x slower than the current funding-fueled ramp-up.
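To unpack what “about 3x slower” means, here is a sketch comparing the two paces in log space. The annual growth rates (3.5x/year for the funding-fueled ramp-up, 1.5x/year for price-performance) are illustrative assumptions for this sketch, not sourced figures:

```python
import math

# Illustrative growth rates; these are assumptions for the sketch, not sourced figures.
funding_ramp_per_year = 3.5       # assumed yearly compute growth during the build-out
price_performance_per_year = 1.5  # assumed Moore's-law-style yearly improvement

# "3x slower" compares paces in log space: how many years of price-performance
# progress match one year of the funding-fueled ramp-up?
slowdown = math.log(funding_ramp_per_year) / math.log(price_performance_per_year)
print(f"price-performance pace is ~{slowdown:.1f}x slower")  # ~3.1x slower
```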
So if I understand correctly, you’re saying it would not be feasible to scale up training compute by 100x in a matter of months, because you’d need to build out the infrastructure first?
Judging by Colossus and Stargate Abilene, it takes about 9 months to construct the buildings/substations/cooling, and 2-3 months to install the compute hardware. Power might in principle be solved with gas generators, and the global compute hardware supply is significantly greater than what individual frontier AI training systems are using, but less than 100x greater.
Stargate Abilene will be 1.2 GW, and a hypothetical frontier AI training system of 2027-2029 might be about 5 GW. Scaling 100x from that within months would be quite a sight. Also for pretraining there won’t be enough text data to go further anyway, though with enough compute training on video might prove useful.
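The “quite a sight” claim is easy to make concrete. The GW figures are from this comment; the US grid comparison is a rough ballpark from memory (average US electricity generation is on the order of 450-500 GW), included only for scale:

```python
# Making "quite a sight" concrete. GW figures are from this comment; the US grid
# figure is a rough ballpark (average US generation ~450-500 GW), for scale only.
abilene_gw = 1.2         # Stargate Abilene
frontier_2027_gw = 5.0   # hypothetical 2027-2029 frontier training system

scaled_gw = frontier_2027_gw * 100      # a 100x scale-up from there
vs_abilene = scaled_gw / abilene_gw     # vs. today's largest documented build

print(f"100x from a {frontier_2027_gw:.0f} GW system is {scaled_gw:.0f} GW "
      f"(~{vs_abilene:.0f}x Abilene), comparable to the entire average US grid")
```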
So a more likely story is about figuring out how to use all of the existing global compute to make a single AI smarter, even when it’s not all in one place and not connected together at very high bandwidth. RLVR is already pointing in that direction, but it’s not yet proven to be useful, or even to work at all, at the relevant scale. Reaching AGI will plausibly result in AGIs quickly finding a way to make use of all this compute, at which point it’ll be more valuable in the hands of AGIs than whatever it was doing before, so that’s where it’ll end up (unless the world wakes up at the last possible moment and doesn’t do that).
I am not concerned about this scenario. It does not matter if this is feasible or not (it might be theoretically feasible, but other things will almost certainly happen first).
The labs are laser-focused on algorithmic improvements, and the rate of algorithmic improvements is very fast (algorithmic improvements contribute more than hardware improvements at the moment).
The AIs are being optimized to do productive software engineering and to productively assist in AI research, and soon to perform productive AI research almost autonomously.
So the scenario I tend to ponder is a software-only intelligence explosion based on non-saturating recursive self-improvement within a fixed hardware configuration (in some sense, the dual of the scenario described in this post). Of course, the labs are all trying to scale hardware as well, because they are in a race, and every bit of advantage matters if one wants to reach ASI level before the other labs do. That race situation is also quite unfortunate from the existential-safety angle.
Answering my own question:
It might cost multiple orders of magnitude more than $10B to build human-level AI. I could still see a similar scenario playing out if the baseline cost is $100B, but probably not at $1T. As I understand, present-day models cost more like $100B (edit: I badly misread a graph; present-day models cost more like $100M, although the figures are not public); the first human-level AI will probably cost considerably more.
I doubt a 100x increase in spending (or 1000x) is enough to go from human-level to superintelligent, but I don’t think we can strongly rule it out. We don’t know if scaling laws will continue to hold, and we also don’t know what level of intelligence is required for an AI to pose an existential threat. (Like, maybe 150 IQ + the ability to clone yourself is already sufficient. Probably not, but maybe.)
I somewhat doubt AI companies would decide to do this. It contradicts their stated plans, and it would be a deviation from their historical behavior. But once AI gets good enough to replace human workers, its profitability rapidly increases, so it could be economically justifiable to do a fast scale-up even though that wasn’t justified at weaker capability levels.