New frontier models trained with ~10x more compute than GPT-4 (like Grok) haven’t wowed enough to justify spending another 10x more—~$1B—on pretraining.
Since GPT-4, some of the 2024 AIs (trained on 2023 compute) were already using more compute than original GPT-4, and so the current generation is only at 3x-5x up from those. From every 3x-5x being only a slight improvement, it doesn’t follow that stacking multiple steps of such scaling doesn’t lead to more significant improvement. In total, between 2022 and 2028, it’s technologically and in principle financially feasible to get a 2000x increase in compute (starting with the original GPT-4), and it’s impossible to narrowly estimate how much that improves capabilities from observing a single step of scaling compute by 3x-5x.
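To make the compounding point concrete, here is a minimal sketch of the arithmetic, assuming each step is a pure multiplicative factor on compute. The helper name `steps_to_reach` is hypothetical and introduced only for illustration. The 2000x total and the 3x-5x step sizes are the figures from the paragraph above.

```python
import math


def steps_to_reach(total_factor: float, step_factor: float) -> float:
    """Number of multiplicative steps of size step_factor that compound to total_factor."""
    return math.log(total_factor) / math.log(step_factor)


# 2000x is the 2022-2028 compute increase cited above (starting from original GPT-4).
TOTAL_FACTOR = 2000

for step in (3, 5, 10):
    print(f"{step}x steps needed to reach {TOTAL_FACTOR}x: "
          f"{steps_to_reach(TOTAL_FACTOR, step):.1f}")
```

Under these assumptions, a single 3x-5x step is roughly one fifth to one seventh of the full 2000x range on a log scale, which is why one such step says little about the cumulative effect.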
Also, the potential of 2024 compute hasn’t yet been demonstrated in 2025 AIs to its fullest extent, since the base models for o3, Gemini 2.5 Pro, and Grok 3 are likely smaller than compute optimal ones. The models that are probably closer to compute optimal are GPT-4.5 and Opus 4, but a thinking variant for GPT-4.5 hasn’t been released yet. And Opus 4 has plausibly undergone only minimal reasoning training, in order to be ready for release faster, in which case we’ll only see it with the amount of reasoning training comparable to o3 or Gemini 2.5 Pro in later incremental releases.
That is, it’s plausible that a larger model pretrained on 10x more compute makes it possible to train a reasoning model that is meaningfully more capable, and the same applies to the next 10x, and then the 10x after that.
(The specific numbers such as $1bn for training runs are quite misleading, because the companies training the models first need to build the training systems at a large upfront capital cost. The 2024 frontier AI training systems already cost $5-7bn, while the 2026 frontier AI training systems currently being built, such as Stargate Abilene, cost $35-45bn, much more than the $1bn that comes out of the weird cost-of-time calculations for the previous generation of models.)