“Culmination” suggests a subsequent decline. In 2025, scaling of RLVR delivered a lot of capabilities, and late 2025 was the first time since 2023 that 10x-20x scaling of pretraining compute (compared to the original Mar 2023 GPT-4) finally made a full appearance (probably in Gemini 3 Pro and Opus 4.5). There is 100x-400x more scaling of compute to come by 2029-2031 (compared to current models), and much more low-hanging fruit in doing things well rather than prototyping the first thing that sort of works. The only low-hanging fruit that likely mostly ran out in 2025 is raw scaling of RLVR (in proportion to pretraining), and even that probably still has a bit to go. Setting up better tasks and RL environments will plausibly be more impactful than the feasible amount of further scaling of RLVR relative to pretraining. Then there’s continual learning, which might be quite impactful in 2026-2027.
I expect some sort of culmination in 2027-2032 (assuming no AGI), when scaling of compute slows down and there have been at least 1-2 years to learn to make better use of it. The first stage of the compute slowdown probably follows 2026, if 2028 doesn’t see 5 GW training systems (which would be on trend for training system compute growth in 2022-2026, but currently doesn’t seem to be happening). The second probably follows 2028-2030, when funding to secure ever more training compute mostly stops growing, and so compute mostly falls back to growing with its price-performance.
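The gap between the two regimes can be sketched with toy numbers (every figure below is an assumption for illustration, not a measurement; the only inputs taken from the comment are the 10x-20x-per-2-year trend and the idea that post-plateau growth tracks price-performance):

```python
# Toy projection of frontier training compute under two regimes:
# (1) the 2022-2026 trend of 10x-20x per 2-year step, and
# (2) flat funding, where compute grows only with price-performance.
# All constants are illustrative assumptions.

BASE_2026 = 5e27   # FLOP; rough 2026 frontier run (assumption)
STEP_PER_2Y = 12   # within the 10x-20x per-step range (assumption)
PRICE_PERF = 1.4   # ~1.4x/year compute per dollar (assumption)

def on_trend(years_after_2026):
    # Trend growth: STEP_PER_2Y for every 2 years elapsed.
    return BASE_2026 * STEP_PER_2Y ** (years_after_2026 / 2)

def price_perf_only(compute_at_plateau, years_after_plateau):
    # Funding frozen: compute grows only with hardware price-performance.
    return compute_at_plateau * PRICE_PERF ** years_after_plateau

trend_2030 = on_trend(4)                        # if scaling stayed on trend
plateau_2030 = price_perf_only(on_trend(2), 2)  # funding stops growing ~2028
print(f"2030 on-trend:     {trend_2030:.1e} FLOP")
print(f"2030 post-plateau: {plateau_2030:.1e} FLOP")
```

With these made-up inputs the two scenarios differ by about 6x already in 2030, and the gap compounds every year after the plateau.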
100x more compute means the leap from GPT-3 to GPT-4.
Kinda, but there won’t be enough natural text data at the higher end of this range (using 2028-2030 compute) to just keep scaling pretraining on text (you’d need more than 1,000 trillion tokens with repetition, maybe 200-300 trillion unique tokens); something else would need to happen instead, or you start losing efficiency and compute ends up being less useful than it would be if there were enough text.
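The token figures fall out of the usual Chinchilla-style rules of thumb (C ≈ 6·N·D, compute-optimal D ≈ 20·N); the specific compute number below is an assumption chosen to land in the 2028-2030 range, and the 5-epoch repetition factor is likewise illustrative:

```python
import math

# Chinchilla-style compute-optimal token count.
# From C = 6*N*D and D = tokens_per_param * N:
#   D = sqrt(tokens_per_param * C / 6)
def optimal_tokens(compute_flop, tokens_per_param=20):
    return math.sqrt(tokens_per_param * compute_flop / 6)

C_LATE = 3e29  # FLOP; illustrative 2028-2030 frontier run (assumption)
D = optimal_tokens(C_LATE)
print(f"compute-optimal tokens:    {D:.1e}")      # ~1e15, i.e. ~1,000T
print(f"unique tokens at 5 epochs: {D / 5:.1e}")  # ~2e14, i.e. ~200T
```

So a run at roughly 3e29 FLOP already wants ~1,000 trillion tokens, or ~200 trillion unique tokens at 5 epochs of repetition, which is about where the supply of natural text runs out.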
The steps of scaling take a long time, so only late 2025 models get to be shaped compute-optimally for 2024 levels of pretraining compute, and run on hardware announced and first available in the cloud in 2024. That’s just 2 years from 2022, when GPT-4 was trained, and the first of two 10x-20x steps at the 2022-2026 pace of scaling, with a third step remaining somewhere beyond 2026 if we assume $100bn per year of revenue for an AI company (at that time). With 2026 compute, there just might be enough text data (with repetition) to say that scaling of pretraining is still happening in a straightforward sense, which brings the change from the original Mar 2023 GPT-4 to 100x-400x (for models that might come out in 2027).
But this 100x-400x is also a confusing point of comparison, since between 2023 and 2027 there was the introduction of RLVR scaling (and test-time reasoning), and also all the improvements that come from working on a product (as opposed to a research prototype) for 4 years. Continual learning might be another change complicating this comparison before 2027 (whether it will be a significant change remains uncertain; that it’s coming in some form, at least as effective context extension, seems quite clear at this point).
Thank you! This makes me wonder if one can predict whether CoT-based AGI will be reached at all. Setting aside any forecaster’s nightmares like a time horizon growing exponentially until the very last couple of doublings or a potentially inflated time horizon of Claude Opus 4.5, one might try to predict the influence of the 100x-400x increase in compute on the METR-like[1] time horizon. And how much do you expect “setting up better tasks and RL environments” to increase the logarithm of the time horizon?
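One way to frame the question is as a free parameter: suppose each 10x of compute buys some fixed number of time-horizon doublings (this conversion rate is entirely hypothetical, nothing in the METR methodology pins it down); then the 100x-400x figure translates as follows:

```python
import math

# Back-of-envelope: horizon growth from a compute multiplier, given a
# hypothetical number of horizon doublings per 10x of compute.
# `doublings_per_10x` is a free parameter, not a measured quantity.
def horizon_multiplier(compute_multiplier, doublings_per_10x):
    doublings = doublings_per_10x * math.log10(compute_multiplier)
    return 2 ** doublings

for d in (1, 2, 3):  # hypothetical doublings per 10x compute
    lo = horizon_multiplier(100, d)
    hi = horizon_multiplier(400, d)
    print(f"d={d}: horizon grows {lo:.0f}x-{hi:.0f}x")
```

Even under these toy assumptions the answer spans 4x to a couple hundred x, which is the whole forecasting difficulty in miniature: the compute multiplier is the known part, the conversion rate is not.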
As for compute mostly falling back to growth according to its price-performance, the AI-2027 compute forecast doesn’t mention anything better than 3nm chips, and TSMC’s 2nm chips would be significantly more expensive than those of previous generations.
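Price-performance-only growth is much slower than the historical trend; how much slower depends on the annual FLOP-per-dollar gain, which is uncertain (the rates below are illustrative guesses, and pricier 2nm wafers would push toward the low end):

```python
import math

# Years to the next 10x of compute at flat spending, as a function of
# the annual price-performance gain (rates are illustrative assumptions).
def years_per_10x(annual_gain):
    return math.log(10) / math.log(annual_gain)

for gain in (1.3, 1.4, 1.5):
    print(f"{gain}x/year -> 10x every {years_per_10x(gain):.1f} years")
```

That's roughly 6-9 years per 10x, versus 2 years per 10x-20x step on the 2022-2026 trend; more expensive leading-edge wafers stretch it further.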
[1] The METR benchmark itself has yet to include tasks requiring more than 16 hours of work.