There is no way to predict the location of crucial capability thresholds, or the timing of when basic science comes up with new methods, so theoretical arguments can only slightly slosh the probabilities along the timeline. Evals are getting better, but saturation of any given eval remains only a lower bound for crossing capability thresholds. We get to experimentally observe capabilities once they have actually been achieved, but no earlier.
The most concrete consideration is that the speed of compute scaling changes somewhat predictably (fast now, ~3x slower once funding stops growing at current rates in 2027-2029), since compute is the key input to any method of creating capabilities. Natural text data will also ~completely run out around 2027-2029, and pretraining on other kinds of data is plausibly much less efficient, slowing down the scaling of capabilities from pretraining even further.
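For intuition, here is a minimal back-of-envelope sketch of where a ~3x slowdown could come from, assuming (purely for illustration) that training spend currently grows ~3x/year while hardware price-performance improves ~1.4x/year; once spending flattens, only the hardware term remains.

```python
# Illustrative back-of-envelope only: the growth figures below are assumptions,
# not numbers from the comment above.

funding_growth = 3.0   # assumed annual growth in training spend while investment keeps scaling
hardware_growth = 1.4  # assumed annual improvement in compute per dollar (price-performance)

compute_growth_now = funding_growth * hardware_growth  # spend and price-performance both rising
compute_growth_later = hardware_growth                 # spend flat, only price-performance rising

print(f"compute growth while funding scales: ~{compute_growth_now:.1f}x/year")
print(f"compute growth after funding flattens: ~{compute_growth_later:.1f}x/year")
print(f"slowdown factor: ~{compute_growth_now / compute_growth_later:.1f}x")
```

Under these assumed numbers the slowdown factor is exactly the funding-growth term, ~3x; different assumptions would shift it, but the qualitative break when funding flattens is the point.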
The AI companies might have some idea of the scaling laws for long reasoning training: how much the various training inputs influence capabilities, and whether there are scarce inputs or bounds on capabilities inherited from the base models. Public knowledge is still at the stage of reproducing and slightly improving on the methods of DeepSeek-R1, but unlike last year, research efforts in the open have a more defined target. This brings the crucial capability thresholds closer than pretraining on its own would, but it’s not clear whether they will also be approaching faster, or whether this is a one-time improvement on top of pretraining, in which case scaling of pretraining would still remain a limiting factor without new training methods (whose timing of arrival is unpredictable).
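To make the distinction between "one-time improvement" and "continuing gains" concrete, here is a toy sketch; the functional forms and constants are illustrative assumptions, not known scaling laws. In one hypothesis reasoning training adds a bounded, saturating increment on top of the base model; in the other it keeps improving capabilities as its own compute scales.

```python
import math

# Toy illustration of two hypotheses about long reasoning training; the
# functional forms and constants are assumptions, not known scaling laws.

def one_time_boost(rl_compute, base, boost=0.15, scale=1e22):
    """Reasoning training adds a bounded, saturating increment on top of the base model."""
    return base + boost * (1 - math.exp(-rl_compute / scale))

def continuing_gain(rl_compute, base, coeff=0.04, ref=1e20):
    """Reasoning training keeps improving capability as its own compute scales (toy log-linear form)."""
    return base + coeff * math.log10(rl_compute / ref)

base = 0.6  # stand-in for base-model capability on some benchmark scale
for c in (1e20, 1e21, 1e22, 1e23, 1e24, 1e25):
    print(f"RL compute {c:.0e}: one-time {one_time_boost(c, base):.2f}, "
          f"continuing {continuing_gain(c, base):.2f}")
```

In the first case the curve flattens and pretraining scale again becomes the binding constraint; in the second, reasoning-training compute keeps pulling thresholds closer on its own.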
Have you (or others) written about where this estimate comes from?
I wrote it up in a new shortform.
Ben Todd here gets into the numbers a little bit. I think the rough argument is that after 2028, the next training run would cost ~$100bn, which becomes difficult to afford.
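As a rough illustration of how a ~$100bn figure could arise (the 2024 starting cost and the growth rate below are assumptions for illustration, not Ben Todd's figures):

```python
# Rough illustration of how frontier training-run costs could reach ~$100bn by 2028.
# The starting cost and annual growth rate are assumptions, not Ben Todd's figures.

cost_2024 = 1e9      # assumed cost of a frontier training run in 2024, in dollars
annual_growth = 3.0  # assumed annual growth in training-run cost

for year in range(2024, 2029):
    cost = cost_2024 * annual_growth ** (year - 2024)
    print(f"{year}: ~${cost / 1e9:.0f}bn")
# 2028: ~$81bn, i.e. on the order of $100bn, around the point where further
# growth in spending becomes hard to fund.
```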