Some clarifications about this post based on a question I got via email:
Note that in this post I’m talking about 10x increases in effective compute via algorithmic progress, not literally scaling up bare-metal compute. This somewhat alters the picture. Further, I’m talking about the returns after the point where AIs are capable enough to fully automate AI R&D, which somewhat alters my estimates (mostly skewing them higher).
I discuss gains in SDs (standard deviations of capability), speed, and reduced cost (more copies) simultaneously because there are often algorithmic advances which improve speed or reduce cost but which don’t allow for improving overall capabilities (adding SDs). This makes it relatively easier to improve on multiple dimensions simultaneously than to push much further on just one dimension. (This is somewhat different from how scaling up bare-metal compute would work: I’d expect less of a diversification / diminishing-returns effect.)
Thus, if you focused all the effort on skills/overall capabilities you’d get more than 1.2 SD/OOM, but the increase over 1.2 might be pretty marginal due to this diversification effect. So, in particular, I expect that if you focused all your effort on SDs (and e.g. discarded speed/cost improvements that you find incidentally and which can’t be converted into pure capability improvements), you’d get only a touch more, maybe 1.3 or 1.4 SD/OOM.
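The arithmetic here can be made concrete with a small sketch (the helper function and specific multipliers are illustrative assumptions, not claims beyond the estimates quoted above):

```python
import math

def sd_gain(compute_multiplier: float, sd_per_oom: float) -> float:
    """Capability gain in SDs from a given multiplier on effective compute,
    assuming a constant SD-per-OOM rate."""
    ooms = math.log10(compute_multiplier)
    return sd_per_oom * ooms

# Balanced allocation: gains split across SDs, speed, and cost, with the
# capability axis improving at ~1.2 SD per OOM (the estimate discussed above).
balanced = sd_gain(10, 1.2)    # one OOM of effective compute -> 1.2 SD

# Fully focused on capability: the diversification effect means only a modest
# bump, ~1.3-1.4 SD per OOM rather than something much larger.
focused = sd_gain(10, 1.35)

print(balanced, focused)
```

The point of the sketch is how small the gap is: concentrating all effort on one axis buys roughly 0.1–0.2 SD per OOM over the balanced allocation under these assumed numbers.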
Why are my estimates for SD/OOM substantially above the GPT-4 results (and other recent ML results like o3), which predict more like 1 SD/OOM? My estimates are pulled above what this data implies because:
Other estimates (e.g. based on brain size scaling) imply much higher numbers are plausible.
Conditioning on having already achieved full automation of AI R&D (the case this post focuses on) gives us some evidence that we’re in a regime with higher returns to effective compute / years of algorithmic progress (and closer to the plausibly higher returns to effective compute seen in the brain size scaling regime).
I think the o1/o3 numbers might skew a bit low due to only looking at scaling up RL compute rather than scaling up RL and pretraining in the optimal ratios.
To be clear, I’m pretty uncertain about the overall estimate and I’d be sympathetic to a lower estimate even at the point of full AI R&D automation, especially if you condition on no important paradigm shifts or large advances. And my estimates for marginal progress at the current level of capability are probably substantially lower, though I haven’t thought about this that much.
Another factor here is that there are probably innovations which help with qualitative skill / SDs but which cost much more inference compute. This could make the tradeoffs look very different. I’m mostly neglecting this in my analysis.
Another factor is that you can potentially approximate a faster and smarter system well using a mix of a slower but smarter system and a faster but dumber system. This is part of why I expect improvements on all axes to yield larger gains than a more narrow focus.
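One way to see the mixture point is a toy routing model (all numbers and the routing rule are made up for illustration): send each task to the cheapest model that can handle it, so a fast-but-dumb model absorbs the easy tasks and a slow-but-smart model handles the rest.

```python
from dataclasses import dataclass

@dataclass
class Model:
    skill: float  # maximum task difficulty it can solve
    speed: float  # tasks handled per unit time

def route(difficulty: float, fast_dumb: Model, slow_smart: Model) -> Model:
    """Send each task to the fast model if it can solve it, else the smart one."""
    return fast_dumb if difficulty <= fast_dumb.skill else slow_smart

fast_dumb = Model(skill=0.3, speed=10.0)
slow_smart = Model(skill=0.9, speed=1.0)

# Task difficulties; as in many workloads, most tasks are easy.
tasks = [0.1, 0.2, 0.25, 0.5, 0.8]

time_mixture = sum(1 / route(d, fast_dumb, slow_smart).speed for d in tasks)
time_smart_only = sum(1 / slow_smart.speed for _ in tasks)

# The mixture solves everything the smart model solves, much faster on average,
# roughly approximating a single fast-and-smart system.
print(time_mixture, time_smart_only)
```

Under these made-up numbers the mixture finishes the workload in less than half the time of the smart-only system while losing no tasks, which is the sense in which improvements on multiple axes can compound.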