Congrats on getting this out! I am overall excited about exploring models that rely more on uplift than on time horizons. A few thoughts:
It might be nice to indicate how these outputs relate to your all-things-considered views. To me your explicit model seems to be implausibly confident in 99% automation before 2040.
In particular, the “doubling difficulty growth factor”, which measures whether time horizon increases superexponentially, could change the date of automated coder from 2028 to 2049! I suspect that time horizon is too poorly defined to nail down this parameter, and rough estimates of more direct AI capability metrics like uplift can give much tighter confidence intervals.
I am skeptical that uplift measurements actually give much tighter confidence intervals.
After talking to Thomas about this verbally, we both agree that directly using uplift measurements rather than time horizon could plausibly be better by the end of 2026, though we might have different intuitions about the precise likelihood.
The effective labor:compute ratio only changes by 10-100x during the period in question, so it doesn’t affect results much anyway. The fastest trajectories are the most affected by the labor:compute ratio, but for trajectories that get to 99% automation around 2034, the ratio stays around 1:1.
This isn’t true in our model because we allow full coding automation. Given that this is the case in your model, Cobb-Douglas seems like a reasonable approximation.
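As a toy illustration of why a 10-100x swing in the labor:compute ratio moves results only modestly under Cobb-Douglas (the aggregate form and the 0.5 exponent here are my assumptions for illustration, not parameters from either model):

```python
# Toy Cobb-Douglas research output: R = L**alpha * C**(1 - alpha).
# With compute C held fixed, a 100x increase in effective labor L
# raises R by only 100**alpha.
alpha = 0.5  # assumed labor share, illustration only

gains = {f: f ** alpha for f in (10, 100)}
for f, g in gains.items():
    print(f"{f}x labor -> {g:.1f}x research output")
# 10x labor  -> 3.2x research output
# 100x labor -> 10.0x research output
```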
I am skeptical that uplift measurements actually give much tighter confidence intervals.
I think we might already have evidence against longer timelines (e.g., the 2049 timelines from a doubling difficulty growth factor of 1.0). Basically, the model needs to pass a backtest against where uplift was in 2025 vs. now, and if there’s significant uplift now and there wasn’t before 2025, this implies the automation curve is steep.
Suppose uplift at the start of 2026 was 1.6x, as in the AIFM’s median (which would be 37.5% automation if you make my simplifying assumptions). If we also know that automation at the start of 2025 was at most 20%, this means automation is increasing at 0.876 logits per year, or per ~8x time horizon factor, or per ~45x effective compute factor, etc. (If the d.d.g.f. is 1.0, automation is logistic in both log TH and log E.) At this slope, we get to 95% AI R&D automation (or in your model, ~100% coding automation) when we hit a time horizon of 14 years, which is less than your median of 125 years and should give timelines before 2045 or so. Uplift might be increasing even faster than this, in which case timelines will be shorter. We don’t have any hard data on uplift quite yet, but I suspect that our guesses at uplift should already make us downweight longer timelines, and in Q2 or Q3 I hope we can prove it.
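The slope arithmetic here can be checked directly; a quick sketch, taking the 37.5% and ≤20% automation levels and the one-year-per-~8x-time-horizon mapping as given from the comment above:

```python
import math

def logit(p):
    """Log-odds of an automation fraction p."""
    return math.log(p / (1 - p))

# Assumptions from the comment: 37.5% automation at the start of 2026,
# at most 20% at the start of 2025.
slope = logit(0.375) - logit(0.20)
print(f"slope: {slope:.3f} logits per year")  # ~0.875, matching the ~0.876 above

# Under the comment's mapping, one year corresponds to one ~8x
# time-horizon factor, so the same slope applies per 8x increase in TH.
remaining = logit(0.95) - logit(0.375)  # logits still needed to reach 95%
factors = remaining / slope             # number of 8x factors needed
th_multiplier = 8 ** factors            # implied time-horizon multiplier
print(f"{factors:.1f} eightfold factors, ~{th_multiplier:.0f}x time horizon")
```

A ~3,700x multiplier on, say, an 8-hour start-of-2026 horizon is roughly 29,000 work hours, i.e. about 14 working years at 2,000 hours per year, consistent with the 14-year figure above; the 8-hour baseline is my assumption, not a number from the comment.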
I would still put some weight on longer timelines, but to me this uncertainty doesn’t live in something like d.d.g.f. My understanding of the AIFM is that uncertainty in all the time horizon parameters cashes out in the effective compute required for uplift. In this frame, the remaining uncertainty lives in whether the logistic curve in log E holds—whether ease of automation in the future is similar to ease of early-stage automation we’re already observing.
It might be nice to indicate how these outputs relate to your all-things-considered views. To me your explicit model seems to be implausibly confident in 99% automation before 2040.
Yeah, my all-things-considered views are definitely more uncertain. I don’t have well-considered probabilities—I’d probably need to think about it more and perhaps build more simple models that apply in long-timelines cases.
Suppose uplift at the start of 2026 was 1.6x as in the AIFM’s median
Where are you getting this 1.6 number?
With respect to the rest of your comment, it feels to me like we have so little evidence about current uplift and what trend it follows (e.g. whether the assumed % automation curve is logistic, and whether its translation to uplift is a reasonable functional form). I’m not sure how strongly we disagree, though. I’m much more skeptical of the claim that uplift can give much tighter confidence intervals than of the claim that it can give similar or slightly better ones. Again, this could change if we had much better data in a year or two.
I got it from your website:

Our median value for the coding uplift of present-day AIs at AGI companies is that having the AIs is like having 1.6 times as many software engineers (and all the staff necessary to coordinate them effectively).
As for the rest, seems reasonable. I think you can’t get around the uncertainty by modeling uplift as some more complicated function of the coding automation fraction as in the AIFM, because you’re still assuming that fraction is logistic, we can’t measure it any better than uplift, and we’re still uncertain how the two are related. So we really do need better data.
I think you can’t get around the uncertainty by modeling uplift as some more complicated function of the coding automation fraction as in the AIFM, because you’re still assuming that fraction is logistic, we can’t measure it any better than uplift, and we’re still uncertain how the two are related. So we really do need better data.
But in the AIFM the coding automation logistic is there to predict the dynamics regarding how much coding automation speeds progress pre-AC. It doesn’t have to do with setting the effective compute requirement for AC. I might be misunderstanding something, sorry if so.
Re: the 1.6 number: oh, that should actually be 1.8, sorry. I think it didn’t get updated after a last-minute change to the parameter value. I will fix that soon. Also, that’s the parallel uplift. In our model, the serial multiplier/uplift is sqrt(parallel uplift).
In my model it’s parallel uplift too. Effective labor (human+AI) still goes through the diminishing-returns power to get to serial uplift; I estimate that power as between roughly 0.1 and 0.3.
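The two conventions for turning parallel uplift into a serial speedup can be compared side by side; 1.8 is the corrected parallel uplift mentioned above, the sqrt rule is the AIFM convention, and the 0.1-0.3 exponent range is the diminishing-returns power estimated in this comment:

```python
parallel_uplift = 1.8  # corrected median parallel coding uplift

# AIFM convention: serial uplift is the square root of parallel uplift.
serial_aifm = parallel_uplift ** 0.5
print(f"sqrt rule: {serial_aifm:.2f}x")  # 1.34x

# Alternative: effective labor goes through a diminishing-returns
# power, roughly 0.1 to 0.3 per the estimate above.
for lam in (0.1, 0.3):
    print(f"power {lam}: {parallel_uplift ** lam:.2f}x")
# power 0.1: 1.06x
# power 0.3: 1.19x
```

The spread (1.06x-1.34x serial speedup from the same 1.8x parallel uplift) shows how much the choice of exponent matters.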