I basically agree with everything you say here and wish we had a better way to try to ground AGI timelines forecasts. Do you recommend any other method? E.g. extrapolating revenue? Just thinking through arguments about whether the current paradigm will work, and then using intuition to make the final call? We discuss some methods that appeal to us here.
This parameter, “Doubling Difficulty Growth Factor”, can change the date of the first Automated Coder AI between 2028 and 2050.
Note that we allow it to go subexponential, so actually it can change the date arbitrarily far in the future if you really want it to. Also, dunno what’s happening with Eli’s parameters, but with my parameter settings putting the doubling difficulty growth factor to 1 (i.e. pure exponential trend, neither super or sub exponential) gets to AC in 2035. (Though I don’t think we should put much weight on this number, as it depends on other parameters which are subjective & important too, such as the horizon length which corresponds to AC, which people disagree a lot about)
The simple model I mentioned on Slack (still WIP, hopefully to be written up this week) tracks capability directly in terms of labor speedup and extrapolates that. Of course, for a more serious timelines forecast you have to ground it in some data.
Here’s what I said to Eli on Slack; I don’t really have more thoughts since then
we can get f_2026 [uplift fraction in 2026] from
transcripts of realistic cursor usage + success judge + difficulty judge calibrated on tasks of known lengths
uplift study
asking lab people about their current uplift (since parallel uplift and 1/(1-f) are equivalent in the simple model)
v [velocity of automation as capabilities improve] can be obtained by
guessing the distribution of tasks, using time horizon, maybe using a correction factor for real vs benchmark time horizon
multiple uplift studies over time
comparing older models to newer ones, or having them try things people use 4.5 opus for
Nice. Yeah I also am excited about coding uplift as a key metric to track that would probably make time horizons obsolete (or at least, constitute a significantly stronger source of evidence than time horizons). We at AIFP don’t have capacity to estimate the trend in uplift over time (I mean we can do small-N polls of frontier AI company employees...) but we hope someone does.
I basically agree with everything you say here and wish we had a better way to try to ground AGI timelines forecasts. Do you recommend any other method? E.g. extrapolating revenue? Just thinking through arguments about whether the current paradigm will work, and then using intuition to make the final call? We discuss some methods that appeal to us here.
Note that we allow it to go subexponential, so actually it can change the date arbitrarily far in the future if you really want it to. Also, dunno what’s happening with Eli’s parameters, but with my parameter settings putting the doubling difficulty growth factor to 1 (i.e. pure exponential trend, neither super or sub exponential) gets to AC in 2035. (Though I don’t think we should put much weight on this number, as it depends on other parameters which are subjective & important too, such as the horizon length which corresponds to AC, which people disagree a lot about)
The simple model I mentioned on Slack (still WIP, hopefully to be written up this week) tracks capability directly in terms of labor speedup and extrapolates that. Of course, for a more serious timelines forecast you have to ground it in some data.
Here’s what I said to Eli on Slack; I don’t really have more thoughts since then
we can get f_2026 [uplift fraction in 2026] from
transcripts of realistic cursor usage + success judge + difficulty judge calibrated on tasks of known lengths
uplift study
asking lab people about their current uplift (since parallel uplift and 1/(1-f) are equivalent in the simple model)
v [velocity of automation as capabilities improve] can be obtained by
guessing the distribution of tasks, using time horizon, maybe using a correction factor for real vs benchmark time horizon
multiple uplift studies over time
comparing older models to newer ones, or having them try things people use 4.5 opus for
listing how many things get automated each year
Nice. Yeah I also am excited about coding uplift as a key metric to track that would probably make time horizons obsolete (or at least, constitute a significantly stronger source of evidence than time horizons). We at AIFP don’t have capacity to estimate the trend in uplift over time (I mean we can do small-N polls of frontier AI company employees...) but we hope someone does.