Broadly agreed. My sense is that our default prediction from here should be to extrapolate out the METR horizon length trend (at least until compute becomes more scarce), with a doubling time somewhere between the faster one we've seen recently (122 days) and the slower one we've seen longer term (213 days), rather than expecting progress substantially faster than this trend in the short term.
So, maybe I expect a ~160-day doubling trend over the next 3 years, which implies a 50% reliability horizon length of ~1 week in 2 years and ~1.5 months in 3 years. By the end of this trend, I expect small speedups due to substantial automation of engineering in AI R&D and slowdowns due to reduced availability of compute; meanwhile, these AIs will be producing a bunch of value in the economy and AI revenue will continue to grow pretty quickly. But 50% reliability at 1.5-month-long easy-to-measure SWE tasks (or 80% reliability at week-long tasks) doesn't yield crazy automation, though such systems may well be superhuman in many ways that let them add a bunch of value.
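To make the arithmetic behind these numbers explicit, here's a minimal sketch. The ~160-day doubling time is from the text; the ~2.25-hour starting horizon is an assumption (roughly the 50%-reliability horizon METR reports for current frontier models), as is the 40-hour work-week conversion:

```python
# Hedged extrapolation sketch, not a forecast model.
START_HORIZON_HOURS = 2.25   # ASSUMED current 50%-reliability horizon
DOUBLING_DAYS = 160          # doubling time posited above
WORK_HOURS_PER_WEEK = 40     # ASSUMED conversion from task-hours to work-weeks

def horizon_after(years: float) -> float:
    """50%-reliability horizon (in task-hours) after `years` of steady doubling."""
    doublings = years * 365 / DOUBLING_DAYS
    return START_HORIZON_HOURS * 2 ** doublings

for years in (2, 3):
    h = horizon_after(years)
    print(f"{years} years: ~{h:.0f} task-hours (~{h / WORK_HOURS_PER_WEEK:.1f} work-weeks)")
```

Under these assumptions, 2 years gives ~53 task-hours (about a work-week) and 3 years gives ~260 task-hours (about 6.5 work-weeks, i.e. ~1.5 months), consistent with the figures above.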
(I think you'll be seeing some superexponentiality in the trend due to a mix of achieving generality and AI R&D automation, but I currently don't expect this to make much of a difference by the time we reach 50% reliability at 1.5-month horizon lengths on easy-to-measure tasks.)
But, this isn’t that s-curve-y in terms of interpretation? It’s just that progress will proceed at a steady rate rather than yielding super powerful AIs within 3 years.
Also, I think the ways in which GPT-5 is especially bad relative to o3 might be more evidence of this being a bad product launch in particular than evidence that progress is that much slower overall.
I plan on writing more about this topic in the future.