Thoughts on extrapolating time horizons

(written for a Twitter audience)

Has AI progress slowed down? I’ll write some personal takes and predictions in this post.

The main metric I look at is METR’s time horizon, which measures the length of tasks (in terms of how long they take skilled humans) that AI agents can complete with 50% success. It has been doubling for more than 6 years now, and might have sped up recently.

By measuring the length of tasks AI agents can complete, we can get a continuous metric of AI capabilities.
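
Roughly, the time horizon comes from fitting a logistic curve of agent success probability against the (log) length of each task in human time, then reading off the length at which predicted success crosses 50%. The Python below is a minimal sketch of that idea on made-up toy data, not METR’s actual code or data; the task list and the names `fit_logistic` and `time_horizon` are hypothetical.

```python
import math

# HYPOTHETICAL toy data (not METR's): (minutes a human takes, 1 if the agent succeeded else 0)
TASKS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (15, 0), (30, 1),
    (30, 0), (60, 1), (60, 0), (120, 0), (240, 0), (480, 0),
]


def fit_logistic(tasks, lr=0.1, steps=20_000):
    """Fit p(success) = sigmoid(a + b * log2(minutes)) by gradient ascent on the mean log-likelihood."""
    points = [(math.log2(minutes), succeeded) for minutes, succeeded in tasks]
    a, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, succeeded in points:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += succeeded - p
            grad_b += (succeeded - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def time_horizon(a, b, success_rate=0.5):
    """Task length in minutes at which the fitted curve predicts `success_rate`."""
    logit = math.log(success_rate / (1.0 - success_rate))
    # Solve a + b * log2(t) = logit for t.
    return 2.0 ** ((logit - a) / b)


a, b = fit_logistic(TASKS)
print(f"50% horizon: {time_horizon(a, b, 0.5):.0f} minutes")
print(f"80% horizon: {time_horizon(a, b, 0.8):.0f} minutes")
```

Because success falls as tasks get longer (the fitted slope b is negative), the 80% horizon always comes out shorter than the 50% one, which is why the choice of success threshold matters.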

Since 2019, the time horizon has been doubling every 7 months. But since 2024, it’s been doubling every 4 months. What if we irresponsibly extrapolated these to 2030?
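
The extrapolation itself is just compound doubling: after t months at a doubling time of d months, the horizon has grown by a factor of 2^(t/d). A minimal Python sketch of that arithmetic, using the two doubling times above and an illustrative five-year window (the name `growth_factor` is hypothetical):

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which the time horizon grows after `months`, at a constant doubling time."""
    return 2 ** (months / doubling_months)


# Illustrative five-year window, e.g. the start of 2025 to the start of 2030.
WINDOW_MONTHS = 60
for label, doubling in [("long-run trend (7-month doubling)", 7), ("recent trend (4-month doubling)", 4)]:
    print(f"{label}: horizon grows about {growth_factor(WINDOW_MONTHS, doubling):,.0f}x")
```

Over those five years the long-run trend gives a horizon roughly 380x longer, while the recent trend gives 32,768x (2^15), which is why the two extrapolations diverge so quickly.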

If AI progress continues at its recent rate, we get AI systems that can do one work-month (167 hours) of low-context SWE work by the end of 2027. If AI progress continues at the long-run historical rate, we get them by the end of 2029 instead.
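
Working backwards gives an arrival time: reaching a target horizon takes log2(target / current) doublings. The Python sketch below assumes a work-month of 2,000 / 12 ≈ 167 hours (a ~2,000-hour work year) and uses a hypothetical current horizon of 2 hours purely as a placeholder; it is not the model behind the dates above, and `months_until` is a hypothetical name.

```python
import math

# Assumption: a ~2,000-hour work year, so one work-month is about 167 hours.
HOURS_PER_WORK_MONTH = 2000 / 12


def months_until(target_hours: float, current_hours: float, doubling_months: float) -> float:
    """Months for a horizon to grow from `current_hours` to `target_hours` at a fixed doubling time."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months


current_hours = 2.0  # HYPOTHETICAL placeholder, not a real measurement
for doubling in (4, 7):
    months = months_until(HOURS_PER_WORK_MONTH, current_hours, doubling)
    print(f"{doubling}-month doubling: about {months:.0f} months to one work-month")
```

The gap between the two forecasts equals the number of doublings times the 3-month difference in doubling time, so it widens the further away the target is.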

How should we interpret one work-month? I’d say it’s something like the first project a new hire would do, or the kind of work a researcher who just switched teams could do in a month. Our time horizon metric currently doesn’t define long time horizons very sharply.

Changing the success rate threshold from 50% to 80% only shifts the extrapolation from recent progress by a few months, but shifts the extrapolation from the long-run historical rate by around a year.
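
One simplified way to see why a stricter threshold costs more calendar time on the slower trend: if the 80% horizon were a constant factor k shorter than the 50% horizon and doubled at the same rate, reaching the same target would take log2(k) extra doublings, and each doubling takes longer on the slower trend. The ratio k = 4 below is hypothetical, and this toy model ignores that the 80% trend is fitted separately and can have its own slope, so it only shows the direction of the effect, not the exact numbers.

```python
import math


def threshold_delay_months(ratio: float, doubling_months: float) -> float:
    """Extra months needed at a stricter success threshold, assuming the stricter horizon is
    a constant factor `ratio` shorter and doubles at the same rate (a simplification)."""
    return math.log2(ratio) * doubling_months


RATIO_50_TO_80 = 4.0  # HYPOTHETICAL: 50% horizon assumed to be 4x the 80% horizon
for doubling in (4, 7):
    delay = threshold_delay_months(RATIO_50_TO_80, doubling)
    print(f"{doubling}-month doubling: about {delay:.0f} extra months")
```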

I don’t think these lines should be extrapolated much past one work-month, as progress will likely speed up even more once AIs are automating significant parts of AI research. Additionally, bottlenecks identified by Epoch AI might slow down compute scaling around 2030. https://epoch.ai/blog/can-ai-scaling-continue-through-2030

Our task suite currently consists of well-scoped, easily scoreable tasks, which are pretty different from the kind of work done in the real world. This means we should be cautious when interpreting these extrapolations.

My best guess is that future models will be a closer fit to the extrapolation from recent progress than the extrapolation from long-run progress. But even the more conservative trend implies that AIs will be doing month-long tasks by the end of the decade.

More concretely, my median forecast is that AI research will be automated by the end of 2028, and AI will be better than humans at >95% of current intellectual labor by the end of 2029.