Thanks for the trendlines—they help us understand when AI can automate years of work!
As you said, the choice of tasks can heavily change the trendline:
Our estimate of the length of tasks that an agent can complete depends on methodological choices like the tasks used and the humans whose performance is measured. However, we’re fairly confident that the overall trend is roughly correct, at around 1-4 doublings per year. If the measured trend from the past 6 years continues for 2-4 more years, generalist autonomous agents will be capable of performing a wide range of week-long tasks.
I believe SWE-bench is the best benchmark to control for variables like the choice of task and how the agentic system is built, so I’m leaning more towards the doubling time of 70 days.
A large-scale, complex app takes around 1 year to develop (though this is not a completely fair estimate, since it doesn’t take into account the number of man-hours). Going with this estimate and the SWE-bench doubling time of 70 days, it would take around 13 doublings from the beginning of 2025, or until roughly June 2027, to automate production of entire apps / complex websites.
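To make the arithmetic behind that date easy to check, here is a minimal Python sketch. It assumes a constant 70-day doubling time and a start date of January 1, 2025; the function names are just illustrative, and `doublings_needed` is an extra helper for converting a starting and target task length into a doubling count.

```python
import math
from datetime import date, timedelta

DOUBLING_TIME_DAYS = 70    # SWE-bench-based doubling time mentioned above
START = date(2025, 1, 1)   # assumption: "beginning of 2025" means Jan 1


def projected_date(doublings, doubling_time_days=DOUBLING_TIME_DAYS, start=START):
    """Date reached after `doublings` doublings at a constant doubling time."""
    return start + timedelta(days=doublings * doubling_time_days)


def doublings_needed(start_horizon_hours, target_horizon_hours):
    """Doublings required to grow the task length from start to target."""
    return math.log2(target_horizon_hours / start_horizon_hours)


# 13 doublings * 70 days = 910 days after 2025-01-01
print(projected_date(13))  # 2027-06-30
```

As a purely hypothetical illustration (these inputs are not figures from the post), `doublings_needed(0.25, 2000)` asks how many doublings separate a 15-minute task from a 2,000-hour work year, and it comes out at roughly 13.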
Another big factor that this trendline and other trendlines don’t take into account is AI acceleration. If AI automates a large portion of the work, the doubling time itself should shorten as AI gets better. I’d be interested to see how that would affect this model.
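One rough way to play with the acceleration idea is to let each doubling take a fixed fraction of the time of the previous one. This is only a toy sketch: the `shrink` factor of 0.9 is a made-up parameter, not something measured or taken from the post, and real acceleration would probably not be this regular.

```python
def days_with_acceleration(doublings, first_doubling_days=70.0, shrink=0.9):
    """Total days for `doublings` doublings when each doubling takes
    `shrink` times as long as the one before (shrink=1.0 means no acceleration)."""
    total = 0.0
    step = first_doubling_days
    for _ in range(doublings):
        total += step
        step *= shrink  # the next doubling is faster by the assumed factor
    return total


baseline = days_with_acceleration(13, shrink=1.0)     # constant 70-day doubling
accelerated = days_with_acceleration(13, shrink=0.9)  # hypothetical 10% speedup per doubling
print(baseline, accelerated)
```

With `shrink=1.0` this reproduces the 910 days of the constant-rate estimate; any value below 1 pulls the date earlier, and how much earlier depends entirely on the assumed factor.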