It seems that your argument is based on high confidence in a METR time-horizon doubling time of roughly 7 months. But the available evidence suggests the doubling time is significantly lower.
In recent years we have observed shorter doubling times:
And what we know about labs’ internal models suggests this faster trend is holding up:
An important piece of evidence is OpenAI’s Gold performance at the International Mathematics Olympiad (IMO):
IMO participants get an average of 90 minutes per problem.
The gold medal cutoff at IMO 2025 was 35 out of 42 points (~83%).
They needed to fully solve 5 of the 6 problems (each problem is worth a maximum of 7 points), or earn an equivalent number of points through partial credit.
This is a bit rough, but if their model had a METR-80 greater than 90 minutes then we would expect OpenAI to achieve Gold at least 50% of the time.
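The ">= 50%" claim can be sanity-checked with a back-of-the-envelope binomial calculation. The per-problem success probability of ~0.8 and the independence assumption are both simplifications I'm introducing here (IMO problems are not the METR task distribution, and successes are surely correlated), so treat this as a rough sketch rather than a derivation:

```python
from math import comb

# Assumption: a model with METR-80 > 90 min solves each ~90-minute IMO
# problem independently with probability ~0.8. Both numbers are rough.
p = 0.8
n = 6

# Probability of fully solving at least 5 of the 6 problems
# (ignoring partial credit, which can only help).
p_gold = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5, n + 1))
print(round(p_gold, 3))  # ≈ 0.655
```

Even under these crude assumptions, the chance of clearing the gold threshold comes out above 50%, consistent with the claim above.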
OpenAI staff members stated that a publicly released model with this capability could be expected around the end of the year (and the METR trends are of course projections of publicly released models).
So this implies a METR-80 greater than 90 minutes in December 2025.
The METR-80 projected for that date by a 3.45-month doubling time is 98 minutes.
So the Gold performance, which was a massive surprise to many, is actually right on-trend for a 3.45-month doubling time. Of course, one might object that OpenAI may have just gotten lucky. But Google also got Gold! So we have two data points.
And here’s a recent comment from Sam Altman where he states that he expects time horizons measured in days in 2026:
A 7-month doubling time would not achieve that, but it is in line with a 3.45-month doubling time (which would get us to a time horizon of roughly 3 days in December 2026).
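The extrapolation being compared here is just exponential growth in the time horizon. Below is a minimal sketch; the 98-minute December 2025 baseline is the figure from above, the function name is mine, and whether the 2026 endpoint reads as "days" depends on the baseline you pick and on whether a "day" means 24 hours or an ~8-hour workday:

```python
# Exponential time-horizon extrapolation: h(t) = h0 * 2**(months / d),
# where d is the doubling time in months.
def project_horizon(h0_minutes: float, months: float, doubling_months: float) -> float:
    """Project a METR-style time horizon forward by `months`."""
    return h0_minutes * 2 ** (months / doubling_months)

h0 = 98.0  # minutes; projected METR-80 at Dec 2025 (from the comment above)
for d in (7.0, 3.45):
    h = project_horizon(h0, 12, d)  # Dec 2025 -> Dec 2026
    # e.g. d=3.45 turns ~98 min into ~18 hours, i.e. a couple of 8-hour workdays
    print(f"doubling time {d} months -> {h / 60:.0f} hours")
```

The point of the comparison: over the same 12 months, a 3.45-month doubling time yields roughly 3.5 doublings versus fewer than 2 for a 7-month doubling time, which is what separates "hours" from "days" by the end of 2026.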
And here’s a recent comment from Jakub Pachocki:
I expect a roughly 5.5-month doubling time over the next year or two, but somewhat lower seems pretty likely. The timeline I proposed as consistent with Anthropic’s predictions requires <1-month doubling times (and this is prior to >2x AI R&D acceleration, at least given my view of what you get at that level of capability).