The recent trend does not look superexponential though, right?
It briefly looked like the slope had increased with reasoning models, but at a glance the older trend better predicted Grok 4 and GPT-5.
Too early to tell IMO.
I disagree that the old trend better predicted Grok 4 and GPT-5. Here’s my plot (source, interactive) with the trendlines from METR’s time horizons paper: orange is the 2022-2025 trend with a 7-month doubling time, red is the 2024-2025 trend with a 4-month doubling time.
Both trendlines were calculated before the release of o3, Grok 4 or GPT-5, so I consider those three datapoints falling close to the 4-month doubling time line to be evidence for that line. Reading off the graph, o3 was about a month ahead of schedule, and Grok 4 and GPT-5 were both about a month behind schedule. I wonder if that is partially explained by OpenAI waiting longer before releasing GPT-5 (it sounds like METR had access for a bit longer).
Those points aren’t close to the 4-month doubling time line. The line is way above them. A month behind schedule is a lot when your schedule is a 4-month doubling time.
To be fair, they also don’t look that close to the slower (6 month?) doubling time line, so I guess we’re still on a slightly faster trend. I’m probably seeing what I expected to see here: I expected the slope to level off, and it’s easy for me to read that off of the graph even though it’s not really clear yet.
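For what it’s worth, here is a rough sketch of the arithmetic behind "a month behind schedule", assuming a pure exponential trend. The function names are made up for illustration and nothing here uses METR’s actual data; it only converts a time lag into a ratio of time horizons for a given doubling time, and back.

```python
import math


def horizon_ratio(lag_months: float, doubling_months: float) -> float:
    """Factor by which the trendline exceeds a point that is lag_months behind it.

    Assumes a pure exponential trend (hypothetical helper, not METR code).
    """
    return 2 ** (lag_months / doubling_months)


def lag_from_ratio(ratio: float, doubling_months: float) -> float:
    """Months behind schedule implied by a trend/observed horizon ratio."""
    return doubling_months * math.log2(ratio)


# Compare a one-month lag under the two doubling times discussed in the thread.
for doubling in (4, 7):
    r = horizon_ratio(1, doubling)
    shortfall = 1 - 1 / r
    print(f"{doubling}-month doubling: trend is {r:.2f}x the observed horizon "
          f"({shortfall:.0%} shortfall)")
```

Under these assumptions, a one-month lag against a 4-month doubling time means the trendline sits about 1.19x above the observed horizon (roughly a 16% shortfall), versus about 1.10x for a 7-month doubling time. Whether that counts as "close" is the judgment call being argued over here.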