I disagree that the old trend better predicted Grok 4 and GPT-5. Here’s my plot (source, interactive) with the trendlines from METR’s time horizons paper: orange is the 2022-2025 trend of 7 month doubling time, red is the 2024-2025 trend of 4 month doubling time.
Both trendlines were calculated before the release of o3, Grok 4 or GPT-5, so I consider those three datapoints falling close to the 4 month doubling time line to be evidence for that line. Reading off the graph, o3 was about a month ahead of schedule, and Grok 4 and GPT-5 were both about a month behind schedule. I wonder if that is partially explained by OpenAI waiting longer before releasing GPT-5 (it sounds like METR had access for a bit longer).
Those points aren’t close to the 4 month doubling time line. The line is way above them. A month behind schedule is a lot when your schedule is a 4 month doubling time.
To be fair, they also don’t look that close to the slower (6 month?) doubling time line, so I guess we’re still on a slightly faster trend. I’m probably seeing what I expected to see here: I expected the slope to level off, and it’s easy for me to read that off the graph even though it’s not really clear yet.
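For a rough sense of scale on "a month behind schedule": if a trend is a pure exponential, being some number of months off the line corresponds to a fixed multiplicative gap in time horizon. The sketch below is only that arithmetic, assuming an exact exponential trend; the function name `horizon_ratio` is made up for illustration, and the one-month offsets are the eyeballed values from the comments above, not measured data.

```python
def horizon_ratio(months_offset: float, doubling_months: float) -> float:
    """Ratio of a model's time horizon to the trendline's value at its release date.

    months_offset > 0 means ahead of schedule, < 0 means behind schedule.
    Assumes a pure exponential trend with the given doubling time.
    """
    return 2.0 ** (months_offset / doubling_months)


if __name__ == "__main__":
    # Offsets are the rough "about a month" readings from the thread (assumption).
    for label, offset in [("one month ahead", 1.0), ("one month behind", -1.0)]:
        for doubling in (4.0, 7.0):
            ratio = horizon_ratio(offset, doubling)
            print(f"{label}, {doubling:.0f} month doubling: "
                  f"horizon is {ratio:.2f}x the trendline value")
```

On the 4 month line, one month behind works out to a horizon roughly 16% below the trendline, versus roughly 9% below for the same one-month lag on a 7 month line. Whether that counts as "close" is the judgment call being argued here.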