I had the very same impression looking at the METR graph, since the post-o3 growth returned to the old trend. Alas, there is Claude Opus 4.5 with its 4 h 49 min time horizon, which is on the faster pre-o3 trend (see, however, two comments pointing out that the METR benchmark is no longer as trustworthy as it once was, and my potential explanation of Claude’s abnormally high 50%/80% time horizon ratio). I just can’t wait for METR to evaluate Gemini 3 Pro and/or GPT-5.2 (and GPT-5.2 Codex Max when it is released?) and see if the new crop of models has a high 50% time horizon without Claude’s issues...
See my comment trying to push back on Daniel and Eli. I feel we have both reached similar conclusions.