StanislavKrym comments on MP’s Shortform

StanislavKrym 7 Jan 2026 1:19 UTC
2 points
0
I had the same impression looking at the METR graph, since the post-o3 growth has returned to the old trend. Alas, there is Claude Opus 4.5 with its 4 hr 49 min time horizon, which is on the faster pre-o3 trend (see, however, two comments pointing out that the METR benchmark is no longer as trustworthy as it once was, and my potential explanation of Claude’s abnormally high 50%/80% time horizon ratio). I just can’t wait for METR to evaluate Gemini 3 Pro and/or GPT-5.2 (and GPT-5.2 Codex Max when it is released?) to see whether the new crop of models has a high 50% time horizon without Claude’s issues...
- MP 8 Jan 2026 14:53 UTC
  2 points
  0
  See my comment trying to push back on Daniel and Eli. I feel we have both reached similar conclusions.