jamjam comments on Cole Wyeth’s Shortform

jamjam 19 Nov 2025 0:51 UTC
2 points
0
Simple evidence to the contrary: Sonnet 4.5 is SOTA on SWE-bench yet lags notably behind GPT-5 on METR task length. (And the difference in SWE-bench scores is greater here than the difference between 3.0 Pro and Sonnet.)
- Cole Wyeth 19 Nov 2025 2:30 UTC
  2 points
  0
  Yes, they’re not consistently highly correlated; my guess could be wrong.