Aaron Staley comments on tdko’s Shortform

Aaron Staley 29 Jul 2025 16:20 UTC
3 points
0
I don’t see that producing much of an update. Its SWE-bench score, as you note, was only 59.6%, which naively maps to ~50 minutes on METR’s task-length measure.
- Cole Wyeth 30 Jul 2025 0:28 UTC
  2 points
  0
  I still think it’s comforting to observe that the task lengths are not increasing as quickly as feared.
  So far this is as I predicted, but we’ll see about GPT-5.