Erich_Grunewald comments on Orienting to 3 year AGI timelines

Erich_Grunewald 3 Aug 2025 11:58 UTC
1 point
I don’t know why it’s reporting the current SWE-Bench SOTA as 60%. The official leaderboard has Claude 4 Sonnet at 65%. (Epoch has Claude 4 Opus at 62%, but the original post linked the official leaderboard.) Of course, either way it’s far below 85%, unless you allow for best-of-N sampling via a scoring model, which brings Claude 4 up to 80%.
- megasilverfist 4 Aug 2025 6:21 UTC
  1 point
  Updated with a note.