megasilverfist comments on Orienting to 3 year AGI timelines

megasilverfist 3 Aug 2025 11:00 UTC
1 point
0
I also did some normal searches to verify ~~but I think this LLM summary is pretty good.~~ I misread one of the LLM’s tables, so the summary looks unreliable enough that I deleted it. Normal searches still support my first comment, though, and Erich posts some relevant links.
- Erich_Grunewald 3 Aug 2025 11:58 UTC
  1 point
  0
  Parent
  I don’t know why it’s reporting the current SWE-Bench SOTA as 60%? The official leaderboard has Claude 4 Sonnet at 65%. (Epoch has Claude 4 Opus at 62%, but the original post linked the official leaderboard.) Of course either way it’s far below 85%, unless you allow for best-of-N sampling via a scoring model, which brings Claude 4 up to 80%.
  - megasilverfist 4 Aug 2025 6:21 UTC
    1 point
    0
    Parent
    Updated with a note.