SWE-bench pass@1 for the Claude Sonnet versions has gone 33.4% (June, 3.5) → 49.0% (October) → 62.3% (February, 3.7) → 72.7% (May, 4). That's practically linear, at about 3.5 percentage points per month. Extrapolating three more months from the May result puts the end of August at about 83%.
With the leaderboard at ~75.2% on July 1, the same rate over two more months also gets us to around 82%.
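For anyone who wants to check the arithmetic, here is a rough sketch of the extrapolation in Python. The month offsets (0, 4, 8, 11) are my own simplification that treats each release as landing on a whole month, and the horizons (3 months from the May result, 2 months from July 1) are likewise assumptions. The slope is an ordinary least-squares fit rather than an eyeballed rate, so it may differ slightly from the 3.5 figure above.

```python
# Sketch of the linear extrapolation, under the assumptions stated above.
# Month offsets are counted from the June release (assumed month-granular).
months = [0, 4, 8, 11]             # June, October, February, May
scores = [33.4, 49.0, 62.3, 72.7]  # SWE-bench pass@1, in percent

# Ordinary least-squares slope, in percentage points per month.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, scores))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx

# Project forward from the two most recent reference points.
MONTHS_MAY_TO_END_AUG = 3   # assumption: May result to end of August
MONTHS_JULY_TO_END_AUG = 2  # assumption: July 1 to end of August
sonnet_projection = scores[-1] + slope * MONTHS_MAY_TO_END_AUG
leaderboard_projection = 75.2 + slope * MONTHS_JULY_TO_END_AUG

print(f"slope: {slope:.2f} points/month")
print(f"from Sonnet 4 (May): {sonnet_projection:.1f}%")
print(f"from leaderboard (July 1): {leaderboard_projection:.1f}%")
```

Four data points is obviously thin for a fit, and a straight line ignores that the benchmark tops out at 100%, so treat the outputs as a sanity check on the numbers above rather than a forecast in their own right.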