Discrete capabilities progress seems slower this year than in 2024 (but 2024 was insanely fast). Kudos to this person for registering predictions and so reminding us what really above-trend would have meant concretely. The excellent forecaster Eli was also over-optimistic.
I haven’t done a thorough look, but I think progress so far is somewhat below my predictions, though not by a huge amount, and there are still a few weeks left in the year. That is, if the AI 2025 predictions are what you’re referring to.
I believe the SOTA benchmark scores are higher than I predicted for Cybench, right on for OSWorld, and lower for RE-Bench, SWE-Bench Verified, and FrontierMath. RE-Bench is the one I was most wrong on though.
For non-benchmark results, I believe that the sum of annualized revenues is higher than I predicted (but the Americans’ importance lower). I think that OpenAI has hit both CBRN high and Cyber medium. They’ve removed/renamed model autonomy and persuasion.
Will link this!