I just got back my results from the 2025 AI Forecasting Survey. I scored 31st out of 413 forecasters. Some takeaways about my personal performance:
I generally overestimated model improvements on benchmarks. It’s particularly surprising to me how little SWE-Bench Verified moved, given how much labs are optimizing for general SWE tasks and how much they care about doing well on this benchmark. I haven’t looked at the very hardest tasks in SWE-Bench Verified; it’s possible they are notably harder than the others.
Other forecasters and I underestimated AI lab revenues. Some of this was because the 2024 baseline revenue numbers were a bit out of date and should’ve been $6.6B (through no fault of the organizers, as I believe this information came out after the survey was created). But it’s also because AI revenues are continuing to grow at a crazy rate that shocks even AI bulls. Anthropic in particular has 10x’d its revenue each year for the past three years.