For posterity, my AI 2026 forecast for EOY this year. Looks like I don’t substantially disagree with the median predictions anywhere, except that I think the FrontierMath and Remote Labor Index benchmarks will saturate a bit sooner than the typical respondent expects, and that the software optimization benchmark will saturate more slowly (because that benchmark has a “review the model outputs and remove points if the model hacked the evaluation criteria instead of actually solving the problem” step, and the trend line shows the score without that correction).