I don’t think you can just take the HCAST timeline for software engineering and map it onto IMO problems.
An alternative, more bearish prediction: Deep Think got 50% on USAMO as of May 20 (unreleased, so lab frontier). At least for software engineering, going from 50% to 80% success on tasks of a given length takes a time horizon roughly 4x longer (not sure what the ratio is for math). So we'd have needed two doublings (~6 months) to pull this off, and instead we've only had ~0.67.
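A quick sketch of the doubling arithmetic, assuming a 3-month doubling time (implied by "two doublings (6 months)") and ~2 months elapsed (back-solved from ~0.67 doublings, not a stated figure). The function names are illustrative only:

```python
import math

def doublings_needed(time_ratio: float) -> float:
    """Doublings of the time horizon needed to grow it by time_ratio."""
    return math.log2(time_ratio)

def doublings_available(elapsed_months: float, doubling_time_months: float) -> float:
    """Doublings that fit into the elapsed time at a constant doubling time."""
    return elapsed_months / doubling_time_months

DOUBLING_TIME_MONTHS = 3.0  # assumption: implied by "two doublings (6 months)"
TIME_RATIO_80_VS_50 = 4.0   # the ~4x figure, taken from software engineering
ELAPSED_MONTHS = 2.0        # assumption: back-solved from ~0.67 doublings

needed = doublings_needed(TIME_RATIO_80_VS_50)
months_needed = needed * DOUBLING_TIME_MONTHS
available = doublings_available(ELAPSED_MONTHS, DOUBLING_TIME_MONTHS)
print(f"needed: {needed:.2f} doublings ({months_needed:.0f} months); available: {available:.2f}")
# prints "needed: 2.00 doublings (6 months); available: 0.67"
```

If the math ratio is not ~4x, only `TIME_RATIO_80_VS_50` changes; the gap is set by log2 of that ratio.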