I don’t believe there’s a strong correlation between mathematical ability and agentic coding tasks (as opposed to competition coding tasks where a stronger correlation exists).
Gemini 2.5 Pro was already well ahead of o3 on IMO, but had worse SWE-bench/METR scores.
Claude is relatively bad at math but has hovered around SOTA on agentic coding.