It seems Gemini was ahead of OpenAI on the IMO gold. The output was more polished, so presumably they achieved a gold-worthy model earlier. I therefore expect Gemini's SWE-bench score to be at least ahead of OpenAI's 75%.
I don’t believe there’s a strong correlation between mathematical ability and performance on agentic coding tasks (as opposed to competition coding, where the correlation is stronger).
Gemini 2.5 Pro was already well ahead of O3 on the IMO, but had worse SWE-bench/METR scores.
Claude is relatively bad at math but has hovered around SOTA on agentic coding.