Person comments on OpenAI Claims IMO Gold Medal

Person 19 Jul 2025 15:38 UTC
14 points
4
Don’t have the link, but it seems DeepMind researchers on X have tacitly confirmed they had already reached gold. What we don’t know is whether it was done with a general LLM like OAI’s or a narrower one.
- cdt 20 Jul 2025 18:30 UTC
  2 points
  0
  I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else, and it is important for discussion around this not to confuse one forecast with another. (Not saying you are, but in general it is hard to tell which claim people are putting forward.)
  - Garrett Baker 21 Jul 2025 18:39 UTC
    2 points
    0
    It seems your forecast here was wrong:
    
    Official results are in—Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress—huge congrats to @lmthang and the team! deepmind.google/discover/blo…
    
    We achieved this year’s impressive result using an advanced version of Gemini Deep Think (an enhanced reasoning mode for complex problems). Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit! We’ll be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.
    
    Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved
    
    We’ve now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!
    - cdt 21 Jul 2025 19:04 UTC
      3 points
      0
      I don’t believe anyone was forecasting this result, no.
      EDIT: To clarify, many forecasts made no distinction as to whether an AI model had a major formal-methods component like AlphaProof or not. I’m drawing attention to the fact that the two situations are distinct and require distinct updates. What those are, I’m not sure yet.
      - Garrett Baker 22 Jul 2025 5:37 UTC
        2 points
        0
        Oh I see, yeah that makes sense.