Don’t have the link, but it seems DeepMind researchers on X have tacitly confirmed they had already reached gold. What we don’t know is whether it was done with a general LLM like OAI or a narrower one.
I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else and it is important for discussion around this to not confuse one forecast for another. (Not saying you are, but that in general it is hard to tell which claim people are putting forward.)
Official results are in—Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress—huge congrats to @lmthang and the team! deepmind.google/discover/blo…
We achieved this year’s impressive result using an advanced version of Gemini Deep Think (an enhanced reasoning mode for complex problems). Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit! We’ll be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.
Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved
We’ve now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!
I don’t believe anyone was forecasting this result, no.
EDIT: Clarifying—many forecasts made no distinction whether an AI model had a major formal method component like AlphaProof or not. I’m drawing attention to the fact that the two situations are distinct and require distinct updates. What those are, I’m not sure yet.
Don’t have the link, but it seems DeepMind researchers on X have tacitly confirmed they had already reached gold. What we don’t know is whether it was done with a general LLM like OAI or a narrower one.
I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else and it is important for discussion around this to not confuse one forecast for another. (Not saying you are, but that in general it is hard to tell which claim people are putting forward.)
It seems your forecast here was wrong
I don’t believe anyone was forecasting this result, no.
EDIT: Clarifying—many forecasts made no distinction whether an AI model had a major formal method component like AlphaProof or not. I’m drawing attention to the fact that the two situations are distinct and require distinct updates. What those are, I’m not sure yet.
Oh I see, yeah that makes sense.