The headline result was obviously going to happen, not an update for anyone paying attention.
“Obviously going to happen” is very different from ‘happens at this point in time rather than later or sooner and with this particular announcement by this particular company’. You should still update off this. Hell, I was pretty confident this would be first done by Google DeepMind, so it’s a large update for me (I don’t know what for yet, though)!
Your claim “not an update for anyone paying attention” also seems false. I’m sure there are many who are updating off this who were paying attention, for whatever reason, as they likely should.
I generally dislike this turn of phrase as it serves literally no purpose but to denigrate people who are changing their mind in light of evidence, which is just a bad thing to do.
Don’t have the link, but it seems DeepMind researchers on X have tacitly confirmed they had already reached gold. What we don’t know is whether it was done with a general LLM like OAI’s or a narrower one.
I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else, and it is important that discussion of this not confuse one forecast for another. (Not saying you are, but in general it is hard to tell which claim people are putting forward.)
Official results are in—Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress—huge congrats to @lmthang and the team! deepmind.google/discover/blo…
We achieved this year’s impressive result using an advanced version of Gemini Deep Think (an enhanced reasoning mode for complex problems). Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit! We’ll be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.
Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved
We’ve now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!
I don’t believe anyone was forecasting this result, no.
EDIT: To clarify, many forecasts made no distinction between an AI model with a major formal-methods component like AlphaProof and one without. I’m drawing attention to the fact that the two situations are distinct and require distinct updates. What those are, I’m not sure yet.
“I generally dislike this turn of phrase as it serves literally no purpose but to denigrate people who are changing their mind in light of evidence, which is just a bad thing to do.”
Well, fair enough, but I did specify that the surrounding context was an update.
You said “The other claims are interesting” which maybe could include “this particular announcement”, but not “at this point in time rather than later or sooner” or “by this particular company”. I also object on the grounds that the “headline result” is not “not an update for anyone paying attention”. As evidence, see this Manifold market, which was at around 40% before the release of this model.
So the market was previously around 85%, and then it went down as we got further through the year. I guess this proves that many people didn’t expect it to happen in the next few months. The question wasn’t really load-bearing for my models, and you’re right that I am not particularly interested in the fact that it happened at this point in time or by this particular company.
It seems your forecast here was wrong
Oh I see, yeah that makes sense.