Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
Whaaat!?
Gemini 2.5 Pro is way worse at the IMO, scoring around 30%, yet the Deep Think version gets gold??
It's fine-tuned more heavily on IMO-like problems, but I bet OpenAI's model was too.
Both use “novel RL methods”.
Hmm, "access to a set of high-quality solutions to previous problems and general hints and tips on how to approach IMO problems" sounds like a system prompt, since, like OpenAI, they claim no tool use.
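If I'm reading that right, the setup could be as simple as stuffing curated material into the system instruction. A minimal sketch of that "no tools, just context" idea, assuming the public google-generativeai SDK; the model id and all prompt contents are placeholders I made up, not anything DeepMind confirmed:

```python
# Hypothetical sketch: steering a model with curated past solutions and
# strategy hints purely via the system prompt -- no tools, no search,
# no code execution. Prompt text below is illustrative only.
import google.generativeai as genai

SYSTEM_PROMPT = """You are competing in the International Mathematical Olympiad.
General strategy hints (made-up example, not DeepMind's actual text):
- Look for invariants, symmetry, and extremal arguments before computing.
- Write a full rigorous proof; every step must be justified.

Worked solutions to previous IMO problems:
--- IMO 2006 Problem 4 ---
<a high-quality human-written solution would go here>
"""

model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",       # placeholder model id
    system_instruction=SYSTEM_PROMPT,  # the "hints and past solutions" context
)

# One shot, no tool use: the only extra signal the model gets is the prompt.
response = model.generate_content("Problem 1: <this year's problem statement>")
print(response.text)
```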
Both models failed the 6th problem, which required more creativity.
DeepMind's solutions are more organized, more readable, and better written than OpenAI's.
But OpenAI's style is also more compressed to save tokens, so maybe moving further away from human-like language into more out-of-distribution territory is the future (Neuralese).
Did OpenAI and DeepMind somehow hack the methodology, or do these new general language models truly generalize more?