To put this into perspective, there was only an 8% chance P3 would be this easy, which puts substantial weight on the “unexpected” part being that the problem was so easy. It’s also the first time in 20 years (a 5% chance) that 5 problems were of difficulty ≤ 25.
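As a minimal sketch of the base-rate arithmetic behind that 5% figure (assuming it is simply one occurrence in 20 years of contests):

```python
# Back-of-the-envelope base rate: one occurrence in 20 years of IMOs.
years = 20
occurrences = 1  # first time 5 problems were of difficulty <= 25
base_rate = occurrences / years
print(f"{base_rate:.0%}")  # 5%
```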
Indeed, knowing that Gemini 2.5 Deep Think could solve an N25 (an IMO result from Gemini 2.5 Pro) and an A30 (known from the Gemini 2.5 Deep Think post), I’m somewhat less impressed. The only barriers were a medium-ish geometry problem (P2), which AlphaGeometry could of course solve, and an easy combinatorics problem (P1).
Factoring in this write-up by Ralph Furman, the two most impressive things are:
* OpenAI’s LLM was able to solve a medium-level geometry problem (guessing DeepMind just used AlphaGeometry again). Furman thought this would be hard for informal methods.
* OpenAI’s LLM is strong enough to get the easy combinatorics problem (Furman noted informal methods would likely outperform formal ones on this one; it was just a matter of whether the LLM was smart enough).