It may well be that o3 solved problems only from the first tier, which is nowhere near as groundbreaking as solving the harder problems from the benchmark.
3⁄9 Although o3 solved problems in all three tiers, it likely still struggles on the most formidable Tier 3 tasks—those “exceptionally hard” challenges that Tao and Gowers say can stump even top mathematicians.
This doesn’t appear to be the case:
https://x.com/elliotglazer/status/1871812179399479511
I don’t think this info was about o3 (please correct me if I’m wrong). While it suggests that not all of the solved problems were from the first tier, it would be much better to know the actual breakdown by tier. In particular, since the most famous quotes about FrontierMath (“extremely challenging” and “resist AIs for several years at least”) were about the hardest 25% of problems, the accuracy on that subset seems like the more important number to update on. (Not to say that 25% is a small feat in any case.)
Although it’s not made explicit, we can deduce that it’s at least in part about o3 from this earlier tweet by the same person:
https://x.com/ElliotGlazer/status/1870613418644025442