From Metaculus’ resolution criteria: “This question resolves on the date an AI system competes well enough on an IMO test to earn the equivalent of a gold medal. The IMO test must be the most current IMO test at the time the feat is completed (previous years do not qualify).”
I think this was defined on purpose to avoid such contamination. It also seems like common sense to me that, when training a system to perform well on IMO 2026, you cannot include any data point from after the questions were made public.
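In practice, that rule amounts to a date cutoff on the training corpus. Here is a minimal sketch of such a filter (the field names and cutoff date are hypothetical; real decontamination is much harder, since published timestamps are often missing or wrong):

```python
from datetime import date, datetime

# Hypothetical cutoff: the day the IMO 2026 problems go public.
CUTOFF = date(2026, 7, 15)  # placeholder, not the real exam date

def is_clean(doc: dict) -> bool:
    """Keep a document only if it predates the exam's publication.

    doc["published"] is an assumed ISO-8601 string; in real corpora
    this metadata is often unreliable, which is the hard part.
    """
    return datetime.fromisoformat(doc["published"]).date() < CUTOFF

corpus = [
    {"text": "An old olympiad inequality problem ...", "published": "2019-03-02"},
    {"text": "Forum thread discussing IMO 2026 P3 ...", "published": "2026-07-16"},
]

clean_corpus = [doc for doc in corpus if is_clean(doc)]
# -> keeps only the 2019 document
```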
At the same time, training on previous IMO/math contest questions should be fair game. All human contestants practice quite a lot on questions from previous contests, and the IMO is still very challenging for them.
I dunno, I think there are a LOT of old olympiad problems: not just all the old IMOs, but also all the old national-level tests from every country that publishes them. (Bottom section here.) Even the most studious humans only study a small fraction of the existing problems, I think. Like, if someone literally read every olympiad-level problem and solution ever published, then went to a new IMO, I would expect them to find that at least a couple of the problems were sufficiently similar to something they’ve seen that they could get the answer without too much creativity. (That’s just a guess, not really based on anything.)
(That’s not enough for a gold by itself, but it could be part of the plan, in conjunction with special-case AIs for particular common types of problems, self-play proof-assistant setups, etc.)
I know a guy from the Physics Olympiads who was a walking library of past olympiad problems. I think you’re underestimating the level of weirdness you can find out there. Maybe it’s still a fraction of the existing problems, but I’d estimate it’s enough to cover the non-redundant ones.
I would expect them to find that at least a couple of the problems were sufficiently similar to something they’ve seen that they could get the answer without too much creativity.
I’ve not been to the IMO, but based on comments I’ve overheard from people who have, I’d bet this already happens.
I see roughly 100 books in there. I’ve met several IMO gold medalists, and I expect most of them have read dozens of these books, or the equivalent in other forms. I know one who has read dozens of olympiad-level books on geometry alone!
And yes, you’re right that they would often flag one or two problems as similar to something they had seen before, but I suspect those problems would still require a lot of reasoning even after the analogy has been established. I may be wrong, though.
We could probably inform this debate by taking the latest IMO and running a contest for people to find which existing problems are most similar to those on the exam. :)
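A crude automated baseline for such a contest would be to rank past problems by textual similarity to each new exam problem. Here is a sketch using TF-IDF cosine similarity (assumes scikit-learn is installed; the problem texts are placeholders), with the caveat that surface wording is a weak proxy for the mathematical analogies contestants actually notice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpora: in reality, thousands of past olympiad problems
# and the six problems of the latest IMO.
past_problems = [
    "Prove that for every positive integer n, ...",
    "Let ABC be a triangle with incenter I. Prove that ...",
]
new_problems = [
    "Let ABC be an acute triangle with circumcenter O. Show that ...",
]

vectorizer = TfidfVectorizer(stop_words="english")
past_vecs = vectorizer.fit_transform(past_problems)
new_vecs = vectorizer.transform(new_problems)

# For each new problem, rank past problems from most to least similar.
for i, row in enumerate(cosine_similarity(new_vecs, past_vecs)):
    ranking = sorted(enumerate(row), key=lambda pair: -pair[1])
    print(f"new problem {i}:",
          ", ".join(f"past #{j} ({score:.2f})" for j, score in ranking))
```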