What I meant by “general domain” is that the mental moves relevant there are not overly weird, so training methods that can create something that wins the IMO are probably not very different from training methods that can create things that solve many other kinds of problems. It’s still a bit weird; high school math with olympiad add-ons is a somewhat narrow toolkit. But for technical problems of many other kinds, the mental move toolkits are not qualitatively different, even if they are larger. The claim is that solving the IMO is a qualitatively new milestone from the point of view of this framing: it’s evidence about the AGI potential of LLMs at near-current scale in a way that previous results were not.
I agree that there could still be gaps, and the “generality” of the IMO isn’t a totalizing magic that rules out crucial remaining gaps. I’m not strongly claiming there aren’t any, just that with the IMO as an example it’s no longer obvious there are any, at least as long as the training methods used for the IMO can be adapted to those other areas, which isn’t always obviously the case. And of course continual learning could prove extremely hard. But there also isn’t strong evidence yet that it’s extremely hard, because it hasn’t been a focus for very long while LLMs at current levels of capability were already available. And the capabilities of in-context learning with 50M token contexts and even larger LLMs haven’t been observed yet.
So it’s a question of calibration. There could always be substantial obstructions that remain even after it’s no longer obvious they are there. But at some point there actually aren’t any. So always suspecting currently unobservable crucial obstructions is not the right heuristic either; the prediction of when the problem could actually be solved needs to be allowed to respond to some sort of observable evidence.
> What I meant by general domain is that it’s not overly weird in the mental moves that are relevant there, so training methods that can create something that wins IMO are probably not very different from training methods that can create things that solve many other kinds of problems.
I took you to be saying:

1. math is a general domain
2. IMO is fairly hard math
3. LLMs did the IMO
4. therefore LLMs can do well in a general domain
5. therefore probably maybe LLMs are generally intelligent.
But maybe you instead meant:

1. working out math problems applying known methods is a general domain

?
Anyway, “general domain” still does not make sense here. The step from 4 to 5 is not supported by this concept of “general domain” as you’re applying it.