The way you’re using this concept is poisoning your mind. Generality of a domain does imply that if you can do all the stuff in that domain, then you are generally capable (and, depending, that could imply general intelligence; e.g. if you’ve ruled out GLUT-like things). But if you can do half of the things in the domain and not the other half, then you have to ask whether you’re exhibiting general competence in that domain, vs. competence in some sub-domain and incompetence in the general domain. Making this inference enthymemically is poisoning your mind.
Cf. https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#_We_just_need_X__intuitions :
For example, suppose that X is “self-play”. One important thing about self-play is that it’s an infinite source of data, provided in a sort of curriculum of increasing difficulty and complexity. Since we have the idea of self-play, and we have some examples of self-play that are successful (e.g. AlphaZero), aren’t we most of the way to having the full power of self-play? And isn’t the full power of self-play quite powerful, since it’s how evolution made AGI? I would say “doubtful”. The self-play that evolution uses (and the self-play that human children use) is much richer, containing more structural ideas, than the idea of having an agent play a game against a copy of itself.
Most instances of a category are not the most powerful, most general instances of that category. So just because we have, or will soon have, some useful instances of a category, doesn’t strongly imply that we can or will soon be able to harness most of the power of stuff in that category. I’m reminded of the politician’s syllogism: “We must do something. This is something. Therefore, we must do this.”.
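To make the contrast in the self-play example concrete, the narrow version of the idea (an agent playing a game against a frozen copy of itself and learning from the outcomes) is roughly the following loop. This is a minimal toy sketch, not anyone's actual training setup; the Agent class, the rock-paper-scissors game, and the update rule are all made-up placeholders.

```python
# Minimal toy sketch of the "narrow" idea of self-play: an agent repeatedly
# plays a zero-sum game against a frozen copy of itself and learns from the
# outcomes. The game, the agent, and the update rule are all placeholders.

import copy
import random


class Agent:
    """A trivially simple 'policy': one preference weight per move in a
    3-move symmetric game (rock-paper-scissors)."""

    def __init__(self):
        self.weights = [1.0, 1.0, 1.0]

    def act(self):
        # Sample a move with probability proportional to its weight.
        total = sum(self.weights)
        r = random.uniform(0, total)
        for move, w in enumerate(self.weights):
            r -= w
            if r <= 0:
                return move
        return len(self.weights) - 1


def payoff(a_move, b_move):
    """Rock-paper-scissors payoff for player A: +1 win, -1 loss, 0 draw."""
    if a_move == b_move:
        return 0
    return 1 if (a_move - b_move) % 3 == 1 else -1


def self_play(generations=50, games_per_generation=200, lr=0.05):
    agent = Agent()
    for _ in range(generations):
        opponent = copy.deepcopy(agent)  # the "copy of itself"
        for _ in range(games_per_generation):
            a, b = agent.act(), opponent.act()
            r = payoff(a, b)
            # Reinforce (or suppress) the move the learner just played,
            # according to whether it won against the current copy.
            agent.weights[a] = max(0.01, agent.weights[a] + lr * r)
    return agent


if __name__ == "__main__":
    trained = self_play()
    print("final move preferences:", trained.weights)
```

The “curriculum of increasing difficulty” shows up only implicitly here, in that each generation’s opponent is the latest copy of the agent; the richer structural ideas in the self-play that evolution and human children use are exactly what this picture leaves out.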
What I meant by general domain is that it’s not overly weird in the mental moves that are relevant there, so training methods that can create something that wins IMO are probably not very different from training methods that can create things that solve many other kinds of problems. It’s still a bit weird: high school math with olympiad add-ons is still a somewhat narrow toolkit, but for technical problems of many other kinds the mental-move toolkits are not qualitatively different, even if they are larger. The claim is that solving the IMO is a qualitatively new milestone from the point of view of this framing; it’s evidence about the AGI potential of LLMs at near-current scale in a way that previous results were not.
I agree that there could still be gaps, and the “generality” of the IMO isn’t totalizing magic that rules out crucial remaining gaps. I’m not strongly claiming there aren’t any crucial gaps, just that with the IMO as an example it’s no longer obvious that there are any, at least as long as the training methods used for the IMO can be adapted to those other areas, which isn’t always obviously the case. And of course continual learning could prove extremely hard. But there also isn’t strong evidence yet that it’s extremely hard, because it hasn’t been a focus for very long while LLMs at current levels of capability have been available. And the capabilities of in-context learning with 50M-token contexts and even larger LLMs haven’t been observed yet.
So it’s a question of calibration. There could always be substantial obstructions that are really there even when it’s no longer obvious that they are. But also at some point there actually aren’t any. So always suspecting currently unobservable crucial obstructions is not the right heuristic either; the prediction of when the problem could actually be solved needs to be allowed to respond to some sort of observable evidence.
What I meant by general domain is that it’s not overly weird in the mental moves that are relevant there, so training methods that can create something that wins IMO are probably not very different from training methods that can create things that solve many other kinds of problems.
I took you to be saying:
1. math is a general domain
2. IMO is fairly hard math
3. LLMs did the IMO
4. therefore LLMs can do well in a general domain
5. therefore probably maybe LLMs are generally intelligent.
But maybe you instead meant:
1. working out math problems applying known methods is a general domain
?
Anyway, “general domain” still does not make sense here. The step from 4 to 5 is not supported by this concept of “general domain” as you’re applying it.