It’s certainly possible, but “efficient continual learning” sounds a lot like AGI! So saying that it is the thing missing for AGI is not such a strong statement about the distance left, is it?
I don’t think this is moving the goalposts on the current paradigm. The word “continual” seems to have basically replaced “online” since the rise of LLMs, perhaps because LLMs manage a bit of in-context learning, which is sort-of-online but not-quite-continual, making a distinction necessary. However, “a system that learns efficiently over the course of its lifetime” is basically what we always expected from AGI; for example, this is roughly what Hofstadter claimed was missing in “Fluid Concepts and Creative Analogies” as far back as 1995.
I agree that we can’t rule out roughly current-scale LLMs reaching AGI. I just want to guard against the implication (which others may read into your words) that this is some kind of default expectation.
The question for this subthread is what scale of LLMs is necessary for the first AGIs, and what the IMO results say about that. Continual learning through post-training doesn’t obviously require more scale, and the IMO results are an argument that the current scale is almost sufficient. It could be very difficult conceptually/algorithmically to figure out how to actually do continual learning with automated post-training, but that still doesn’t need to depend on more scale for the underlying LLM; that’s my point about the implications of the IMO results. Before those results, it was far less clear whether the current (or near-term feasible) scale would be sufficient for the neural net cognitive engine part of the AGI puzzle.
It could be that LLMs can’t get there at the current scale because LLMs can’t get there at any scale (potentially even up to physical limits) with the current architecture.
So in some sense, yes, that wouldn’t be a prototypical example of a scale bottleneck.