The question for this subthread is the scale of LLMs necessary for the first AGIs, and what the IMO results say about that. Continual learning through post-training doesn’t obviously require more scale, and IMO is an argument that the current scale is almost sufficient. It could be very difficult conceptually/algorithmically to figure out how to actually do continual learning with automated post-training, but even that doesn’t need to depend on more scale for the underlying LLM; that’s my point about the implications of the IMO results. Before those results, it was far less clear whether the current (or near-term feasible) scale would be sufficient for the neural net cognitive engine part of the AGI puzzle.
It could be that LLMs can’t get there at the current scale because LLMs can’t get there at any (potentially physical) scale with the current architecture.
So in some sense, yes, that wouldn’t be a prototypical example of a scale bottleneck.