And in particular, I think the fact that LLM capability degrades much faster than human capability, as @abramdemski observed, is tied to a lack of continual learning, with in-context learning (ICL) currently not being enough to actually substitute for weight-level continual learning.
I don’t agree with this connection. Why would you think that continual learning would help with this specific sort of thing? It seems relevantly similar to just throwing more training data at the problem, which has shown only modest progress so far.
The key reason is to bend the shape of the curve. My key crux is that I don't expect throwing more training data at the problem to change the shape of the curve, where past a certain point LLM success rates fall off a sigmoid cliff. More training data would make LLMs improve, but they'd still have a threshold such that, once they're asked to do any task harder than that, they become incapable much more rapidly than humans do.
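To make the "shape of the curve" claim concrete, here is a minimal sketch (my own toy numbers, not anything fitted to METR's data) of the difference I have in mind: more training data mostly pushes the horizon of the success curve out, while the steepness of the falloff past the horizon stays the same, whereas the human curve is shallow with no hard wall.

```python
import numpy as np

# Toy sketch of the "shape of the curve" claim (made-up parameters, not METR data).
# Model P(success) as logistic in task length: a high steepness means the agent
# falls off a cliff past its horizon; a low steepness means a gradual decline.

def success_rate(task_hours, horizon_hours, steepness):
    """P(success) on a task of the given length, for a given horizon and falloff steepness."""
    return 1.0 / (1.0 + (task_hours / horizon_hours) ** steepness)

task_lengths = np.array([0.5, 1, 2, 4, 8, 16, 32, 64, 128])  # hours

llm_today     = success_rate(task_lengths, horizon_hours=1.0, steepness=3.0)  # steep falloff
llm_more_data = success_rate(task_lengths, horizon_hours=8.0, steepness=3.0)  # horizon moves, shape doesn't
human         = success_rate(task_lengths, horizon_hours=8.0, steepness=0.7)  # shallow, no hard wall

for h, a, b, c in zip(task_lengths, llm_today, llm_more_data, human):
    print(f"{h:6.1f}h  llm_today={a:.2f}  llm_more_data={b:.2f}  human={c:.2f}")
```

On this toy picture, "bending the shape of the curve" means lowering the steepness (which is what I'd hope weight-level continual learning does), not just pushing the horizon out (which is roughly what more training data has done so far).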
To quote Gwern (from this comment: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF):
But of course, the interesting thing here is that the human baselines do not seem to hit this sigmoid wall. It’s not the case that if a human can’t do a task in 4 hours there’s basically zero chance of them doing it in 48 hours and definitely zero chance of them doing it in 96 hours etc. Instead, human success rates seem to gradually flatline or increase over time, especially if we look at individual steps: the more time that passes, the higher the success rates become, and often the human will wind up solving the task eventually, no matter how unprepossessing the early steps seemed. In fact, we will often observe that a step that a human failed on earlier in the episode, implying some low % rate, will be repeated many times and quickly approach 100% success rates! And this is true despite earlier successes often being millions of vision+text+audio+sensorimotor tokens in the past (and interrupted by other episodes or tasks themselves equivalent to millions of tokens), raising questions about whether self-attention over a context window can possibly explain it.
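As a toy illustration of the retry dynamics Gwern points at (my own made-up numbers, nothing from the actual baselines): an agent whose per-attempt success rate is fixed can only grind out a hard step by brute repetition, while an agent that learns from each failed attempt drives its per-attempt rate toward 100%, which is the pattern the human baselines show.

```python
import numpy as np

# Toy retry model (hypothetical numbers). Compare:
#   "static":   per-attempt success rate stays at p0 (no learning across attempts)
#   "learning": per-attempt success rate closes a fixed fraction of the gap to 1.0
#               with every attempt, standing in for a human (or a continual learner)
#               that actually learns from its failures.

p0 = 0.2      # initial per-attempt success rate on the hard step
gain = 0.5    # fraction of the remaining gap closed per attempt (learning agent only)
attempts = np.arange(1, 11)

static_rate   = np.full(attempts.shape, p0)
learning_rate = 1.0 - (1.0 - p0) * (1.0 - gain) ** (attempts - 1)

for n, ps, pl in zip(attempts, static_rate, learning_rate):
    print(f"attempt {n:2d}: static per-attempt {ps:.2f}   learning per-attempt {pl:.2f}")
```

The second column is the behavior Gwern describes in the human baselines; the first is roughly what you'd expect from an agent that can't update on its own failures.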
It seems to me like the improvement in learning needed for what Gwern describes has little to do with “continual” and is more like “better learning” (better generalization, generalization from less examples).
(Note that I have a limit on how many comments I can make per week, so I will likely respond slowly, if at all, to any responses to this.)