Currently, yes (modulo a few hacks like Retrieval-Augmented Generation and context resummarization). If we actually manage to engineer an LLM++ solution to this that works well, I would expect the problem you describe to go away (as in, if it hasn’t gone away, then we haven’t actually solved continual learning for LLMs). My understanding is that the human brain is thought to use a complicated Heath-Robinson combination of sensory memory, short-term memory, working memory, long-term memory, episodic memory, consolidation during dreams, memory reconsolidation during recall, dedicated memory just for traumatic events, and so forth. That is not a simple system: it bears no resemblance to a tape recorder. So I would expect a solution to the same problem for LLMs, if one is possible, to have roughly as many working parts and to likewise look like an architectural Heath-Robinson contraption: so very much not Bitter Lesson style. I’m guessing we’ve so far solved maybe 25–30% of the problem.
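To make the “many working parts” point concrete, here is a minimal illustrative sketch in Python of how a few such memory subsystems might be wired together around an LLM. Everything here is hypothetical: the class names, the “sleep” consolidation step, and the keyword-overlap retrieval (a crude stand-in for the embedding search a real RAG system would use) are my own toy choices, not anyone’s actual architecture.

```python
from collections import deque

class WorkingMemory:
    """Small rolling buffer, loosely analogous to an LLM's context window."""
    def __init__(self, capacity=8):
        self.buffer = deque(maxlen=capacity)  # oldest items fall off automatically

    def add(self, item):
        self.buffer.append(item)

    def contents(self):
        return list(self.buffer)


class EpisodicStore:
    """Long-term store with naive keyword retrieval (a stand-in for
    embedding-based RAG search)."""
    def __init__(self):
        self.episodes = []

    def consolidate(self, items):
        # 'Consolidation during dreams': compress working memory into one
        # stored episode (here, just concatenation; a real system might
        # summarize with the LLM itself).
        if items:
            self.episodes.append(" | ".join(items))

    def retrieve(self, query, k=2):
        # Rank episodes by word overlap with the query.
        q = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda ep: len(q & set(ep.lower().split())),
            reverse=True,
        )
        return scored[:k]


class Agent:
    """Glues the stores together -- the Heath-Robinson part."""
    def __init__(self):
        self.wm = WorkingMemory()
        self.ltm = EpisodicStore()

    def observe(self, event):
        self.wm.add(event)

    def sleep(self):
        # Periodic consolidation, then a fresh working memory.
        self.ltm.consolidate(self.wm.contents())
        self.wm = WorkingMemory()

    def recall(self, query):
        return self.ltm.retrieve(query)


if __name__ == "__main__":
    agent = Agent()
    agent.observe("user prefers terse answers")
    agent.observe("project uses Rust, not Go")
    agent.sleep()
    print(agent.recall("which language does the project use?"))
```

Note that even this toy version needs orchestration logic deciding when to consolidate and what to retrieve; that kind of hand-built glue is exactly what scaling alone doesn’t provide, hence my prediction of a Heath-Robinson outcome.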
I tend to divide possible forms that AGI may take into LLM-AGI, LLM++-AGI (which has a lot of framework and architectural hacks bolted around and into something that still rather resembles an LLM), and non-LLM-(with-an-LLM-available-as-a-subcomponent-if-needed)-AGI. I am very dubious about the first of these three categories, primarily because I’m convinced that even just solving continual learning for an LLM is going to require an LLM++ architecture (and there are O(5) other major concerns popular among LLM-skeptic AI experts, each of which might in fact be solved by just scaling and the right training data, or might likewise require LLM++-style architectural additions).