Current LLM memory systems are like writing notes. The analogy with the human experience of writing and reading notes is pretty strong. Notes help me remember things without holding them in my head; I just look at the note. That means the memory mostly stays in short and medium term memory while I need it, and maybe never becomes a long term memory, because I know I can just look it up again later. If I rely too much on notes, I start to feel like a goldfish.
Continual learning would be more like consolidating long term memories. The memories aren’t just for the task at hand, but reshape the brain’s network.
These are useful for different purposes. Notes help a lot with complex things that, for us, don’t make sense to commit to memory (or that we would have committed to memory if we didn’t have notes). The benefit is that, with notes, we can do much more precise work over a larger set of context than we otherwise could. The downside is that notes take up space in short term memory and limit how much else we can do while holding them.
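As a toy illustration of the contrast (everything here is made up for the sketch, not any real memory system’s API): a note is a record that has to be fetched back into working memory each time, while consolidation changes the parameters themselves.

```python
import numpy as np
from collections import Counter

# --- "Notes": facts live outside the model and are fetched per task ---
notes = []

def write_note(text):
    notes.append(text)

def retrieve(query, k=1):
    # Crude word-overlap scoring stands in for real retrieval.
    q = Counter(query.lower().split())
    overlap = lambda note: sum((q & Counter(note.lower().split())).values())
    return sorted(notes, key=overlap, reverse=True)[:k]

write_note("the deploy script lives in tools/release.sh")
write_note("the staging database resets nightly")
print(retrieve("where is the deploy script"))  # fetched into "context" on demand

# --- "Consolidation": the fact reshapes the weights themselves ---
w = np.zeros(2)                    # a toy 2-parameter "network"
x, y = np.array([1.0, 0.0]), 1.0   # a single association to learn
for _ in range(50):                # repeated exposure, like rehearsal
    w -= 0.2 * (w @ x - y) * x     # gradient step on squared error
print(w)                           # the association now lives in w itself
```

The note has to occupy context every time it’s used; the consolidated association costs nothing at recall time because it’s baked into the weights.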
AI faces the same trade-off, though with different limits than ours: models don’t have our 5-7 short term memory slots. They can probably get a lot further with notes than we can, but ultimately they’ll hit the same wall and need the equivalent of consolidating memories to go beyond it, since one of the things that distinguishes experts from noobs is that the expert just knows their area, because it’s deeply integrated into their memory and neural circuitry.
My guess is that proper continual learning will continue to be a blocker for building strong AGI/ASI, even if we can get weak AGI without it.
I see your point, and definitely agree that learning / deep understanding as a human is different to having access to lots of notes, but I think that the huge context + speed of reading + new model releases might be enough to get nearly all the benefits that require humans to use our long term memory.
For example, could this requirement in humans just come from how little we can hold in conscious thought? If I can only hold 10 things ‘front of mind’ at a time, then to do anything complex I am forced to rely on my larger long term memory to surface relevant things. If I could hold 1,000,000 things front of mind, maybe I wouldn’t need that at all. This is also somewhat supported by in-context learning (arguably) doing the same thing as gradient descent, though that is debated.
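The debated in-context-learning/gradient-descent connection has a concrete toy version in the linear-attention literature: for in-context linear regression, an unnormalized linear-attention readout computes exactly the prediction of one gradient step from zero weights. A minimal sketch, with all numbers invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 8
X = rng.normal(size=(n, d))      # in-context inputs x_i
y = X @ rng.normal(size=d)       # in-context targets y_i
x_test = rng.normal(size=d)      # query point
eta = 0.1                        # learning rate

# One gradient step on 0.5 * sum_i (w . x_i - y_i)^2 starting from w = 0
# gives w1 = eta * sum_i y_i * x_i.
w1 = eta * X.T @ y
pred_gd = w1 @ x_test

# Unnormalized linear attention with keys x_i, values y_i, query x_test:
pred_attn = eta * sum((x_test @ X[i]) * y[i] for i in range(n))

print(np.isclose(pred_gd, pred_attn))  # True: the two computations coincide
```

Whether full transformers on real data actually do this is exactly the debated part; the toy case only shows the mechanism is expressible.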
Regarding notes, I think the same difference in scale applies, along with a qualitative difference. A human can read maybe 10 words a second, but an AI can read on the order of 100,000. I’d argue that makes it closer to a human “reading” from their memory than to a human reading notes, especially since all of the notes are loaded into the AI’s context window / working memory. That, I think, is more analogous to how human memories are used than to how humans use notes, since a human reading notes can’t hold more than a sentence or so in working memory at a time without compressing it.
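To put that scale gap in numbers, using the rough figures above plus an assumed ~0.75 words per token:

```python
human_wps = 10        # words per second, rough figure from the discussion
ai_wps = 100_000      # same, for an LLM ingesting its context

# A ~100k-token context at an assumed ~0.75 words per token:
doc_words = 75_000

print(doc_words / human_wps / 3600)  # human: ~2 hours to read it all
print(doc_words / ai_wps)            # AI: well under a second
```

At that ratio, re-reading the entire note store per task is cheap enough for the AI that it starts to resemble recall rather than lookup.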
Yeah, it’s hard to say. But from spending all day every day interacting with LLMs, I can say they have attention issues that are not totally dissimilar from ours (in fact, their attention is generally worse than humans’ in a lot of ways), and much will hinge on how their attention scales.