[Question] Why no major LLMs with memory?

One thing that I’m slightly puzzled by is that an obvious improvement to LLMs would be adding some kind of long-term memory that would allow them to retain more information than fits their context window. Naively, I would imagine that even just throwing some recurrent neural net layers in there would be better than nothing?

But while I’ve seen LLM papers that talk about how they’re multimodal or smarter than before, I don’t recall seeing any widely-publicized model that would have extended the memory beyond the immediate context window, and that confuses me.

No comments.