I’d say the main reason memory is useful is that it enables longer-term meta-learning and lays the foundation for continuous learning to actually work.
From @Seth Herd’s post:
Stateless LLMs/foundation models are already useful. Adding memory to LLMs and LLM-based agents will make them useful in more ways. The effects might range from minor to effectively opening up new areas of capability, particularly for longer time-horizon tasks. Even current memory systems would be enough to raise some of the alignment stability problems I discuss here, once they’re adapted for self-directed or autobiographical memory. I think the question is less whether this will be done, and more how soon and how much they will boost capabilities.
Let’s consider how memory could help agents do useful tasks. A human with lots of knowledge and skills but who can’t learn anything new is a useful intuition pump for the “employability” of agents without new memory mechanisms. (Such “dense anterograde amnesia” is rare because it requires bilateral medial temporal lobe damage while leaving the rest of the brain intact. Two patients occupied most of the clinical literature when I studied it.)
Such a “memoryless” person could do simple office tasks by referencing instructions, just like an LLM “agent” can be prompted to perform multi-step tasks. However, almost every task has subtleties and challenges, and so can benefit from learning on the job. Even data entry benefits from recognizing common errors and edge cases. More complex tasks usually benefit more from learning. For our memoryless human or an LLM, we could try giving better, more detailed instructions to cover subtleties and edge cases. Current agent work takes this route, whether by prompting or by hand-creating fine-tuning datasets, and it is reportedly just as maddening as supervising a memoryless human would be.
A standard human worker would learn many subtleties of their task over time. They would notice (if they cared) themes for common errors and edge cases. A little human guidance (“watch that these two fields aren’t reversed”) would go a long way. This would make teaching agents new variations in their tasks, or new tasks, vastly easier. We’ll consider barriers to agents directing their own learning below.
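As a rough sketch of what that kind of on-the-job memory could look like (purely illustrative; the store, the lessons, and the task are all invented, not a description of any existing agent framework), an agent could keep an append-only list of lessons, whether noticed error themes or human guidance, and prepend it to its instructions on later runs:

```python
from dataclasses import dataclass, field

@dataclass
class LessonStore:
    """Append-only memory of task-specific lessons: human guidance
    or error themes the agent has noticed itself."""
    lessons: list[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        self.lessons.append(lesson)

    def as_prompt_block(self) -> str:
        # Rendered as a block the agent sees before its base instructions.
        if not self.lessons:
            return ""
        bullets = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Lessons learned on this task so far:\n{bullets}\n\n"

# Hypothetical data-entry example.
store = LessonStore()
store.add("Check that the billing and shipping address fields aren't reversed.")
store.add("Dates in the legacy export are DD/MM/YYYY, not MM/DD/YYYY.")

base_instructions = "Copy each record from the export file into the CRM."
print(store.as_prompt_block() + base_instructions)
```

The point is just that a small amount of accumulated, task-specific guidance changes the agent’s effective instructions without any retraining.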
Runtime compute scaling creates another reason to add continuous learning mechanisms (“memory” in common parlance) to LLM agents. If an LLM can “figure out” something important for its assigned task, you don’t want to pay that compute and time cost every time, nor take the chance it won’t figure it out again.
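A minimal sketch of that idea, making no assumptions about how any real agent framework implements it (the cache file and the placeholder reasoning function are hypothetical): cache each conclusion keyed by the question, so the expensive reasoning step is paid for once and its result survives across runs.

```python
import hashlib
import json

CACHE_PATH = "figured_out.json"  # hypothetical on-disk memory shared across runs

def load_cache() -> dict:
    try:
        with open(CACHE_PATH) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def expensive_reasoning(question: str) -> str:
    # Placeholder for a long, costly model call that might not even
    # reach the same conclusion if rerun from scratch.
    return f"(worked-out answer to: {question})"

def figure_out(question: str) -> str:
    cache = load_cache()
    key = hashlib.sha256(question.encode()).hexdigest()
    if key not in cache:  # pay the compute (and time) cost only once
        cache[key] = expensive_reasoning(question)
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f, indent=2)
    return cache[key]

print(figure_out("Which endpoint of the internal API returns a customer's plan tier?"))
```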
Use of agents as companions or assistants would also benefit from memory; even limited systems for remembering a user’s preferences and prior interactions would be economically valuable.
Longer-term tasks benefit from memory in another way. Humans rely heavily on long-term one-shot memory (“episodic memory”) for organizing large tasks and their many subtasks. There are often dependencies between subtasks. LLM agents can perform surprisingly and increasingly well in their current mode of proceeding largely from start to finish and just getting everything right with little planning or checking, using only their context window for memory.
But it is possible that the tasks included in METR’s report of exponential progress in task length are effectively selected for not needing much memory. And long task progress may be hampered by models’ inability to remember which subtasks they’ve completed, and by memory limitations on effective context length (e.g.). Whether or not this is the case, it seems pretty likely that some long time-horizon tasks would benefit from more memory capabilities of different types.
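For the subtask-tracking piece specifically, here is a small sketch of what such an “episodic” record might look like (the subtasks and names are invented for illustration): a ledger of subtasks, their dependencies, and completion status that an agent could consult to decide what to do next, instead of trying to hold the whole plan in its context window.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    depends_on: list[str] = field(default_factory=list)
    done: bool = False

@dataclass
class TaskLedger:
    """Minimal 'episodic' record of a long task: which subtasks exist,
    what they depend on, and which are already finished."""
    subtasks: dict[str, Subtask] = field(default_factory=dict)

    def add(self, name: str, depends_on=None) -> None:
        self.subtasks[name] = Subtask(name, list(depends_on or []))

    def mark_done(self, name: str) -> None:
        self.subtasks[name].done = True

    def ready(self) -> list[str]:
        # Subtasks not yet done whose dependencies are all complete.
        return [
            s.name for s in self.subtasks.values()
            if not s.done and all(self.subtasks[d].done for d in s.depends_on)
        ]

ledger = TaskLedger()
ledger.add("collect requirements")
ledger.add("write parser", depends_on=["collect requirements"])
ledger.add("write tests", depends_on=["write parser"])
ledger.mark_done("collect requirements")
print(ledger.ready())  # -> ['write parser']
```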
Even if additional memory systems would be useful for LLM agents, one might think they would take years and breakthroughs to develop.
Or see @gwern’s comment here: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF
Yeah, I agree with that, but I still feel there’s something missing from that discussion.
Like, to have good planning capacity you want a good world model to plan over, and you want to assign relative probabilities to your action policies working out well. Having a clear self-environment boundary is quite key for that. So yes, memory enables in-context learning, but I don’t believe that will be the largest addition; I think the more important part is that memory allows for more learning about the self-environment boundary.
There’s work in RL, Active Inference, and Michael Levin’s research that I can point to for this, but it’s spread out over many different papers, so it’s hard to give a definitive reference.
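To gesture at the “assign relative probabilities to your action policies working out” part with a toy example (my own illustration, not something taken from those papers): with a memory of past outcomes, an agent can weight candidate policies by observed success rates instead of treating every attempt as the first. The self-environment-boundary point would be about attributing those outcomes correctly, which this sketch doesn’t try to capture.

```python
from collections import defaultdict

# Memory of how each action policy has worked out in the past.
history: dict = defaultdict(list)

def record_outcome(policy: str, succeeded: bool) -> None:
    history[policy].append(succeeded)

def success_probability(policy: str) -> float:
    # Laplace-smoothed estimate, so unseen policies get 0.5 rather than 0/0.
    outcomes = history[policy]
    return (sum(outcomes) + 1) / (len(outcomes) + 2)

record_outcome("retry the request with exponential backoff", True)
record_outcome("retry the request with exponential backoff", True)
record_outcome("ask the user to re-authenticate", False)

for policy in history:
    print(f"{policy}: {success_probability(policy):.2f}")
```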