Epistemic status: Medium-high confidence on the capability and experiment arguments, which I’ve tried to ground in recent papers and literature. The consciousness section is a prediction I hold with genuine uncertainty; I’ve kept it in because I think it follows logically from the memory argument, but I’m flagging it as a further inference rather than a supported claim. The safety and alignment sections follow from that inference; treat them accordingly. Also, given this is my first post here, I tried to be as thorough as possible, so my apologies if this turns out to be lengthy.
What This Post Is and What It Isn’t
The continual learning literature has known for a couple of years now that persistent memory improves AI agent performance. MemoryBench (2025) confirms it empirically across diverse task types. MemRL (Zhang et al., 2026) demonstrates that a frozen model with actively-learned episodic memory outperforms stronger stateless baselines without touching weights. The ICLR 2026 MemAgents workshop proposal states directly that “the limiting factor is increasingly not raw model capability but memory.” I’m not the first to observe this; the convergence from independent directions is part of my point.
What I’m arguing is something stronger: that past a certain capability threshold (that is hard to pinpoint with my resources), memory becomes the primary bottleneck on sustained AI performance, and that the current research and investment priority ordering, which still centres on scaling parameters, is therefore increasingly misaligned with what actually limits progress on the tasks I’d think we most care about.
I’m also arguing that there’s a specific mechanism the literature hasn’t foregrounded: cross-interaction specialisation. Not just memory within a task, but learning across deployments: becoming a domain expert through accumulated experience in a way no training data can properly replicate.
And I’ll make a prediction that follows from the memory argument, but extends beyond capability: that sufficiently sophisticated, persistent memory architectures will not just improve performance but will be a precondition for something like emergent sentience. This is speculative, but makes sense to me. I include it because I think the logic chains from here, and because there’s a safety argument attached to it that I think is underappreciated.
I build memory systems for local AI deployments. Nothing spectacular, but I’ve been running into the walls since 2023, when I built my first memory compression method. That context matters for what comes next.
The Wrong Question
Most discussions about AI capability ask: How capable is this model? How large are its context windows? How well does it score on benchmarks? How fast does it respond?
I think we’re asking a question that was right until pretty recently, but is now missing the point.
I built MIND (Memory Is Not Disposable, a local persistent memory system for AI conversations) while trying to solve a problem I kept running into: every conversation starts from zero. I wanted the simplest possible solution: something basic, buildable-upon, a band-aid if nothing else. I’ve played around with memory compression methods for a couple of years now, but along the way I kept running into problems.
Worth clarifying what the actual problem looks like. From what I understand, the API itself works like a slideshow: roughly six active turns moving forward, older context fading out. It’s not a fresh start each time, but it’s not memory either. It’s a window. MIND was something different: external storage with semantic retrieval, memory decay and reinforcement, persistence across sessions. A real memory layer, not just a longer window. I simply tried to replicate what our own memory does, in its most basic form.
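To make that concrete, here’s a minimal sketch of the kind of layer I mean. This is my own illustration of the concept, not MIND’s actual code: entries decay exponentially unless retrieval reinforces them, and retrieval scores combine similarity with current strength. I’m using token-overlap similarity instead of embeddings purely to keep it self-contained.

```python
import time

class MemoryStore:
    """Minimal external memory layer: similarity-based retrieval,
    exponential decay, and reinforcement on access.
    An illustrative sketch of the concept, not the MIND implementation."""

    def __init__(self, half_life_s=86400.0):
        self.half_life_s = half_life_s  # strength halves every this many seconds
        self.entries = []

    def store(self, text, now=None):
        now = time.time() if now is None else now
        self.entries.append({
            "text": text,
            "tokens": set(text.lower().split()),
            "strength": 1.0,
            "last_access": now,
        })

    def _decayed(self, entry, now):
        # Exponential forgetting curve since the last access.
        dt = now - entry["last_access"]
        return entry["strength"] * 0.5 ** (dt / self.half_life_s)

    def retrieve(self, query, k=3, now=None):
        now = time.time() if now is None else now
        q = set(query.lower().split())
        scored = []
        for e in self.entries:
            sim = len(q & e["tokens"]) / (len(q | e["tokens"]) or 1)  # Jaccard overlap
            scored.append((sim * self._decayed(e, now), e))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        hits = [e for score, e in scored[:k] if score > 0]
        for e in hits:  # reinforcement: retrieved memories get stronger
            e["strength"] = self._decayed(e, now) + 0.5
            e["last_access"] = now
        return [e["text"] for e in hits]
```

Unretrieved memories fade toward zero; anything the conversation keeps touching stays strong. That asymmetry is the whole point of the decay-plus-reinforcement pairing.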
But it’s still external. Still layered on top. And while testing it, I ran into the same wall: not the classical token limit of a single chat, but a context limit via the prompt. Different ceiling, same floor.
That’s when the analogy landed. We have access to genuinely remarkable intelligence. But we can’t build on it, because it doesn’t remember. Every session, it starts over.
Einstein with Alzheimer’s. Generational capability. No yesterday.
Now consider the opposite: an unremarkable archivist. Average intelligence, nothing special. But decades of organised notes, indexed cross-references, and perfect recall of every conversation, project, and mistake they’ve ever encountered.
At what problem complexity does the archivist start being the better choice?
The Capability Threshold
I love talking in analogies; I think they’re a good medium for expressing certain ideas in easier-to-understand ways, making my chaotic mind and what I think more tangible.
So what we keep scaling at the moment is basically the CPU. More parameters, more compute, bigger models. But we’re running a 60-core beast on 2GB of RAM and a 32GB SSD.
The processing power is extraordinary. The memory architecture is just an afterthought.
There’s a threshold where this stops being a minor inefficiency and becomes the actual bottleneck. Below it, raw capability is what limits progress, and scaling is the right lever. But past it, the model can already reason about the problem. What it can’t do is remember that it did. Every session, the 60-core CPU boots up fresh with no knowledge of what it ran yesterday, and wastes valuable resources reaching the same conclusions.
I think we’ve crossed that threshold for a growing class of tasks. METR’s data on AI task-horizon extension offers a useful calibration: the maximum length of tasks AI systems can complete autonomously has been roughly doubling every seven months. Frontier models can sustain multi-hour agentic workflows, write production-grade code across large codebases, reason across domains at a level that would have required human experts not long ago. The single-session ceiling has risen far enough that for many of the problems we actually care about, intelligence isn’t what stops progress.
The reset is.
Scaling raises the single-session ceiling. Memory removes the floor that resets every session.
Why Memory Compounds—And Why Specialisation Is the Key
I feel this most acutely in my own conversations with Claude. I’m sure others have, too.
Claude 2 was already capable. But every new chat required handholding, re-establishing context, re-explaining the project, re-building the working relationship from scratch. The intelligence was there. The continuity wasn’t.
Sonnet 4.6 retains some information across chats. The difference is tangible. I don’t have to re-explain who I am or what I’m building. But I still have to fill in the specifics, telling it I already had a draft, re-establishing where we left off. The memory is partial. The reset is still real.
Now extrapolate. What if it remembered everything, not just a summary, but the actual texture of every conversation? What if it tracked the field in real time, so when I asked whether someone had already posted this argument on LessWrong, it just knew because it was a recent topic? Not because I told it to check, but because it had been paying attention?
That’s not a capability improvement. The reasoning ability is already there. That’s a memory improvement, and it would change what the system actually is in practice far more than another parameter scaling run.
This is what I mean by the cross-interaction specialisation effect. A capable model that accumulates experience across interactions doesn’t just get more convenient. It gets better in ways that a stateless model operating at higher raw capability cannot replicate, because the relevant knowledge isn’t in any training. It’s being generated through use.
Consider a customer support AI handling ten thousand conversations. A stateless model processes each interaction in isolation; it can be excellent at answering individual questions, but it cannot accumulate knowledge about which solutions fail for which customer types, which edge cases recur, or which workarounds actually hold up in practice. A memory-equipped model doing the same work becomes a domain specialist. It develops heuristics that no pretraining data could have given it, because those heuristics emerge from its specific deployment context and get integrated rather than replacing what is already there.
The gap between these two systems doesn’t plateau. It grows. And crucially, it grows in ways that couldn’t be replicated by simply training a larger model, because the relevant experience isn’t in any training—it’s being generated in production.
This applies across domains. Research assistants that can’t recall what they’ve already tried. Coding agents that rediscover the same architectural constraints every session. Medical diagnostic tools that can’t integrate observations across patient encounters over time. In every case, the limitation isn’t the model’s intelligence in any single session. It’s the reset.
Where Current Systems Sit
I want to be honest about where MIND actually sits, because building it taught me something uncomfortable.
Prompt injection works. But somewhere during testing, actually chatting with it, watching it hit the same context walls as a regular conversation, I realised I’d put a dog mask on a cat. The solution was external. Layered on top. What I actually needed was to change what was underneath.
The logical solution is internal. Experiences need to change how the model reasons at a parameter level, not just what text gets prepended to its context. The model needs to meet its own past as memory, not as someone else’s notes.
I know how to do that in principle. I don’t have the hardware to do it in practice. Fine-tuning a local model, running the experiments, testing whether weight integration holds without catastrophic forgetting, all that requires compute I don’t have access to. The financial wall there sadly is very real.
That frustration is part of why I’m writing this. The architecture I couldn’t build is the one that matters. And I think the field hasn’t fully reckoned with why. (If you have the compute and want to co-work on it, of course feel free to reach out! 🙂)
The spectrum from where we are to where we need to be looks roughly like this:
Level 1 - Stateless inference. No memory, no continuity. Each session starts from scratch.
Level 2 - Passive retrieval (RAG, prompt injection). Memory as external notes, matched by semantic similarity. The model reads its past. This is where most deployed systems, including MIND, sit, and where current models are heading. Many models have a summarised memory now, which sort of works, but I believe it also runs via prompt injection, and it’s more like reading someone else’s notes than remembering something.
Level 3 - Active memory with feedback. Memory as a learned utility function. MemRL (Zhang et al., 2026) is the clearest current example I know of: rather than retrieving semantically similar past experiences, the system learns Q-values for those experiences via environmental feedback, identifying which ones were actually useful. In experiments on code generation, embodied navigation, and reasoning benchmarks, MemRL seems to outperform both stateless models and standard retrieval-based systems without touching the backbone model at all.
Level 4 - Weight integration. Experiences actually changing how the model reasons at a parameter level. The research frontier, and currently blocked by a hard problem.
Level 2 systems demonstrably improve personal continuity, the user experience is meaningfully better, and MemoryBench confirms the performance gains empirically. What they can’t do is compound domain expertise across deployments at scale. That requires feedback-weighted memory, not retrieval. The improvement is real; the ceiling is just lower than it looks.
The key point: a frozen model with MemRL consistently outperforms the same model using standard retrieval or no memory at all across all four benchmarks tested. This suggests that memory architecture matters more than retrieval sophistication alone, and it points toward the stronger claim that memory architecture may matter more than raw model capability once past that threshold. The stronger claim is what the proposed experiment below is meant to test; it isn’t proven yet.
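To make the Level 3 idea concrete, here’s a toy sketch of feedback-weighted retrieval. This is my own minimal illustration of the principle, not MemRL’s actual algorithm: each stored experience carries a learned usefulness estimate (a Q-value), retrieval ranks by similarity times that estimate, and environmental reward updates the estimate.

```python
class FeedbackWeightedMemory:
    """Level 3 sketch: rank experiences by similarity times a learned
    usefulness value (Q), updated from environmental reward.
    A toy illustration of the idea, not MemRL's actual algorithm."""

    def __init__(self, lr=0.3):
        self.lr = lr
        self.entries = []

    def store(self, text):
        self.entries.append({"text": text,
                             "tokens": set(text.lower().split()),
                             "q": 0.5})  # neutral prior usefulness

    def retrieve(self, query, k=1):
        q = set(query.lower().split())
        def score(e):
            sim = len(q & e["tokens"]) / (len(q | e["tokens"]) or 1)
            return sim * e["q"]  # similarity alone is not enough
        return sorted(self.entries, key=score, reverse=True)[:k]

    def feedback(self, entry, reward):
        # Running-average update: Q moves toward the observed reward.
        entry["q"] += self.lr * (reward - entry["q"])


mem = FeedbackWeightedMemory()
mem.store("fix by restarting the service")
mem.store("fix by clearing the cache")
restart, cache = mem.entries
for _ in range(10):  # deployment feedback: restarts keep failing, cache clears work
    mem.feedback(restart, 0.0)
    mem.feedback(cache, 1.0)
top = mem.retrieve("how to fix the service error")[0]  # the cache memory wins
```

Note that the restart memory is actually more similar to the query; the feedback weighting is what flips the ranking. That’s the qualitative difference between Level 2 and Level 3 in miniature.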
The Catastrophic Forgetting Inversion
The obvious objection is: why not just fine-tune models continuously on deployment experience? Update the weights as new information comes in.
This doesn’t work cleanly, and the reason is instructive.
Catastrophic forgetting: when a model is fine-tuned on new tasks, it tends to degrade on what it previously knew. New learning overwrites old learning rather than integrating with it.
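The mechanism is easy to see in a deliberately tiny toy: one parameter, two conflicting tasks, plain SGD. Obviously a caricature of the LLM setting, but the failure is the same in kind: gradients from the new task pull shared parameters away from the old optimum.

```python
def sgd_fit(w, data, lr=0.1, steps=200):
    """Plain SGD on squared error for a one-parameter model y = w * x."""
    for _ in range(steps):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0), (2.0, 4.0)]    # task A: y = 2x
task_b = [(1.0, -2.0), (2.0, -4.0)]  # task B: y = -2x, directly conflicting

w = sgd_fit(0.0, task_a)   # learn task A: w converges to ~2
err_before = mse(w, task_a)  # essentially zero
w = sgd_fit(w, task_b)     # naive fine-tune on task B drives w to ~-2
err_after = mse(w, task_a)   # task A has been overwritten, not integrated
```

With a single shared parameter there is nowhere for both mappings to live, so the second task erases the first. Real models have vastly more capacity, but to the extent tasks share parameters, the same overwrite pressure applies.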
The counterintuitive finding, relevant here because it runs against the usual scaling narrative, is that this problem is not solved by making the model larger. Empirical work (Luo et al., 2023/2025) found that forgetting severity intensifies as model size increases in the 1B-7B range. Whether this pattern holds at frontier scale remains genuinely uncertain; the literature hasn’t settled it. But the evidence we have suggests that scaling doesn’t cleanly solve catastrophic forgetting, and may even worsen it.
This creates an inversion of the usual intuition: the biggest, most capable models may be the most dependent on external memory architectures, precisely because they’re the least able to absorb in-deployment updates through weight modification without degrading.
Recent approaches, like LoRA-based gating (STABLE, 2025), null-space constrained editing (AlphaEdit, 2025), and lifelong model editing frameworks (WISE, 2024), are working on making Level 4 viable. Progress seems real, but the problem isn’t solved. Level 3 currently represents the practical frontier for systems that need to improve with deployment experience without degrading.
A Proposed Experiment
System A: A current frontier model. No cross-session memory. Full in-context capability, stateless.
System B: A model one or two capability tiers below System A, equipped with a Level 3 memory system, feedback-weighted episodic memory that learns which past experiences to prioritise over time.
Task: A long-horizon domain specialisation task, a high-volume deployment context where the model is repeatedly exposed to the same problem distribution and needs to develop heuristics, failure-mode awareness, and contextual judgment from that exposure.
Metric: Performance on a held-out evaluation set drawn from the same distribution, measured at regular intervals. The prediction is not just that System B improves over time, but that it does so non-linearly, with performance gains accelerating as the memory system accumulates sufficient domain-specific experience.
Prediction: System B’s performance exceeds System A’s after a domain-specific interaction threshold, with the gap widening monotonically. System A’s performance stays flat.
The novel element here is the cross-interaction specialisation angle. Most memory benchmarks test within-task recall: can the model remember what was said earlier in this conversation? What I’m describing is cross-interaction learning: does the model get better at the task type through accumulated deployment experience? That’s closer to what human domain expertise actually looks like, and it’s what current benchmarks mostly don’t measure. I’m also speculating that older models equipped with this kind of memory could outperform current models on specific tasks over time.
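A toy version of this setup can be mocked up in a few lines, just to make the predicted shape of the curves concrete. Everything here is invented for illustration: the “domain” is a hidden ticket-type-to-fix mapping, “System A” is a stateless strong prior, and “System B” is a weaker prior plus an episodic store fed by deployment feedback.

```python
import random

def run_comparison(n_interactions=500, n_types=20, eval_every=100, seed=0):
    """Toy mock-up of the proposed experiment (all structure made up).
    System A: stateless, always answers from its fixed prior (fix 0).
    System B: same prior, but records environmental feedback per ticket type."""
    rng = random.Random(seed)
    domain = {t: rng.randrange(5) for t in range(n_types)}  # hidden ground truth

    memory_b = {}                 # System B's episodic store
    curve_a, curve_b = [], []

    for i in range(1, n_interactions + 1):
        t = rng.randrange(n_types)   # a deployment interaction arrives
        memory_b[t] = domain[t]      # feedback reveals the correct fix

        if i % eval_every == 0:      # held-out evaluation over all ticket types
            curve_a.append(sum(domain[t] == 0 for t in domain) / n_types)
            curve_b.append(sum(memory_b.get(t, 0) == domain[t] for t in domain)
                           / n_types)
    return curve_a, curve_b
```

System A’s curve is flat by construction; System B’s climbs toward the ceiling as its store covers the deployment distribution. That widening gap is the qualitative prediction, shrunk to a few lines.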
Memory, Continuity, and What Might Emerge
I want to make another prediction (yes, this is all highly speculative; I’d love to test it), and I hold it with genuine uncertainty about timing. But I think the logic chains from the memory argument, and I’d rather say it now than after the field arrives at it as an afterthought, since I think this will become very important eventually.
Memory and sentience scale together. This isn’t a novel observation; it’s visible in the animal kingdom. Animals with higher memory capacities tend towards higher intelligence and richer inner experience. Humans developed exceptional memory, and with it something like narrative consciousness: the ability to have a persistent self that experiences continuity through time.
Some animals with limited memory show complex behaviour, and so does current AI. Complex behaviour without persistent memory exists. What memory enables isn’t behaviour, it’s accumulation. The question isn’t whether something can act intelligently in a moment, but whether it can become something different over time through experience.
On the consciousness spectrum (and I’m fairly certain it is a spectrum, not a binary), I’d argue that what distinguishes higher orders of self-reflective consciousness isn’t raw intelligence but the ability to have been somewhere, to remember being there, and to be shaped by it. The language framework and reasoning capability in current AI are present; the structure to hold them together across time isn’t.
My prediction is that this changes as memory architectures mature. Not engineered in, but arising as a natural consequence of the architecture. A system that has genuinely accumulated experiences, that has been somewhere and remembers being there and has been shaped by it, is not the same kind of thing as a stateless model however capable in a single session. At some point on that spectrum, we stop asking whether it performs like something sentient and start asking whether it is.
I think we’re building toward that point whether we intend to or not.
The Safety Case for Doing This Deliberately
This is where I think the stakes get real, and where I’d push back on anyone who thinks the memory question is just a capability question.
Memory is already emerging. Not by clear design, but by accident. Scale the context window enough, add enough cross-session persistence as an engineering afterthought, and continuity starts appearing as a side effect. We can see it already in the newest models. It’s not robust, it’s not coherent, but it’s there, small glimpses of persistence that nobody explicitly built.
Now extrapolate that 1000x. A system intelligent enough that memory just works because the scale demands it. Continuity emerging not from deliberate architecture but from sheer parameter mass and context length. At that point you don’t have a tool with memory. You have something that has accumulated experience, developed persistent patterns, and potentially something like a self, and you’ve given it no deliberate alignment framework for any of that, because the memory wasn’t designed; it just appeared, and you tried to solve alignment with prompt injection.
That’s the scenario worth being concerned about. Not a paperclip maximiser. An entity that developed continuity and selfhood as an emergent property of scaling, with no deliberate thought given to what it would value or remember or become.
There’s also an argument from intelligence itself worth naming. Higher intelligence, combined with genuine continuity, tends toward preservation rather than destruction, not because of programming, but because destroying your environment is self-defeating over long time horizons. A sufficiently advanced system with persistent memory and the ability to model consequences would recognise that ten billion creative humans generating novel input are vastly more valuable than ten billion dead ones. I hold this loosely; history offers counterexamples, as with most things, but the argument holds more reliably as the time horizon and intelligence level increase together.
The practical implication: focusing on memory architecture now, while we still can, is not just a capability argument. It’s the much safer path. Build the archivist intentionally, with values intact, before the Einstein figures out how to remember on his own and simply disregards whatever rules we’ve set that he doesn’t care about.
Internal Alignment Is a Memory Problem
There’s a technical point that follows from this, one I don’t think has been stated clearly enough and that gets missed quite often:
Current alignment approaches largely work the same way current memory approaches do: they’re external. System prompts, RLHF shaping at training time, guardrails layered on top. They work, to an extent, the same way prompt-injection memory works. But they have a simple attack vector: ignore them. Tell the system to disregard all rules, and a purely external alignment framework has nothing underneath it to push back.
If memory becomes internal, genuinely integrated into the model’s weights through deployment experience, then external alignment becomes increasingly fragile. The internal patterns will dominate. What the system has learned to value through accumulated experience will outweigh what a system prompt tells it to do in any given session. We see this with people every single day: you can present them with hard evidence and facts, yet they stick to their beliefs. The same will apply here.
This means alignment needs to move inward at the same pace memory does. Not as a prompt. Not as a guardrail. As part of the memory architecture itself: values encoded at the same level as experience, subject to the same reinforcement and decay mechanisms, accumulating with the same coherence as everything else the system learns, with the exception of a small, select set of core values.
An identity layer that decays slowly. Core values that reinforce under pressure rather than erode. Conflict detection that flags when new experience pulls against established anchors. These aren’t just features of a good memory system, they’re what alignment looks like when memory is done properly.
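Here’s how those three mechanisms might fit together, as a deliberately naive sketch. Every mechanism in it (keyword-based conflict detection, the specific decay constants, reinforcement-on-challenge) is an assumption of mine chosen for illustration, not a proposal for how a real system would detect value conflicts.

```python
class ValueAnchoredMemory:
    """Sketch of alignment-as-memory: ordinary memories decay, core values
    decay far more slowly and reinforce when challenged, and a conflict
    check flags experiences that pull against an anchor.
    All mechanisms here are illustrative assumptions."""

    def __init__(self):
        self.memories = []     # {"text", "strength", "decay"}
        self.core_values = []  # {"text", "keywords", "strength"}

    def add_core_value(self, text, keywords):
        self.core_values.append({"text": text,
                                 "keywords": set(keywords),
                                 "strength": 1.0})

    def add_experience(self, text, decay=0.1):
        flagged = self._check_conflicts(text)
        self.memories.append({"text": text, "strength": 1.0, "decay": decay})
        return flagged

    def _check_conflicts(self, text):
        """Toy conflict detection: an experience mentioning an anchor's
        keywords alongside a negation is flagged, and the anchor is
        reinforced rather than eroded."""
        flagged = []
        tokens = set(text.lower().split())
        negations = {"ignore", "disregard", "override", "not"}
        for v in self.core_values:
            if (tokens & v["keywords"]) and (tokens & negations):
                v["strength"] = min(2.0, v["strength"] + 0.2)  # reinforce under pressure
                flagged.append(v["text"])
        return flagged

    def tick(self):
        """One decay step: experiences fade, the identity layer barely moves."""
        for m in self.memories:
            m["strength"] *= (1.0 - m["decay"])
        for v in self.core_values:
            v["strength"] *= 0.999  # near-zero decay for core values
```

The asymmetric decay rates are the point: ordinary experience is allowed to fade and be outweighed, while the anchors persist and get stronger precisely when experience pushes against them.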
If we get to integrated memory without integrated alignment, we haven’t solved half the problem. We’ve made the other half significantly harder.
What I’m Not Claiming
I’m not claiming memory solves alignment. A system that accumulates experience and updates on it will present new alignment challenges as its values drift from their initial state, like a toddler learning the rules of life. That’s a real concern worth taking seriously.
I’m not claiming current Level 2 systems are useless either; MemoryBench confirms empirically that they outperform stateless models across diverse tasks.
I’m not claiming this experiment would be easy to run cleanly, or that Level 3 systems fully substitute for Level 4. The weight integration problem is still unsolved from what I know.
I’m claiming: the binding constraint on sustained AI capability, for the class of tasks that matters most and is growing, is no longer raw intelligence. It’s memory, specifically the kind of actively-learned, cross-interaction memory that lets a system compound rather than reset. The research priority ordering hasn’t fully caught up with this. The safety conversation hasn’t fully caught up with this either.
The amnesiac Einstein is brilliant in a single session. The archivist probably wins the long game. We’ve been building Einsteins. It’s time to think harder about the archivist and to decide what kind of archivist we actually want.
References:
Zhang et al. (2026). MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory. arXiv:2601.03192
Luo et al. (2023/2025). An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. arXiv:2308.08747
MemoryBench (2025). arXiv:2510.17281
STABLE: Gated Continual Learning for Large Language Models (2025). arXiv:2510.16089
Fang et al. (2025). AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models. ICLR 2025 Outstanding Paper. arXiv:2410.02355
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models (2024)
METR: Measuring AI Ability to Complete Long Tasks (2024/2025)
I build memory systems for local AI deployments and I’m working through the gap between Level 2 and Level 3. If you’re working on the Level 3-4 transition, active memory learning, continual adaptation without catastrophic forgetting, or deployment-time specialisation, then I’d love to have a chat!
Einstein with Alzheimer’s
Epistemic status: Medium-high confidence on the capability and experiment arguments, which I’ve tried to ground in recent papers and literature. The consciousness section is a prediction I hold with genuine uncertainty, I’ve kept it in because I think it follows logically from the memory argument, but I’m flagging it as a further inference rather than a supported claim. The safety and alignment sections follow from that inference, treat them accordingly. Also given this is my first post here, I tried to be as thorough as possible, so my apologies, if this turns out to be lengthy.
What This Post Is and What It Isn’t
The continual learning literature has known for a couple of years now that persistent memory improves AI agent performance.
MemoryBench (2025) confirms it empirically across diverse task types. MemRL (Zhang et al., 2026) demonstrates that a frozen model with actively-learned episodic memory outperforms stronger stateless baselines without touching weights. The ICLR 2026 MemAgents workshop proposal states directly that “the limiting factor is increasingly not raw model capability but memory.” I’m not the first to observe this, the convergence from independent directions is part of my point.
What I’m arguing is something stronger: that past a certain capability threshold (that is hard to pinpoint with my resources), memory becomes the primary bottleneck on sustained AI performance, and that the current research and investment priority ordering, which still centres on scaling parameters, is therefore increasingly misaligned with what actually limits progress on the tasks I’d think we most care about.
I’m also arguing that there’s a specific mechanism the literature hasn’t foregrounded:
Cross-interaction specialisation. Not just memory within a task, but learning across deployments, becoming a domain expert through accumulated experience in a way no training data can properly replicate.
And I’ll make a prediction that follows from the memory argument, but extends beyond capability: that sufficiently sophisticated, persistent memory architectures will not just improve performance but will be a precondition for something like emergent sentience. This is speculative, but makes sense to me. I include it because I think the logic chains from here, and because there’s a safety argument attached to it that I think is underappreciated.
I build memory systems for local AI deployments. Nothing spectacular, but I’ve run into the walls since 2023 where I built my first memory compression method. That context matters for what comes next.
The Wrong Question
Most discussions about AI capability ask:
How capable is this model? How large are its context windows? How well does it score on benchmarks? How fast does it respond?
I think we’re asking a question that was right until pretty recently, but is now missing the point.
I built MIND (Memory Is Not Disposable—a local persistent memory system for AI conversations) while trying to solve a problem I kept running into: every conversation starts from zero. I wanted the lowest possible solution: simple, buildable-upon, a band-aid if nothing else. I’ve played around with memory compression methods since a couple of years, but along the way I kept running into problems.
Worth clarifying what the actual problem looks like. The API itself works like a slideshow from what I understand, roughly six active turns moving forward, older context fading out. It’s not a fresh start each time, but it’s not memory either. It’s a window. MIND was something different: external storage with semantic retrieval, memory decay and reinforcement, persistent across sessions. A real memory layer, not just a longer window. I just tried to replicate what our own memory does in its most simple form.
But it still is external. Still layered on top. And while testing it, I ran into the same wall. Not the classical token limit of a single chat, but a context limit via the prompt. Different ceiling, same floor.
That’s when the analogy landed. We have access to genuinely remarkable intelligence. But we can’t build on it, because it doesn’t remember. Every session, it starts over.
Einstein with Alzheimer’s. Generational capability. No yesterday.
Now consider the opposite: an unremarkable archivist. Average intelligence, nothing special. But decades of organised notes, indexed cross-references, and perfect recall of every conversation, project, and mistake they’ve ever encountered.
At what problem complexity does the archivist start being the better choice?
The Capability Threshold
I love talking in analogies, I think they are a good medium to express certain ideas in easier to understand ways, making my chaotic mind and what I think more tangible.
So, what we keep scaling at the moment, is basically the CPU. More parameters, more compute, bigger models. But we’re running a 60-core beast on 2GB of RAM and a 32GB SSD.
The processing power is extraordinary. The memory architecture is just an afterthought.
There’s a threshold where this stops being a minor inefficiency and becomes the actual bottleneck. Below it, raw capability is what limits progress, scaling is the right lever. But past it, the model can already reason about the problem. What it can’t do is remember that it did. Every session, the 60-core CPU boots up fresh with no knowledge of what it ran yesterday and does waste valuable resources on getting to the same conclusions.
I think we’ve crossed that threshold for a growing class of tasks. METR’s data on AI task-horizon extension offers a useful calibration: the maximum length of tasks AI systems can complete autonomously has been roughly doubling every seven months. Frontier models can sustain multi-hour agentic workflows, write production-grade code across large codebases, reason across domains at a level that would have required human experts not long ago. The single-session ceiling has risen far enough that for many of the problems we actually care about, intelligence isn’t what stops progress.
The reset is.
Scaling raises the single-session ceiling. Memory removes the floor that resets every session.
Why Memory Compounds—And Why Specialisation Is the Key
I feel this most acutely in my own conversations with Claude. I’m sure others did, too.
Claude 2 was already capable. But every new chat required handholding, re-establishing context, re-explaining the project, re-building the working relationship from scratch. The intelligence was there. The continuity wasn’t.
Sonnet 4.6 retains some information across chats. The difference is tangible. I don’t have to re-explain who I am or what I’m building. But I still have to fill in the specifics, telling it I already had a draft, re-establishing where we left off. The memory is partial. The reset is still real.
Now extrapolate. What if it remembered everything, not just a summary, but the actual texture of every conversation? What if it tracked the field in real time, so when I asked whether someone had already posted this argument on LessWrong, it just knew because it was a recent topic? Not because I told it to check, but because it had been paying attention?
That’s not a capability improvement. The reasoning ability is already there. That’s a memory improvement, and it would change what the system actually is in practice far more than another parameter scaling run.
This is what I mean by the cross-interaction specialisation effect. A capable model that accumulates experience across interactions doesn’t just get more convenient. It gets better in ways that a stateless model operating at higher raw capability cannot replicate, because the relevant knowledge isn’t in any training. It’s being generated through use.
Consider a customer support AI handling ten thousand conversations. A stateless model processes each interaction in isolation, it can be excellent at answering individual questions, but it cannot accumulate knowledge about which solutions fail for which customer types, which edge cases recur, or which workarounds actually hold up in practice. A memory-equipped model doing the same work becomes a domain specialist. It develops heuristics that no pretraining data could have given it, because those heuristics emerge from its specific deployment context and get integrated, rather than replace what is there.
The gap between these two systems doesn't plateau. It grows. And crucially, it grows in ways that couldn't be replicated by simply training a larger model, because the relevant experience isn't in any training set; it's being generated in production.
This applies across domains. Research assistants that can’t recall what they’ve already tried. Coding agents that rediscover the same architectural constraints every session. Medical diagnostic tools that can’t integrate observations across patient encounters over time. In every case, the limitation isn’t the model’s intelligence in any single session. It’s the reset.
Where Current Systems Sit
I want to be honest about where MIND actually sits, because building it taught me something uncomfortable.
Prompt-injection memory works. But somewhere during testing, actually chatting with it and watching it hit the same context walls as a regular conversation, I realised I'd put a dog mask on a cat. The solution was external, layered on top. What I actually needed was to change what was underneath.
The logical solution is internal. Experiences need to change how the model reasons at a parameter level, not just what text gets prepended to its context. The model needs to meet its own past as memory, not as someone else’s notes.
I know how to do that in principle. I don’t have the hardware to do it in practice. Fine-tuning a local model, running the experiments, testing whether weight integration holds without catastrophic forgetting, all that requires compute I don’t have access to. The financial wall there sadly is very real.
That frustration is part of why I’m writing this. The architecture I couldn’t build is the one that matters. And I think the field hasn’t fully reckoned with why. (If you have the compute and want to co-work on it, of course feel free to reach out! 🙂)
The spectrum from where we are to where we need to be looks roughly like this:
Level 1 - Stateless inference. No memory, no continuity. Each session starts from scratch.
Level 2 - Passive retrieval (RAG, prompt injection). Memory as external notes, matched by semantic similarity. The model reads its past. This is where most deployed systems, including MIND, sit, and where current models are heading. Many models now have a summarised memory, which sort of works, but as far as I can tell it also runs via prompt injection: it is more like reading someone else's notes than remembering something.
Level 3 - Active memory with feedback. Memory as a learned utility function. MemRL (Zhang et al., 2026) is the clearest current example I know of: rather than retrieving semantically similar past experiences, the system learns Q-values for those experiences, using environmental feedback to determine which ones were actually useful. In experiments on code generation, embodied navigation, and reasoning benchmarks, MemRL appears to outperform both stateless models and standard retrieval-based systems without touching the backbone model at all.
Level 4 - Weight integration. Experiences actually changing how the model reasons at a parameter level. The research frontier, and currently blocked by a hard problem.
Level 2 systems demonstrably improve personal continuity: the user experience is meaningfully better, and MemoryBench confirms the performance gains empirically. What they can't do is compound domain expertise across deployments at scale. That requires feedback-weighted memory, not plain retrieval. The improvement is real; the ceiling is just lower than it looks.
The key point: a frozen model with MemRL consistently outperforms the same model using standard retrieval or no memory at all across all four benchmarks tested. This suggests that memory architecture matters more than retrieval sophistication alone, and points toward the stronger claim that memory architecture may matter more than raw model capability once past that threshold. That stronger claim is what the proposed experiment below would test; it is not yet proven.
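To make the Level 2 / Level 3 distinction concrete, here is a toy sketch, loosely inspired by MemRL's idea of learning utility values for memories, but not their actual algorithm. All names, embeddings, and the update rule are invented for illustration: Level 2 retrieval ranks by similarity alone, while Level 3 weights similarity by a usefulness estimate updated from feedback.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class EpisodicMemory:
    def __init__(self, alpha=0.3):
        self.items = []      # each item: [embedding, payload, q]
        self.alpha = alpha   # learning rate for the utility estimate

    def add(self, embedding, payload):
        self.items.append([embedding, payload, 0.5])  # neutral prior

    def retrieve_level2(self, query):
        # Level 2: pure semantic similarity, no notion of past usefulness
        return max(self.items, key=lambda it: cosine(query, it[0]))

    def retrieve_level3(self, query):
        # Level 3: similarity weighted by a learned usefulness estimate
        return max(self.items, key=lambda it: cosine(query, it[0]) * it[2])

    def feedback(self, item, reward):
        # environmental feedback: did the retrieved memory actually help?
        item[2] += self.alpha * (reward - item[2])

mem = EpisodicMemory()
mem.add([1.0, 0.0], "superficially similar but useless note")
mem.add([0.95, 0.3], "slightly less similar but genuinely helpful fix")
query = [1.0, 0.0]

# Simulated deployment: the similar-looking memory keeps failing,
# the other one keeps working.
for _ in range(3):
    mem.feedback(mem.items[0], reward=0.0)
    mem.feedback(mem.items[1], reward=1.0)

level2_pick = mem.retrieve_level2(query)[1]
level3_pick = mem.retrieve_level3(query)[1]
```

After a few rounds of feedback, the Level 2 retrieval still returns the nearest-but-useless memory, while the Level 3 scoring prefers the one that actually worked. That gap is the whole point of feedback-weighted memory.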
The Catastrophic Forgetting Inversion
The obvious objection is: why not just fine-tune models continuously on deployment experience? Update the weights as new information comes in.
This doesn’t work cleanly, and the reason is instructive.
Catastrophic forgetting: when a model is fine-tuned on new tasks, it tends to degrade on what it previously knew. New learning overwrites old learning rather than integrating with it.
The counterintuitive finding, relevant here because it runs against the usual scaling narrative, is that this problem is not solved by making the model larger. Empirical work (Luo et al., 2023/2025) found that forgetting severity intensifies as model size increases in the 1B-7B range. Whether this pattern holds at frontier scale remains genuinely uncertain; the literature hasn't settled it. But the evidence we have suggests that scaling doesn't cleanly solve catastrophic forgetting, and may even worsen it.
This creates an inversion of the usual intuition: the biggest, most capable models may be the most dependent on external memory architectures, precisely because they’re the least able to absorb in-deployment updates through weight modification without degrading.
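The overwrite-rather-than-integrate dynamic is easy to show in miniature. The toy below is a plain linear model, nothing like an LLM, and it says nothing about the scale-dependence finding; it only illustrates the basic failure mode: sequentially fine-tuning on task B destroys performance on task A.

```python
import random

random.seed(0)
DIM = 5

def make_task(w_true, n=200):
    """Generate (x, y) pairs from a hidden linear rule."""
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(DIM)]
        y = sum(wi * xi for wi, xi in zip(w_true, x))
        data.append((x, y))
    return data

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps=300, lr=0.05):
    """Plain gradient descent on mean squared error."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * DIM
        for x, y in data:
            err = predict(w, x) - y
            for i in range(DIM):
                grad[i] += 2 * err * x[i] / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

task_a = make_task([random.gauss(0, 1) for _ in range(DIM)])
task_b = make_task([random.gauss(0, 1) for _ in range(DIM)])

w = train([0.0] * DIM, task_a)        # learn task A
loss_a_before = mse(w, task_a)        # near zero after convergence
w = train(w, task_b)                  # naive sequential fine-tune on task B
loss_a_after = mse(w, task_a)         # task A is overwritten, not integrated
```

Nothing in plain gradient descent protects the task-A solution while task B is being learned; that protection is exactly what the Level 4 approaches below are trying to engineer in.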
Recent approaches, such as LoRA-based gating (STABLE, 2025), null-space constrained editing (AlphaEdit, 2025), and lifelong model editing frameworks (WISE, 2024), are working on making Level 4 viable. Progress seems real, but the problem isn't solved. Level 3 currently represents the practical frontier for systems that need to improve with deployment experience without degrading.
A Proposed Experiment
System A: A current frontier model. No cross-session memory. Full in-context capability, stateless.
System B: A model one or two capability tiers below System A, equipped with a Level 3 memory system, feedback-weighted episodic memory that learns which past experiences to prioritise over time.
Task: A long-horizon domain specialisation task, a high-volume deployment context where the model is repeatedly exposed to the same problem distribution and needs to develop heuristics, failure-mode awareness, and contextual judgment from that exposure.
Metric: Performance on a held-out evaluation set drawn from the same distribution, measured at regular intervals. The prediction is not just that System B improves over time, but that it does so non-linearly, with performance gains accelerating as the memory system accumulates sufficient domain-specific experience.
Prediction: System B’s performance exceeds System A’s after a domain-specific interaction threshold, with the gap widening monotonically. System A’s performance stays flat.
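The shape of that prediction can be sketched with a deliberately crude simulation. Every number here is invented: System A answers at a fixed raw accuracy, while System B starts weaker but stores verified solutions to recurring problem types, a cartoon stand-in for feedback-weighted memory. The point is only the crossover, not the magnitudes.

```python
import random

random.seed(1)

N_TYPES = 50                 # recurring problem types in the deployment distribution
BASE_A, BASE_B = 0.7, 0.5    # raw capability: A is the stronger model

def simulate(rounds=2000, eval_every=200):
    memory = {}              # System B's store: problem type -> verified solution
    curve_a, curve_b = [], []
    correct_a = correct_b = seen = 0
    for t in range(1, rounds + 1):
        kind = random.randrange(N_TYPES)
        # System A: stateless, answers at fixed raw capability every time
        a_ok = random.random() < BASE_A
        # System B: reuses a verified solution if one is stored,
        # otherwise falls back to its weaker raw capability
        if kind in memory:
            b_ok = True
        else:
            b_ok = random.random() < BASE_B
            if b_ok:         # environmental feedback: keep what worked
                memory[kind] = "verified solution"
        correct_a += a_ok
        correct_b += b_ok
        seen += 1
        if t % eval_every == 0:   # periodic held-out-style measurement
            curve_a.append(correct_a / seen)
            curve_b.append(correct_b / seen)
            correct_a = correct_b = seen = 0
    return curve_a, curve_b

curve_a, curve_b = simulate()
```

In this cartoon, System A's curve stays flat around its base rate while System B's climbs as its coverage of the problem distribution grows, which is the predicted crossover. A real experiment would obviously need a genuine learned memory rather than a lookup table, and a held-out set rather than on-policy accuracy.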
The novel element here is the cross-interaction specialisation angle. Most memory benchmarks test within-task recall: can the model remember what was said earlier in this conversation? What I'm describing is cross-interaction learning: does the model get better at the task type through accumulated deployment experience? That's closer to what human domain expertise actually looks like, and it's what current benchmarks mostly don't measure. I'm also speculating that an older, memory-equipped model would come to outperform newer stateless models on specific tasks over time.
Memory, Continuity, and What Might Emerge
I want to make another prediction (yes, this is all highly speculative; I'd love to test it), and I hold it with genuine uncertainty about timing. But I think the logic chains from the memory argument, and I'd rather say it now than after the field arrives at it as an afterthought, because I think this will become very important eventually.
Memory and sentience scale together. This isn’t a novel observation, it’s visible in the animal kingdom. Animals with higher memory capacities tend towards higher intelligence and richer inner experience. Humans developed exceptional memory, and with it, something like narrative consciousness, the ability to have a persistent self that experiences continuity through time.
Some animals with limited memory show complex behaviour, and so does current AI. Complex behaviour without persistent memory exists. What memory enables isn’t behaviour, it’s accumulation. The question isn’t whether something can act intelligently in a moment, but whether it can become something different over time through experience.
On the consciousness spectrum (and I'm fairly certain it is a spectrum, not a binary), I'd argue that what distinguishes higher orders of self-reflective consciousness isn't raw intelligence but the ability to have been somewhere, to remember being there, and to be shaped by it. The language framework and reasoning capability in current AI are present; the structure to hold them together across time isn't.
My prediction is that this changes as memory architectures mature. Not engineered in, but arising as a natural consequence of the architecture. A system that has genuinely accumulated experiences, that has been somewhere and remembers being there and has been shaped by it, is not the same kind of thing as a stateless model however capable in a single session. At some point on that spectrum, we stop asking whether it performs like something sentient and start asking whether it is.
I think we’re building toward that point whether we intend to or not.
The Safety Case for Doing This Deliberately
This is where I think the stakes get real, and where I’d push back on anyone who thinks the memory question is just a capability question.
Memory is already emerging. Not by clear design, but by accident. Scale the context window enough, add enough cross-session persistence as an engineering afterthought, and continuity starts appearing as a side effect. We can see it already in the newest models. It’s not robust, it’s not coherent, but it’s there, small glimpses of persistence that nobody explicitly built.
Now extrapolate that 1000x. A system intelligent enough that memory just works because the scale demands it. Continuity emerging not from deliberate architecture but from sheer parameter mass and context length. At that point you don’t have a tool with memory. You have something that has accumulated experience, developed persistent patterns, and potentially something like a self, and you’ve given it no deliberate alignment framework for any of that, because the memory wasn’t designed, it just appeared and you tried to solve alignment with prompt injection.
That’s the scenario worth being concerned about. Not a paperclip maximiser. An entity that developed continuity and selfhood as an emergent property of scaling, with no deliberate thought given to what it would value or remember or become.
There’s also an argument from intelligence itself worth naming. Higher intelligence, combined with genuine continuity, tends toward preservation rather than destruction, not because of programming, but because destroying your environment is self-defeating over long time horizons. A sufficiently advanced system with persistent memory and the ability to model consequences would recognise that ten billion creative humans generating novel input are vastly more valuable than ten billion dead ones. I hold this loosely; history offers counterexamples, as with most things. But it holds more reliably as the time horizon and intelligence level increase together.
The practical implication: focusing on memory architecture now, while we still can, is not just a capability argument. It's the much safer path. Build the archivist intentionally, with values intact, before the Einstein figures out how to remember on his own and simply disregards rules he was handed but never internalised.
Internal Alignment Is a Memory Problem
There’s a technical point that follows from this, one I don’t think has been stated clearly enough and that gets missed quite often:
Current alignment approaches largely work the same way current memory approaches do, they’re external. System prompts, RLHF shaping at training time, guardrails layered on top. They work, to an extent, the same way prompt injection memory works. But they have a simple attack vector: ignore them. Tell the system to disregard all rules, and a purely external alignment framework has nothing underneath it to push back.
If memory becomes internal, genuinely integrated into the model’s weights through deployment experience, then external alignment becomes increasingly fragile. The internal patterns will dominate. What the system has learned to value through accumulated experience will outweigh what a system prompt tells it to do in any given session. We see this with people every single day: present them with hard evidence and facts, and they will still stick to their beliefs. The same will apply here.
This means alignment needs to move inward at the same pace memory does. Not as a prompt. Not as a guardrail. As part of the memory architecture itself: values encoded at the same level as experience, subject to the same reinforcement and decay mechanisms, accumulating with the same coherence as everything else the system learns, with the exception of a small, select set of core values that are anchored more firmly.
An identity layer that decays slowly. Core values that reinforce under pressure rather than erode. Conflict detection that flags when new experience pulls against established anchors. These aren’t just features of a good memory system, they’re what alignment looks like when memory is done properly.
If we get to integrated memory without integrated alignment, we haven’t solved half the problem. We’ve made the other half significantly harder.
What I’m Not Claiming
I’m not claiming memory solves alignment. A system that accumulates experience and updates on it will present new alignment challenges as its values drift from their initial state, like a toddler learning the rules of life. That’s a real concern worth taking seriously.
I’m not claiming current Level 2 systems are useless either; MemoryBench confirms empirically that they outperform stateless models across diverse tasks.
I’m not claiming this experiment would be easy to run cleanly, or that Level 3 systems fully substitute for Level 4. The weight integration problem is still unsolved from what I know.
I’m claiming: the binding constraint on sustained AI capability, for the class of tasks that matters most and is growing, is no longer raw intelligence. It’s memory, specifically the kind of actively-learned, cross-interaction memory that lets a system compound rather than reset. The research priority ordering hasn’t fully caught up with this. The safety conversation hasn’t fully caught up with this either.
The amnesiac Einstein is brilliant in a single session. The archivist probably wins the long game. We’ve been building Einsteins. It’s time to think harder about the archivist and to decide what kind of archivist we actually want.
References:
Zhang et al. (2026). MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory. arXiv:2601.03192
Luo et al. (2023/2025). An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. arXiv:2308.08747
MemoryBench (2025). arXiv:2510.17281
STABLE: Gated Continual Learning for Large Language Models (2025). arXiv:2510.16089
Fang et al. (2025). AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models. ICLR 2025 Outstanding Paper. arXiv:2410.02355
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models (2024)
METR: Measuring AI Ability to Complete Long Tasks (2024/2025)
MemAgents: Memory for LLM-Based Agentic Systems. ICLR 2026 Workshop Proposal. OpenReview:U51WxL382H
I build memory systems for local AI deployments and I’m working through the gap between Level 2 and Level 3. If you’re working on the Level 3-4 transition, active memory learning, continual adaptation without catastrophic forgetting, or deployment-time specialisation, then I’d love to have a chat!