This is great; I think it’s important to have this discussion. It’s key for where we put our all-too-limited alignment efforts.
I roughly agree with you that pure transformers won’t achieve AGI, for the reasons you give. They’re hitting a scaling wall, and they have marked cognitive blindspots, as you document here and as Thane Ruthenis argues convincingly in his bear case. But transformer-based agents (which are simple cognitive architectures) can still get there, and I don’t think they need breakthroughs, just integration and improvement. And people are already working on that.
To put it this way: humans have all of the cognitive weaknesses you identify, too. But we can use online learning (and spatial reasoning) to overcome them. We actually generalize only rarely and usually with careful thought. Scaffolded and/or RL-trained CoT models can do that too. Then we remember our conclusions and learn from them. Add-on memory systems and fine-tuning setups can replicate that.
Generalization: It’s a sort of informal general conclusion in cognitive psychology that “wow are people bad at generalizing”. For instance, if you teach them a puzzle, then change the names and appearances, it looks like they don’t apply the learning at all. But these are undergraduates who are usually unpaid, so they’re not doing the careful thinking it takes for humans to generalize knowledge. LLMs of the generation you’re testing don’t think carefully either (you note in a couple of places that “it’s like it’s not thinking”, which is exactly right), but CoT RL on a variety of reward models is making disturbingly rapid progress at teaching them when and how to think carefully, and thereby enabling generalization. Good memory systems will be necessary for them to internalize the new general principles they’ve learned, but those might be good enough already, and if not, they probably will be very soon.
I think you’re overstating the case, and even pure transformers could easily reach human-level “Real AGI” that’s dangerous as hell. Continued improvements in effectively using long context windows and specific training for better CoT reasoning do enable a type of online learning. Use Gemini 2.5 Pro for some scary demonstrations, although it’s not there yet. Naked transformers probably won’t reach AGI, which I’m confident of mostly because adding outside memory (and other cognitive) systems is so much easier and already underway.
My LLM cognitive architectures article also discusses other add-on cognitive systems, like vision systems that would emulate how humans solve tic-tac-toe, sliding puzzles, etc., but now I don’t think transformers need those, just more multimodal training. (Note that Claude and o1 preview weren’t multimodal, so were weak at spatial puzzles. If this was full o1, I’m surprised.) They’ll still be relatively bad at fundamentally spatial tasks like yours and ARC-AGI, but they can fill that in by learning or being taught the useful concepts/tricks for actually useful problem spaces. That’s what a human would do if they happened to have poor spatial cognition and had to solve spatial tasks.
I think transformers not reaching AGI is a common suspicion/hope amongst serious AGI thinkers. It could be true, but it’s probably not, so I’m worried that too many good thinkers are optimistically hoping we’ll get some different type of AGI. It’s fine and good if some of our best thinkers focus on other possible routes to AGI. It is very much not okay if most of our best thinkers incorrectly assume they won’t. Prosaic alignment work is not sufficient to align LLMs with memory/online learning, so we need more agent foundations style thinking there.
I address how we need to go beyond static alignment of LLMs to evolving learned systems of beliefs, briefly and inadequately, in my memory changes alignment post and elsewhere. I am trying to crystallize the reasoning in a draft post with the working title “If Claude achieves AGI, will it be aligned?”
Note that Claude and o1 preview weren’t multimodal, so were weak at spatial puzzles. If this was full o1, I’m surprised.
I just tried the sliding puzzle with o1 and it got it right! Though multimodality may not have been relevant, since it solved it by writing a breadth-first search algorithm and running it.
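For concreteness, here is a minimal sketch (in Python) of the kind of breadth-first search a model might write for a 3x3 sliding puzzle. The start state and goal layout below are illustrative assumptions, not the exact puzzle from the post.

```python
# Minimal sketch of breadth-first search over 3x3 sliding-puzzle states.
# The start state here is an arbitrary example, not the puzzle from the post.
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the empty square

def neighbors(state):
    """Yield states reachable by sliding one tile into the empty square."""
    i = state.index(0)
    row, col = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = r * 3 + c
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def solve(start):
    """Return the list of states from start to GOAL, or None if unreachable."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

if __name__ == "__main__":
    start = (1, 2, 3, 4, 0, 6, 7, 5, 8)  # two moves from the goal
    path = solve(start)
    print(f"solved in {len(path) - 1} moves" if path else "no solution")
```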
Interesting! Nonetheless, I agree with your opening statement that LLMs learning to do any of these things individually doesn’t address the larger point that they have important cognitive gaps and fail to generalize in ways that humans can.
Thanks!
I think transformers not reaching AGI is a common suspicion/hope amongst serious AGI thinkers. It could be true, but it’s probably not, so I’m worried that too many good thinkers are optimistically hoping we’ll get some different type of AGI.
To clarify, my position is not “transformers are fundamentally incapable of reaching AGI”, it’s “if transformers do reach AGI, I expect it to take at least a few breakthroughs more”.
If we were only focusing on the kinds of reasoning failures I discussed here, just one breakthrough might be enough. (Maybe it’s memory; my own guess was different, and I originally had a section discussing my own guess, but it felt speculative enough that I cut it.) Though I do think that full AGI would also require a few more competencies that I didn’t get into here if it’s to generalize beyond code/game/math type domains.
Right, I got that. To be clear, my argument is that no breakthroughs are necessary, and further that progress is underway and rapid on filling in the existing gaps in LLM capabilities.
Memory definitely doesn’t require a breakthrough. Add-on memory systems already exist in several forms: RAG and fine-tuning, as well as more sophisticated context management through prompting; CoT RL training effectively does this too.
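To make “add-on memory” concrete, here is a minimal sketch of a retrieval-style memory: store past conclusions as text, score them against the current task with a crude word-overlap measure (a real system would use embeddings or a vector database), and prepend the best matches to the next prompt. The class and method names and the scoring rule are illustrative assumptions, not any particular product’s API.

```python
# Toy retrieval memory for an LLM agent: store past conclusions and
# retrieve the most relevant ones to prepend to the next prompt.
# Bag-of-words overlap stands in for a real embedding model.

class SimpleMemory:
    def __init__(self):
        self.entries = []  # stored text snippets (past conclusions)

    def remember(self, text):
        self.entries.append(text)

    def _score(self, query, entry):
        q, e = set(query.lower().split()), set(entry.lower().split())
        return len(q & e) / (len(q) or 1)

    def recall(self, query, k=2):
        ranked = sorted(self.entries, key=lambda e: self._score(query, e), reverse=True)
        return ranked[:k]

    def augment_prompt(self, query):
        notes = self.recall(query)
        context = "\n".join(f"- {n}" for n in notes)
        return f"Relevant notes from earlier work:\n{context}\n\nTask: {query}"


if __name__ == "__main__":
    memory = SimpleMemory()
    memory.remember("The vendor CSV export swaps the billing and shipping columns.")
    memory.remember("Dates in the legacy system are DD/MM/YYYY, not MM/DD/YYYY.")
    print(memory.augment_prompt("Import this month's vendor CSV into the billing database"))
```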
Other cognitive capacities also exist in nascent form, so they probably require no breakthroughs either, although I think no other external cognitive systems are needed, given the rapid progress in multimodal and reasoning transformers.
Extrapolating current trends provides weak evidence that AGI will end up being too expensive to use properly, since even the o3 and o4-mini models are rumored to become accessible at a price that is already comparable with the cost of hiring a human expert, and the rise to AGI could require a severe increase in compute-related costs.
UPD: It turned out that the PhD-level-agent-related rumors are fake. But the actual cost of applying o3 and o4-mini has yet to be revealed by the ARC-AGI team...
Reasoning based on presumably low-quality extrapolation
OpenAI’s o3 and o4-mini models are likely to become accessible for $20,000 per month, or $240K per year. The METR estimate of the price of hiring a human expert is $143.61 per hour, or about $287K per year, assuming a human works 2000 hours a year. For comparison, the salary of a Harvard professor is less than $400K per year, meaning that one human professor cannot yet be replaced with twice as many subscriptions to the models (which are compared with PhD-level experts[1] and not with professors).
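As a quick sanity check on those figures (all taken from the paragraph above; the subscription price is only a rumor), the arithmetic works out as follows:

```python
# Back-of-the-envelope comparison using the figures quoted above.
rumored_subscription_per_month = 20_000       # rumored price, USD/month
human_expert_rate_per_hour = 143.61           # METR's estimate, USD/hour
working_hours_per_year = 2_000

subscription_per_year = rumored_subscription_per_month * 12
human_expert_per_year = human_expert_rate_per_hour * working_hours_per_year

print(f"model subscription: ${subscription_per_year:,.0f}/year")   # $240,000/year
print(f"human expert:       ${human_expert_per_year:,.0f}/year")   # $287,220/year
print(f"two subscriptions:  ${2 * subscription_per_year:,.0f}/year vs a <$400K professor salary")
```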
As the ARC-AGI data tells us, the o3-low model, which cost $200 per task, solved 75% of the tasks in the ARC-AGI-1 test. The o3-mini model solved 11-35% of the tasks, which is similar to the o1 model, implying that the o4-mini model’s performance is similar to the o3 model. Meanwhile, the price of using GPT-4.1 nano is at most four times less than that of GPT-4.1 mini, while its performance is considerably worse. As I already pointed out, I find it highly unlikely that ARC-AGI-2-level tasks are solvable by a model cheaper than o5-mini[2], and unlikely that they are solvable by a model cheaper than o6-mini. On the other hand, the increase in cost from o1-low to o3-low is 133 times, while the decrease from o3-low to o3-mini (low) is 5000 times. Therefore, the cost of having o5-nano do ONE task is unlikely to be much less than that of o3 (which is $200 per task!), while the cost of having o6-nano do one task is likely to be tens of thousands of dollars, which ensures that it will not be used unless it replaces at least half a month of human work.
Were existing trends to continue, replacing at least a month of human work would happen, with an 80% confidence interval, between late 2028 and early 2031. The o1 model was previewed on September 12, 2024, and the o3 model was previewed on December 20, 2024. The release of o3-mini happened on January 31, 2025, and the release of o4-mini is expected within a week, implying that the road from each model to the next takes from 3 to 4 months, or exponentially longer[3] given enough compute and data. Even a scenario of the history of the future that assumes solved alignment estimates o5 (or o6-nano?) to be released in late 2025 and o6 to be released in 2026, while the doubling time of tasks is 7 months. Do the estimates of the time when the next model is too expensive to be used unless it replaces a month of human work, and of the time when the next model is capable of replacing a month of human work, end up ensuring that AGI is highly likely to become too expensive to use?
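One way to make that question concrete is to race the two trends against each other under the rough numbers quoted in this thread: per-task cost multiplying by a large factor with each new generation arriving every 3-4 months, versus the METR task-length horizon doubling every ~7 months. The sketch below uses illustrative placeholder values (including an assumed one-hour starting task horizon), not predictions:

```python
# Illustrative race between per-task cost growth and task-horizon growth,
# using rough placeholder numbers from the discussion above (not predictions).
cost_per_task = 200.0          # o3-low on ARC-AGI-1, USD per task
cost_multiplier_per_gen = 133  # o1-low -> o3-low cost increase, treated as one "generation"
months_per_generation = 3.5    # new model roughly every 3-4 months
horizon_hours = 1.0            # assumed task horizon of the current generation, in human-hours
horizon_doubling_months = 7.0  # METR's reported doubling time for task length
human_rate_per_hour = 143.61   # METR's human-expert rate, USD

for gen in range(1, 5):
    months = gen * months_per_generation
    cost_per_task *= cost_multiplier_per_gen
    horizon_hours *= 2 ** (months_per_generation / horizon_doubling_months)
    human_equivalent = horizon_hours * human_rate_per_hour
    print(f"gen +{gen} (~{months:.0f} mo): "
          f"${cost_per_task:,.0f}/task vs ${human_equivalent:,.0f} of human work replaced")
```

Under these placeholder assumptions, cost per task grows ~133x per generation while the value of the human work replaced grows only ~1.4x per generation, which is the runaway the question points at; if either assumed rate is wrong, the conclusion changes.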
Unfortunately, the quality of officially-same-level experts varies from country to country. For instance, the DMCS of SPBU provides a course on Lie theory for undergraduate students, while at Stanford Lie theory is a graduate-level course.
Here I assume that each model in the series o1-o3-o4-o5-o6 is the same number of times more capable than the previous one. If subsequent training of more capable models ends up being slowed down by compute deficiency or even World War III, then this will obviously impact both the METR doubling law and the times when costly models appear, but not the order in which AI becomes too expensive to use and capable of replacing workers.
An exponential increase in the time between the models’ appearances does not ensure that o5 and o6 are released later than in the other scenario.
That article is sloppily written enough to say “Early testers report that the AI [i.e. o3 and/or o4-mini] can generate original research ideas in fields like nuclear fusion, drug discovery, and materials science; tasks usually reserved for PhD-level experts”, linking, as a citation, to OpenAI’s January release announcement of o3-mini.
TechCrunch attributes the rumor to a paywalled article in The Information (and attributes the price to specialized agents, not o3 or o4-mini themselves).
Already available for $20/month.
The $20,000/month claim seems to originate from that atrocious The Information article, which threw together a bunch of unrelated sentences at the end to create the (false) impression that o3 and o4-mini are innovator-agents which will become available for $20,000/month this week. In actuality, the sentences “OpenAI believes it can charge $20,000 per month for doctorate-level AI”, “new AI aims to resemble inventors”, and “OpenAI is preparing to launch [o3 and o4-mini] this week” are separately true, but have nothing to do with each other.
I want to ask a question here which is not necessarily related to the post itself, but rather to your conceptualisation of the underlying reason why memory is a crux for further capabilities.
I’m thinking that it has to do with being able to model the boundaries of what the system itself is compared to the environment. That then enables it to get a conception of a consistent Q-function that it can apply, whereas if it doesn’t have this, there’s a sense in which there’s almost no real object permanence, no consistent identity through time?
Memory allows you to “touch” the world and see what is you and what isn’t, if that makes sense?
I’d say the main reason memory is useful is as a way to enable longer-term meta-learning, as well as to provide the foundation for continuous learning to work out.
From @Seth Herd’s post:
Stateless LLMs/foundation models are already useful. Adding memory to LLMs and LLM-based agents will make them useful in more ways. The effects might range from minor to effectively opening up new areas of capability, particularly for longer time-horizon tasks. Even current memory systems would be enough to raise some of the alignment stability problems I discuss here, once they’re adapted for self-directed or autobiographical memory. I think the question is less whether this will be done, and more how soon and how much they will boost capabilities.
Let’s consider how memory could help agents do useful tasks. A human with lots of knowledge and skills but who can’t learn anything new is a useful intuition pump for the “employability” of agents without new memory mechanisms. (Such “dense anterograde amnesia” is rare because it requires bilateral medial temporal lobe damage while leaving the rest of the brain intact. Two patients occupied most of the clinical literature when I studied it).
Such a “memoryless” person could do simple office tasks by referencing instructions, just like an LLM “agent” can be prompted to perform multi-step tasks. However, almost every task has subtleties and challenges, so can benefit from learning on the job. Even data entry benefits from recognizing common errors and edge cases. More complex tasks usually benefit more from learning. For our memoryless human or an LLM, we could try giving better, more detailed instructions to cover subtleties and edge cases. Current agent work takes this route, whether by prompting or by hand-creating fine-tuning datasets, and it is reportedly just as maddening as supervising a memoryless human would be.
A standard human worker would learn many subtleties of their task over time. They would notice (if they cared) themes for common errors and edge cases. A little human guidance (“watch that these two fields aren’t reversed”) would go a long way. This would make teaching agents new variations in their tasks, or new tasks, vastly easier. We’ll consider barriers to agents directing their own learning below.
Runtime compute scaling creates another reason to add continuous learning mechanisms (“memory” in common parlance) to LLM agents. If an LLM can “figure out” something important for its assigned task, you don’t want to pay that compute and time cost every time, nor take the chance it won’t figure it out again.
Use of agents as companions or assistants would also benefit from memory; even limited systems for remembering users’ preferences and prior interactions would be economically valuable.
Longer-term tasks benefit from memory in another way. Humans rely heavily on long-term one-shot memory (“episodic memory”) for organizing large tasks and their many subtasks. There are often dependencies between subtasks. LLM agents can perform surprisingly and increasingly well in their current mode of proceeding largely from start to finish and just getting everything right with little planning or checking, using only their context window for memory.
But it is possible that the tasks included in METR’s report of exponential progress in task length are effectively selected for not needing much memory. And long task progress may be hampered by models’ inability to remember which subtasks they’ve completed, and by memory limitations on effective context length (e.g.). Whether or not this is the case, it seems pretty likely that some long time-horizon tasks would benefit from more memory capabilities of different types.
Even if additional memory systems would be useful for LLM agents, one might think they would take years and breakthroughs to develop.
Or @gwern’s comment here:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF
Yeah, I agree with that and I still feel there’s something missing from that discussion?
Like, there’s a sense in which, to have good planning capacity, you want a good world model to plan over into the future. You then want to assign relative probabilities to your action policies working out well. To do this, having a clear self-environment boundary is quite key. So yes, memory enables in-context learning, but I don’t believe that will be the largest addition; I think the fact that memory allows for more learning about self-environment boundaries is the more important part?
There’s stuff in RL, Active Inference, and Michael Levin’s work I can point to for this, but it’s a bunch of information spread out over many different papers, so it’s hard to give something definitive on it.