Insofar As I Think LLMs “Don’t Really Understand Things”, What Do I Mean By That?
When I put on my LLM skeptic hat, sometimes I think things like “LLMs don’t really understand what they’re saying”. What do I even mean by that? What’s my mental model for what is and isn’t going on inside LLMs’ minds?
First and foremost: the phenomenon precedes the model. That is, when interacting with LLMs, it sure feels like there’s something systematically missing which one could reasonably call “understanding”. I’m going to articulate some mental models below, but even if I imagine all those mental models are wrong, there’s still this feeling that LLMs are missing something and I’m not quite sure what it is.
That said, I do have some intuitions and mental models for what the missing thing looks like. So I’ll run the question by my intuitions a few times, and try to articulate those models.
First Pass: A Bag Of Map-Pieces
Imagine taking a map of the world, then taking a bunch of pictures of little pieces of the map—e.g. one picture might be around the state of Rhode Island, another might be a patch of Pacific Ocean, etc. Then we put all the pictures in a bag, and forget about the original map.
A smart human-like mind looking at all these pictures would (I claim) assemble them all into one big map of the world, like the original, either physically or mentally.
An LLM-like mind (I claim while wearing my skeptic hat) doesn’t do that. It just has the big bag of disconnected pictures. Sometimes it can chain together three or four pictures to answer a question, but anything which requires information spread across too many different pictures is beyond the LLM-like mind. It would, for instance, never look at the big map and hypothesize continental drift. It would never notice if there’s a topological inconsistency making it impossible to assemble the pictures into one big map.
Second Pass: Consistent Domains
Starting from the map-in-a-bag picture, the next thing which feels like it’s missing is something about inconsistency.
For example, when tasked with proving mathematical claims, a common pattern I’ve noticed from LLMs is that they’ll define a symbol to mean one thing… and then later in the proof make some totally different and incompatible assumption about that symbol, as though it meant something else entirely.
Bringing back the map-in-a-bag picture: rather than a geographical map, imagine lots of little pictures of a crystal, taken under an electron microscope. As with the map, we throw all the pictures in a bag. A human-like mind would try to assemble the whole thing into a globally-consistent picture of the whole crystal. An LLM-like mind will kinda… lay out a few pieces of the picture in one little consistent pattern, and then separately lay out a few pieces of the picture in another little consistent pattern, but at some point as it’s building out the two chunks they run into each other (like different crystal domains, but the inconsistency is in the map rather than the territory). And then the LLM just forges ahead without doing big global rearrangements to make the whole thing consistent.
That’s the mental picture I associate with the behavior of LLMs in proofs, where they’ll use a symbol to mean one thing in one section of the proof, but then use it in a totally different and incompatible way in another section.
Third Pass: Aphantasia
What’s the next thing which feels like it’s missing?
Again thinking about mathematical proofs… the ideal way I write a proof is to start with an intuitive story/picture for why the thing is true, and then translate that story/picture into math and check that all the pieces follow as my intuition expects.[1]
Coming back to the map analogy: if I were drawing a map, I’d start with this big picture in my head of the whole thing, and then start filling in pieces. The whole thing would end up internally consistent by default, because I drew each piece to match the pre-existing picture in my head. Insofar as I draw different little pieces in a way that doesn’t add up to a consistent big picture, that’s pretty strong evidence that I wasn’t just drawing out a pre-existing picture from my head.
I’d weakly guess that aphantasia induces this sort of problem: someone with aphantasia, asked to draw a bunch of little pictures of different parts of an object or animal or something, would end up drawing little pictures which don’t align with each other and don’t combine into one consistent picture of the object or animal.
That’s what LLMs (and image generators) feel like. It feels like they have a bunch of little chunks which they kinda stitch together but not always consistently. That, in turn, is pretty strong evidence that they’re not just transcribing a single pre-existing picture or proof or whatever which is already “in their head”. In that sense, it seems like they lack a unified mental model.
Fourth Pass: Noticing And Improving
A last piece: it does seem like, as LLMs scale, they are able to assemble bigger and bigger consistent chunks. So do they end up working like human minds as they get big?
Maybe, and I think that’s a pretty decent argument, though the scaling rate seems pretty painful.
My counterargument, if I’m trying to play devil’s advocate, is that humans seem to notice this sort of thing in an online way. We don’t need to grow a 3x larger brain in order to notice and fix inconsistencies. Though frankly, I’m not that confident in that claim.
[1] I don’t always achieve that ideal; sometimes back-and-forth between intuition and math is needed to flesh out the story and proof at the same time, which is what most of our meaty research looks like.
On my model, humans are pretty inconsistent about doing this.
I think humans tend to build up many separate domains of knowledge and then rarely compare them, and even believe opposite heuristics by selectively remembering whichever one agrees with their current conclusion.
For example, I once had a conversation about a video game where someone said you should build X “as soon as possible”, and then later in the conversation they posted their full build priority order and X was nearly at the bottom.
In another game, I once noticed that I had a presumption that +X food and +X industry are probably roughly equally good, and also a presumption that +Y% food and +Y% industry are probably roughly equally good, but that these presumptions were contradictory at typical food and industry levels (because +10% industry might end up being about 5 industry, but +10% food might end up being more like 0.5 food). I played for dozens of hours before realizing this.
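To make that arithmetic concrete, here’s a tiny sketch; the base values (roughly 50 industry and 5 food per turn) are assumptions chosen only to match the numbers above:

```python
# Illustrative only: base values are assumptions matching the numbers above.
base_industry = 50   # assumed typical industry per turn
base_food = 5        # assumed typical food per turn

flat = 3             # "+X industry" vs "+X food": same absolute value by construction
pct = 0.10           # "+Y% industry" vs "+Y% food"

print("flat bonus:", flat, "industry vs", flat, "food")                       # equal, as presumed
print("pct bonus: ", base_industry * pct, "industry vs", base_food * pct, "food")
# -> 5.0 industry vs 0.5 food: a 10x gap, so the two "roughly equal"
#    presumptions cannot both hold at these base values.
```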
I’m old enough to have wanted to get from point A to point B in a city for which I literally had a torn map in a bag (I mean, it was in 2 pieces). I can’t imagine a human experienced with paper maps who would not figure it out… But I would not put it beyond a robot powered by a current-gen LLM to screw it up ~half the time.
When you did eventually realize this, did you feel like you were maximally smart already (no improvement possible), or did you feel like you wanted to at least try not to make the same mistake tomorrow (without eating more calories per day and without forgetting how to tie your shoelaces)?
I disagree with these. Humans don’t automatically make all the facts in their heads cohere. I think it’s plausible that LLMs are worse than humans at doing this, but that seems insufficient for making a discrete demarcation. For example:
This happens pretty often with humans, actually. One of the most common ways people (compsci undergrads and professional mathematicians alike) make errors in proofs is exactly this: using a symbol to mean one thing in one part of the proof and something incompatible in another.
I agree that there are ways LLMs’ understanding is shallower than humans’, but from my PoV, a lot of that impression comes from:
- The way you use the models: you consistently restart and rewind them, which lays their flaws much more bare than when talking to humans. Like, if we were having a debate about, for example, the degree to which LLMs can be said to understand things, and we had that debate 100 times, but my memory was wiped between each debate and yours kept, I’m pretty sure I’d look very stupid and shallow. Not that different from a chatbot.
- Lack of continual-learning-type things. Like, if in a conversation you say something that makes me realize my understanding of a concept is incomplete or subtly inconsistent, I might in the background update my understanding and reevaluate whether my claims still make sense in light of that revised understanding. LLMs seem to have a hard time doing this. But that seems like a meaningfully different problem from “not understanding” stuff.
- Having a lot of knowledge. If you met a human who could solve algebraic geometry problems, you’d assume they were pretty smart. And if they were unable to, say, connect their headset to their phone via Bluetooth despite trying for a week, with nothing wrong with the phone or headset, you’d be seriously confused and surprised, and think they probably had some mental disorder or were just messing with you. But LLMs are more like a 60-IQ human who knows a huge amount of things and is unusually good at manipulating symbols and dealing with (from a human perspective) abstraction. That such a person would forget what they meant by a term, or would not have totally clear mental images of the domain they’re dealing with, would maybe not be that surprising.
Hm, do you see the OP as arguing that it happens “automatically”? My reading was more that it happens “eventually, if motivated to figure it out”, and that we don’t know how to “motivate” LLMs to be good at this in an efficient way (yet).
Sure, and would you hire those people and rely on them to do a good job BEFORE they learn better?
Having spent a lot of time attempting to explain things to my 3-year-old, I’m far from certain this is the case. No matter how many times we explain the difference between a city and a country, or when we go by car vs. by plane, she’ll ask us right after a five-hour flight whether we’re close to my work, which we usually get to by a one-hour train ride.
My 5-year-old groks all this intuitively, but there’s very little point explaining it to the 3-year-old (even though she talks beautifully, and can understand all the sentences we say as standalone facts). At some point her brain will grow more sophisticated and she’ll grok all of this too.
I observed the same process with puzzles. No matter how many times I point out what corner pieces and edge pieces are, they simply cannot work out that a corner piece has to be next to an edge piece until they’re about 3. No amount of explaining or examples will help.
Does this cash out into concrete predictions of tasks which you expect LLMs to make little progress on in the future?
A very literal eval your post suggests: take two maps or images of similar stylistic form but different global structure, cut them into little square sections, and ask a model to partition the pieces from both puzzles into two coherent wholes. I expect LLMs to be really bad at this task right now, but they’re very bad at vision in general, so “true understanding” isn’t really the bottleneck IMO.
But one could do a similar test on text-based data; e.g., one could ask a model to reconstruct two math proofs with shared variable names from an unordered list of the individual sentences in each proof. Is this the kind of thing you expect models to make unusually little progress on relative to other tasks of similar time horizon? (I might be down to bet on something like this, though I think it’ll be tricky to operationalize crisply enough.)
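For concreteness, here’s a minimal sketch of how such a shuffled-proof eval could be set up; the `query_model` call and the naive sentence splitting are placeholders, not a worked-out harness:

```python
import random

# Hypothetical placeholder: swap in whatever LLM client you actually use.
def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def make_shuffled_proof_task(proof_a: str, proof_b: str, seed: int = 0) -> str:
    """Build the eval prompt: sentences from two proofs, shuffled together.

    Splitting on '.' is a naive stand-in; real math prose would need a
    smarter sentence splitter.
    """
    sentences = [s.strip() + "." for s in (proof_a + " " + proof_b).split(".") if s.strip()]
    random.Random(seed).shuffle(sentences)
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(sentences))
    return (
        "The numbered sentences below come from two different proofs that share "
        "variable names. Partition them into the two original proofs and put each "
        "proof's sentences back into a coherent order.\n\n" + numbered
    )

# Usage sketch (proof_1_text / proof_2_text would be real proofs with overlapping symbols):
# prompt = make_shuffled_proof_task(proof_1_text, proof_2_text)
# answer = query_model(prompt)
# ...then score the model's partition and ordering against the originals.
```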
I’ve been working towards automated research (for safety) for a long time. After a ton of reflection and building in this direction, I’ve landed on an opinion similar to the one presented in this post.
I think LLM scaffolds will solve some problems, but they will be limited in ways that make it hard to solve incredibly hard problems. You can claim that LLMs can just use a scratchpad as a form of continual online learning, but it feels like this will hit limits: information loss, and the inability to really internalize new information, feel like bottlenecks.
Scale will help, but it’s unclear how far it will go, and it’s clearly not economical.
That said, I still think automated research for safety is underinvested in.
Stepping back to the meta level (the OP itself seems fine), I worry that you fail to utilize LLMs.
“There are ways in which John could use LLMs that would be useful in significant ways, which he currently isn’t using, because he doesn’t know how to do it. Worse, he doesn’t even know these exist.”
I am not confident this statement is true, but based on things you say, and based on how useful I find LLMs, I intuit there is a significant chance it is true.
Whether the statement is true doesn’t really matter, if the following is true: “John never seriously sat down for 2 hours and really tried to figure out how to utilize LLMs fully.”
E.g. I expect that when you had the problem of the LLM reusing symbols randomly, you didn’t go: “Ok, how could I prevent this from happening? Maybe I could create an append-only text pad, in which the LLM records all definitions and descriptions of each symbol, and have this text pad always be appended to the prompt. And then I could have the LLM verify that the current response has not violated the pad’s contents, and that no duplicate definitions have been added to the pad.”
Maybe this would resolve the issue; probably not, based on priors. But it seems important to think about this kind of thing (and to think for longer, so that you get multiple ideas, one of which might work, and ideally to first focus on building a mechanistic model of why the error is happening in the first place, which lets you come up with better interventions).
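As a concrete illustration of the kind of append-only symbol pad described above, here’s a minimal sketch; `query_model` is a hypothetical placeholder for whatever LLM API call you use, and the prompts and repair loop are assumptions, not a tested recipe:

```python
# Hypothetical placeholder: swap in whatever LLM API call you actually use.
def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

class SymbolPad:
    """Append-only record of symbol definitions, re-sent with every prompt."""

    def __init__(self) -> None:
        self.entries: list[str] = []  # only ever appended to, never edited

    def add(self, symbol: str, definition: str) -> None:
        self.entries.append(f"{symbol}: {definition}")

    def render(self) -> str:
        return ("Symbol definitions so far (do not redefine or repurpose these):\n"
                + "\n".join(self.entries))

def proof_step(pad: SymbolPad, instruction: str) -> tuple[str, str]:
    """Generate one chunk of the proof with the pad attached, then self-check it."""
    draft = query_model(pad.render() + "\n\n" + instruction)
    check = query_model(
        pad.render()
        + "\n\nDoes the text below redefine any listed symbol, or use one in a way "
          "inconsistent with its definition? Answer YES or NO, then explain.\n\n"
        + draft
    )
    # The caller would regenerate or repair the step if the check comes back YES,
    # and append any genuinely new definitions to the pad before the next step.
    return draft, check
```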
I somewhat agree with your description of how LLMs seem to think, but I don’t think it explains a general limitation of LLMs. The patterns you describe also don’t seem to me to be a good description of how humans think in general. Ever since The Cognitive Science of Rationality, it has been discussed here that humans usually do not integrate their understanding into a single, coherent map of the world. Humans instead build and maintain many partial, overlapping, and sometimes contradictory maps that only appear unified. Isn’t that the whole point of Heuristics & Biases? I don’t doubt that the process you describe exists, or that it is behind the heights of human reasoning, but it doesn’t seem to be the basis of the main body of “reasoning” out there on the internet on which LLMs are trained. Maybe LLMs just imitate that? Or at least they will have a lot of trouble imitating human thinking while still building a coherent picture underneath it.
I think “understanding” in humans is an active process that demands cognitive skills we develop through continuous learning. I think you’re right that LLMs are missing “the big picture” and the ability to organize their local concepts to be consistent with it. I don’t think humans do this automatically (per Dweomite’s comment on this post); rather, we need to learn skills to do it. I think this is a lot of what LLMs are missing (TsviBT’s “dark matter of intelligence”).
I wrote about this in Sapience, understanding, and “AGI” but I wasn’t satisfied and it’s out of date. This is an attempt to do a better and briefer explanation, as a sort of run-up to doing an updated post.
We’ve learned skills for thought management/metacognition/executive function. They’re habits, not beliefs (episodic memories or declarative knowledge), so they’re not obvious to us. We develop “understanding” by using those skills to metaphorically turn concepts over in our minds: actively comparing them to memories of data and to other beliefs, which checks their consistency with other things we know. Learning from these investigations improves our future understanding of that concept, and our skills for understanding other concepts.
What LLMs are missing relative to humans is profound right now, but it may be all too easy to add enough of it to get takeover-capable AGI. Among other things (below), they’re missing cognitive skills that aren’t well described in the text training set, but that may be pretty easy to learn with a system-2-type approach that can be “habitized” with continuous learning. This might be as easy as a little fine-tuning, if the interference problem is adequately solved (and what’s adequate might not be a high bar). Fine-tuning already adds this type of skill, but it seems to produce too much interference for it to keep going. And I don’t know of a full self-teaching loop, although there is constant progress on most or all of the components needed to build one.
There may be other routes to filling in that missing executive function and active processing for human-like understanding.
This is why I’m terrified of short timelines while most people have slightly longer timelines at this point.
I’ve been thinking about this a lot in light of the excellent critiques of LLM thinking over the last year. My background is “computational cognitive neuroscience,” so comparing LLMs to humans is my main tool for alignment thinking.
When I was just getting acquainted with LLMs in early 2023, my answers were that they’re missing episodic memory (for “snapshot” continuous learning) and “executive function”, a vague term that I now think mostly refers to skills for managing cognition. I wrote about this in Capabilities and alignment of LLM cognitive architectures in early 2023. If you can overlook my focus on scaffolding, I think it stands up as a partial analysis of what LLMs are missing and of the emergent/synergistic/multiplicative advantages of adding those things.
But it’s incomplete. I didn’t emphasize continuous skill learning there, but I now think it’s pretty crucial for how humans develop executive function and therefore understanding. I don’t see a better way to give it to agentic LLMs. RL on tasks could do it, but that has a data problem if it’s not self-directed like human learning is. But there might be other solutions.
I think this is important to figure out. It’s pretty crucial for both timelines and alignment strategy.
>It would, for instance, never look at the big map and hypothesize continental drift.
Millions of humans must have looked at relatively accurate maps of the globe without hypothesizing continental drift. A large number must have also possessed sufficient background knowledge of volcanism, tectonic activity etc to have had the potential to connect the dots.
Even the concept of evolution came centuries or millennia after the widespread understanding and application of selective breeding, without people before Darwin/Wallace making the seemingly obvious connection that the same selection pressure on phenotype and genotype could play out in the wild. Human history is littered with low-hanging fruit, as well as with discoveries that seem unlikely to have been made without multiple intermediate discoveries.
I believe it was Gwern who suggested that future architectures or training programs might have LLMs “dream” and attempt to draw connections between separate domains of their training data. In the absence of such efforts, I doubt we can make categorical claims that LLMs are incapable of coming up with truly novel hypotheses or paradigms. And even if they did come up with one, would we recognize it? Would they be capable of following up on it, or even allowed to?
Edit: Even in something as restricted as artistic “style”, Gwern raised the very important question of whether a truly innovative leap by an image model would be recognized as such (assuming it would be if a human artist made it) or dismissed as weird/erroneous. The old DeepDream output was visually distinct from previous human work, yet I can’t recall anyone endorsing it as an AI-invented style.
I personally, as a child, looked at a map of the world and went “huh, it sure looks like these continents over here kinda fit in over there, maybe they moved?”, before I had learned of continental drift.
(For some reason I remember the occasion quite well, like I remember the spot where I was sitting at the time.)
Gwern’s essay you mentioned, in case others are curious: https://gwern.net/ai-daydreaming
I’d also highlight the essay’s sections on obstacles and implications.
First off, TFTP. I marked some stuff I thought was most relevant. This is helping remind me of some things I think about LLM confabulation and lack of binding/reasoning… I don’t have my thoughts fully formed, but there’s something here about global inconsistency despite local compatibility, and how that cashes out in Problems. Something a little like an inability to define a sheaf, or to do homology detection, or something like that? I might have more and better words about it later.