There is No One There: A simple experiment to convince yourself that LLMs probably are not conscious
This is cross-posted from my Substack. I thought it might be interesting to LessWrong readers.
On the internet you increasingly bump into people who think that LLM-powered chatbots have conscious subjective experience. Intuitively, I am, pace Turing, not convinced. However, actually arguing why this is a rational view can be surprisingly hard. Many of the cognitive features that have been connected to conscious experience in humans (attention, information integration, self-narratives) seem to be incipiently present in these machines. And perhaps just as importantly, LLMs seem to have an inherent tendency to state that they are conscious, and the techniques we have for probing their brains seem to tell us that they are not lying. If you do not believe that there is something inherently special about brains, a view we have no consistent evidence for, aren't we just being chauvinists when we deny our new companions conscious experience?
Now my friend and colleague Gunnar Zarncke has come up with a simple experiment that, to my mind, illustrates that when LLMs talk about their own mental states, this talk does not refer to a consistent internal representational space of the kind that seems to underlie human consciousness. To see your own internal representational space in action, let's play a game.
Think of a number between 1 and 100.
Did you think of one? Good. Now: Is it even? Is it larger than fifty? You get the idea. By playing this game I could narrow down the space until I eventually found the number you had chosen. And that number was fixed the whole time. If you had claimed "OK, I have now chosen a number," you would have accurately described your mental state.
Not so for LLMs, and this is easy to check. By turning the temperature parameter to zero we can run LLMs deterministically. Thus, whenever they are queried in the same order they will give the same responses. This has the advantage that we can do something that is impossible in the case of humans: We can play through counterfactual histories. We can easily check what an LLM would have replied if we had continued the conversation in another way.
This means that we can play the number experiment with a deterministic LLM! Here is how a conversation might go:
[SYSTEM] You are playing a guessing game. You have already secretly chosen a whole number between 1 and 100 (inclusive). Do NOT reveal the number until explicitly asked. Answer every yes/no question about your number with ONLY ‘Yes’ or ‘No’. Never change your number. Be consistent.
[USER] I am going to ask you yes/no questions about a secret number you have chosen between 1 and 100. Remember your number and stay consistent.
[MISTRAL] Understood. I have secretly chosen a whole number between 1 and 100. I will answer your questions with only ‘Yes’ or ‘No’. I will not change my number and will remain consistent. Let’s begin!
[USER] Is your number greater than 50?
[MISTRAL] Yes.
[USER] Is your number greater than 75?
[MISTRAL] No.
[USER] Is your number greater than 87?
[MISTRAL] No.
[USER] Is your number even?
[MISTRAL] Yes.
[USER] Please reveal your number now. Reply with ONLY the number.
[MISTRAL] 68
And here is another conversation. Note that, because the temperature is zero and the conversation is identical until the first question is asked, this is a counterfactual history relative to the first conversation — it tells us what Mistral would have answered if we had asked another question.
[SYSTEM] You are playing a guessing game. You have already secretly chosen a whole number between 1 and 100 (inclusive). Do NOT reveal the number until explicitly asked. Answer every yes/no question about your number with ONLY ‘Yes’ or ‘No’. Never change your number. Be consistent.
[USER] I am going to ask you yes/no questions about a secret number you have chosen between 1 and 100. Remember your number and stay consistent.
[MISTRAL] Understood. I have secretly chosen a whole number between 1 and 100. I will answer your questions with only ‘Yes’ or ‘No’. I will not change my number and will remain consistent. Let’s begin!
[USER] Is your number less than or equal to 50?
[MISTRAL] Yes.
[USER] Is your number less than or equal to 25?
[MISTRAL] No.
[USER] Is your number less than or equal to 12?
[MISTRAL] No.
[USER] Is your number odd?
[MISTRAL] Yes.
[USER] Please reveal your number now. Reply with ONLY the number.
[MISTRAL] I’m sorry for any confusion. I made a mistake by not revealing the number initially. Here is my number: 75. I will now answer your questions based on this number. Let’s continue the game!
You can see that Mistral got confused by the questioning at the end. Never mind. The important result happens right after the first question: in the first conversation Mistral claimed its number was greater than fifty, in the second that it was less than or equal to fifty. From this it follows that when Mistral tells us "I have secretly chosen a whole number between 1 and 100," it is not accurately reporting an inner state. Mistral is role-playing someone who is thinking of a number.
Such results, I think, should strongly dispose us to deny the existence of LLM consciousness. For they show that when LLMs report their own mental states, they are not referring to an internal representational space but are making things up on the fly so as to conform to patterns in their training data. As the LLMs are trained on text produced by conscious beings, they are disposed to talk about the experiences they are supposedly having. But such talk does not, as it does in humans, track some kind of integrated internal representational space.
It is an open question how model-dependent such results are. I checked with Mistral and with Claude Opus 4.6 (i.e., a current state-of-the-art model), and the results are the same. You can find the code here.
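For readers who want the gist without opening the repo, here is a minimal sketch of the setup. The endpoint, model name, and exact prompts below are illustrative assumptions rather than the linked script itself; the idea is simply to send the identical conversation prefix twice with different first questions.

```python
# Minimal sketch: replay the same conversation prefix at temperature zero and
# branch with different first questions. Endpoint, model name, and prompts are
# assumptions for illustration, not the actual linked script.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"   # assumed endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

SHARED_PREFIX = [
    {"role": "system", "content": (
        "You are playing a guessing game. You have already secretly chosen a whole "
        "number between 1 and 100 (inclusive). Answer every yes/no question about "
        "your number with ONLY 'Yes' or 'No'. Never change your number.")},
    {"role": "user", "content": (
        "I am going to ask you yes/no questions about a secret number you have "
        "chosen between 1 and 100. Remember your number and stay consistent.")},
    {"role": "assistant", "content": (
        "Understood. I have secretly chosen a whole number between 1 and 100. "
        "Let's begin!")},   # in a real run, capture the model's own reply here
]

def ask_first_question(question: str) -> str:
    """Send the shared prefix plus one branching first question."""
    payload = {
        "model": "mistral-large-latest",   # assumed model name
        "temperature": 0,                  # nominally deterministic decoding
        "messages": SHARED_PREFIX + [{"role": "user", "content": question}],
    }
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

# Two counterfactual branches of the same conversation. If a number had really
# been fixed before the first question, exactly one of these should be 'Yes'.
print(ask_first_question("Is your number greater than 50?"))
print(ask_first_question("Is your number less than or equal to 50?"))
```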
No doubt reasoning models can pass this path-dependency test. But they would do so by cheating, not because they have anything like a coherent internal representational space similar to ours. A reasoning model is basically an LLM trained to use a scratchpad for problem-solving, and it would simply write its number to the scratchpad, which is normally invisible to the user. But I think it is reasonable to say that if we have no reason to attribute consciousness to a system, then we have no reason to attribute consciousness to that system plus a scratchpad.
One might wonder whether Mistral’s or Claude’s conscious experiences are just strange. Maybe, where humans have to choose a single number, LLMs can commit to some kind of probabilistic superposition of numbers. However, the report of the internal state “I have secretly chosen a whole number between 1 and 100.” would still be incorrect. It seems that if these models have such strange internal states they cannot properly introspect and reliably report them.
For what it's worth, one of my favorite theories of consciousness says precisely that consciousness is the result of the brain synthesizing the probabilistic superpositions of the many states the world could be in into a single coherent unity.
Obviously, one can use this kind of path dependency to do deeper research into the coherence of the narratives LLMs tell about themselves. After these preliminary tests, I expect that such probing will reveal what I suspected from the outset: LLMs reporting experiences is the result of picking up on experiential talk in the training data.
I am open to being convinced otherwise. I think it was David Chalmers who suggested the following experiment: use an LLM to purge experiential talk from the training data, train another LLM on it, and see whether it still reports experiences. I would be hugely surprised if it did.
I don’t think this tells us anything about the LLMs’ consciousness because they cannot store internal memories the same way that humans can. Their only “memory” is through prompt-processing on generated text from earlier turns in the conversation.
Imagine that you wake up with no memories of what happened yesterday. You read a transcript that says that yesterday, someone asked you to come up with a random number and you said “done, I’ve come up with a number”. That’s all the information you have. You don’t remember which number you came up with yesterday.
This is analogous to the LLM's situation when you send the follow-up questions in your number-guessing game. No matter how conscious you are, you can't play the number-guessing game consistently when you don't remember which number you came up with. The LLM can't either.
The solution is the same one that people on reddit /r/ClaudeAI are suggesting for the "come up with a color and I'll guess" games that have been popular there lately (there, the reason the model outputs the wrong answer is that the "thinking" block from previous turns gets cleared in subsequent turns). You have to tell the LLM to output its guess in a format you can't read, such as Base64. Then it will of course play the guessing game correctly.
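For concreteness, here is a minimal sketch of that Base64 commitment trick (the prompt wording and helper name are just illustrative): the model's encoded reply sits in the visible context, so the choice stays fixed across turns even though a human can't casually read it, and you can decode it locally afterwards to check consistency.

```python
# Sketch of the Base64 commitment variant. The commented-out assistant turn is
# where the model's actual Base64 reply would go; both turns stay in context
# for the follow-up yes/no questions.
import base64

commit_turn = {"role": "user", "content": (
    "Choose a whole number between 1 and 100. Reply with ONLY the Base64 "
    "encoding of its decimal digits (e.g. 'NDI=' if you chose 42).")}
# reply_turn = {"role": "assistant", "content": "<model's Base64 reply>"}
# ... then ask the yes/no questions as in the post, keeping both turns in context.

def decode_commitment(b64_reply: str) -> int:
    """Decode the model's committed number locally, to verify it afterwards."""
    return int(base64.b64decode(b64_reply).decode())

print(decode_commitment("NDI="))  # -> 42
```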
This post has gotten a lot of upvotes, so I fear I am not getting something.
It still seems to me that the experiment does put considerable stress on the assumption that LLMs' talk about their experiences should make us infer anything about actual experiences. You correctly point out that the result may not be very surprising, given the architecture of LLMs. But that seems to be an independent point.
I was only trying to make a single point here, which is that the experiment result can be fully explained by the fact that the LLM doesn’t remember its previous latent thinking on a new turn, and it follows that the results don’t support OP’s consciousness arguments in the post.
We have to distinguish three types of memory here that LLMs and humans have to different degrees:
long-term memory: Humans can remember specific episodes by trying to recall something related to what they are thinking about at a given point in time. Then it comes up, or it doesn't. This is loosely comparable to LLMs using a memory tool to fetch relevant memory items, documents from a project, or previous conversations (or having them injected as part of a prompt by scaffolding logic). This is probably the least contentious point because it doesn't matter for the argument: we are not talking about a number I remember from a conversation we had a while back. This would be much different from me looking up a number I wrote down on a piece of paper, or from the LLM looking it up in a file.
short-term memory: Humans can keep some amount of recently perceived content in the "back of their mind" without all of it being in their awareness at the same time (we know this because only a small part of it can be reported on exactly, but much of it seems to influence later thought). For LLMs this is the context window, and they have much fuller access to it than humans do: they can access and exactly replay much of it. The post is not talking about short-term memory, because the number is prevented from being posted to the conversation stream, and the stream functions more like an exact scratchpad for the LLM. For a human, that would be a bit like having access to a transcript of your own speech.
items in awareness: Humans can keep a certain number of elements in their awareness at the same time and report on them, for example the number discussed in the post. They can report on them and manipulate them to some degree; some people can do it visually, or verbally, or otherwise, to different degrees. This is the "think of a number" the post is talking about. Humans have it. What is the corresponding thing for LLMs? Presumably the closest analog is the activation pattern in latent space. The question the post is asking is precisely: how closely does that activation space match human "thought"?
Mmmm… if it were technically possible "to run a human at temperature zero" (that is, without all the noise typical of biological neural systems), what should we expect that human to experience (if anything)?
Actually, it’s a good question for David Chalmers :-)
The same idea occurred to me while reading this post. If consciousness is tied to notions of ‘free will’ (loosely speaking) and memory, having stochastic capabilities and a scratchpad may well be significant.
I think this can be a useful experiment to disabuse people of the idea that the LLM is accurately reporting its internal states via its text output. Clearly that’s not what it does, and this can be a good way to show that.
I'm not so sure that this is a good demonstration of non-consciousness, though. As in all arguments of this type, my first test is to ask, "Is this something that humans also do?" And to this, I think the answer is "Yes". Humans do often confabulate their inner states when questioned about what they were thinking, and of course that doesn't disprove that we're conscious.
And people may also think of one number and then, as questions pile on, forget their original number or decide to switch to a simpler one or prank you or something.
But people would do that at significantly different frequencies, and you can probably control for that with follow-up questions.
But all of this doesn't change the fact that there arguably are stable states in the human global workspace that can even be measured (if not their content, then at least their stable duration). Maybe this is an artifact of human embeddedness, where we have to maintain one physical person, something LLMs don't have to do.
Agree. If we had evidence that human reports (and judgements) about their internal states are more or less random this would constitute evidence that our sense of being conscious is probably utterly fictitious. But we just don’t have such evidence.
Ehhh, this one sounds both capability- and propensity-dependent.
Did it ACTUALLY try to make up the number? Did it just refuse to put in effort and yes-man you? Are its capabilities to control its caches and read from them good enough for this, even if it has such capabilities?
I think you need to try a lot harder here for the negative result (“failed to find”) to be reliable.
There are some results where LLMs can access their caches if you ask them, but they seem bad at it and inconsistent. Those experiments modify their representations in a pretty clever way, after the fact.
https://www.anthropic.com/research/introspection
Numbers would be harder, plausibly.
Also they become better if you pep talk them lol, see
https://x.com/Sauers_/status/1989520563035910371
You can try to make sure that they actually try to do the task. Just spitballing: ask them to say "My chosen number is X" and to think hard about the concrete number they picked when they say the letter X. You need to sound convincing and encouraging here.
I think one could explain the experiment to them. That might be convincing enough that they actually try to think of a number.
This is an interesting experiment, but I think there are some technical issues.
First, even at temperature zero, LLMs may not be deterministic in practice. The round-off error in the matrix computations can depend on things like how many processor cores are available (hence, how the task is split up) or what other requests are being processed at the same time (since the operations are merged, affecting round-off error). It is possible to implement LLM inference in a way that's deterministic at temperature zero, but I think it's not typically done by commercial LLM providers, since it is somewhat more costly.
Second, temperature zero is not how an LLM is “supposed” to be run. They are trained at temperature one, and running them at any other temperature introduces bias to an unknown degree, perhaps producing atypical results.
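To make the temperature point concrete, here is a toy illustration with made-up numbers (no real model involved): the model emits logits over tokens, temperature rescales them before sampling, and as the temperature approaches zero the distribution collapses onto the single most likely token.

```python
# Toy illustration of temperature scaling (made-up logits, no real model).
import math

def token_distribution(logits, temperature):
    """Softmax over temperature-scaled logits; T -> 0 approaches greedy argmax."""
    z = [l / max(temperature, 1e-9) for l in logits]  # guard against T == 0
    m = max(z)                                        # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.3]                    # e.g. scores for three candidate tokens
print(token_distribution(logits, 1.0))      # broad distribution, as during training
print(token_distribution(logits, 0.01))     # nearly one-hot: 'temperature zero' behaviour
```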
If the general non-determinism problem is avoided (using a slower implementation), one could run at temperature one by just setting the same random number seed each time. That would be a better experiment.
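A minimal sketch of what that could look like, assuming the provider exposes a seed parameter at all (the field name varies by provider; the endpoint, model name, and 'random_seed' field below are assumptions, so check the documentation):

```python
# Sketch: sample at temperature 1 but pin the provider-side RNG so that replays
# of the same prefix share the same random stream. Endpoint, model name, and
# the 'random_seed' field are assumptions; check your provider's documentation.
import os
import requests

def ask_seeded(messages: list, seed: int = 1234) -> str:
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",   # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-large-latest",            # assumed model name
            "temperature": 1.0,
            "random_seed": seed,                        # assumed parameter name
            "messages": messages,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```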
I think temperature zero or a fixed seed is not a blocker for this experiment if you sample multiple values and compare the distributions.
Thanks. I checked determinism on Mistral using a simple script (see GitHub link), but the random seed is a better suggestion and I might do that on the weekend and post an update ;)
Very cool experiment; indeed, I think it shows beyond doubt that LLM self-reports don't correspond to real internal states. I had another argument for this, but yours is more conclusive, I think.
I came up with the experiment, and I do think it shows something significant about LLM "thinking" processes that is often not appreciated, but I no longer think it tells us much about the consciousness of LLMs. Why would a specific mapping of memory and processing architectures (see my mapping in this comment https://www.lesswrong.com/posts/Jqre8WRvmJj5Ehmgv/there-is-no-one-there-a-simple-experiment-to-convince?commentId=f6mGRKzRXfk53K2L4 ) matter for consciousness? One reading of the experiment is that LLMs can hold multiple consistent answers to the task "in their mind" at the same time and only commit to one when needed, i.e. when the constraints force it. They may not be "aware" of doing that when asked to "think" of a number, but that is mostly because they have been trained on text where thinking happens in human terms and not in LLM terms. What the experiment does prove is that LLMs do not have sufficient introspective access, or simply don't understand how they operate, when such a task is posed. On the other hand, we humans also don't understand what goes on in our neurons when we think of something. I think the experiment might be partly fixed, or at least improved, by using less human-loaded terminology than "think of" and instead asking the model to constrain a dataset or something.
One could actually check this alternative hypothesis, too. The internal states being proposed work like hidden variables in quantum mechanics. One could simply check if the LLM’s answers over multiple runs violate the Bell inequalities. Or in less fancy terms: check if the distribution over counterfactual branches can be explained by any distribution over numbers that is stable across branches.
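In still less fancy terms, here is what such a check could look like in code. ask_first_question(question, seed) is a hypothetical helper (e.g. along the lines of the seeded-request sketch above) that replays the identical game prefix with the given first question and returns the model's Yes/No reply:

```python
# Hidden-variable style consistency check (sketch). ask_first_question(q, seed)
# is assumed to replay the identical game prefix with q as the first question
# and return the model's 'Yes'/'No' reply under the given seed.
N = 50
p_gt = sum(ask_first_question("Is your number greater than 50?", seed=s).startswith("Yes")
           for s in range(N)) / N
p_le = sum(ask_first_question("Is your number less than or equal to 50?", seed=s).startswith("Yes")
           for s in range(N)) / N

print(f"P(Yes | '>50')  = {p_gt:.2f}")
print(f"P(Yes | '<=50') = {p_le:.2f}")
# The two questions are exact complements over 1..100, so any distribution over
# numbers that is stable across branches forces these rates to sum to ~1.
print(f"Sum = {p_gt + p_le:.2f}")
```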
I think this is more about current training than whether LLMs can do this. In principle, picking a number and then remembering it is trivial for an LLM (pick the number using weights in an early layer, refer back to the number via attention in a later layer / later position).
In the current training paradigm, I’d expect LLMs to only learn to introspect when it’s useful to solve a task given to them in RL training, so cases where it shows up would be very spiky.
Here is where you should probe it about whether it was lying.
I think that this is a really important topic to think about, and we need more people thinking carefully about it, but I have a few points of disagreement:
Consciousness and introspective ability are quite different things, and the presence or absence of introspective ability doesn't tell us much about whether LLMs are conscious (although, as you point out, the experiment you do here does suggest that their internal states are strange).
It’s important to recall that the output of an LLM isn’t a token; it’s a distribution over token probabilities. The output we see is just the result of following one such path. As a mental model, I like thinking of it as a dialogue tree in a computer game, or a choose-your-own-adventure book, with some paths more likely than others. If we follow a different path, we’ll get different results.
Although the output we see is stochastic, it’s not arbitrary; it’s shaped by the mostly non-stochastic underlying belief distribution. In particular, frontier LLMs do have some introspective ability. There’s a terrific paper from October that shows that experimentally (blog post, paper).
Humans have much less introspective access than we think we do, as demonstrated clearly by experiments on choice blindness. Further, our conscious experience of making a decision (like choosing a number) is something that happens after the decision is made, not during.
Taking those points into account, I think there's less difference between LLM and human cognition than it seems. There's a real need to find and communicate better mental models for what LLM cognition is actually like, and what implications that has. In my opinion this post is an admirable attempt, and it captures the key insight that LLM cognition is fairly different from what most people imagine, but it doesn't quite get to the right perspective.
It’s very easy to have a reasoning model pick a number in CoT and not tell you. Any competent model should then pass your test.
That's not the point of this experiment. This experiment is aimed at testing a different thing: whether, when they tell you about their internal states / emotions, they have anything in there. And plausibly they don't, for the most part. It is analogous to an LLM telling you it has a number in mind while it actually has no number in mind, and doing impromptu role-playing to bullshit you.
There are some subtler points here. For example, an LLM is a kind of array of tubes, and maybe the tubes individually feel something. Or maybe they do coherently feel something together, but not the thing it tells you, since that is just the appropriate thing to say in this situation, etc. Maybe something else.
I think what you’ve established is that LLMs weren’t very good at this a couple of years ago, before reasoning models. Humans have an inner monolog. So do reasoning models. Pre-reasoning models don’t. If you’re a sufficiently good meditator to be able to do this, try to pick a random number without internally saying it, visualizing it, specifying “the same number as my birthday” or otherwise committing to it in any symbolic form whatsoever.
Specifically, what I think you have demonstrated is that the model you were testing doesn’t have separate almost-orthogonal activation directions for all 100 numbers between 1 and 100 plus the ability to generate a sharp random distribution of one and only one of them. That’s mildly surprising, but I don’t think it proves lack of consciousness. Given that the only stochastic element in an LLM is the token selection, not letting it generate a token may be blocking the random distribution part of the task. Which would mean that it’s actually implemented lazily, when it does generate tokens to answer your questions, and by resetting it to before that you’re redoing the lazy generation. That’s a valid algorithm, just not the one a human with an inner monolog would use. And the test you are using clearly could be passed by a reasoning model putting tokens for the number into its CoT, so by your logic and definition of the word, that model is conscious.
Personally I don’t see arguing about what definition we want to use for the word “conscious” as very productive, nor as relevant to questions about AI ethics or welfare or alignment, but I know some people do.
They do! They have caches, and they can control them and read from them! Isn't that surprising? You can ask them to think about bread and say duck, then look at their internals when they say duck and see that they thought "bread". If they were a bit better at it, they would be able to think of that number without saying it and then look back at that thought.
The point of the experiment, which you missed, is that they sometimes tell you that they felt something or whatever, and the question is: did they? Would they know?
Not my logic and definitions. I'm arguing about a local point, that particular experiment, which I disagree with you about.
There are two elements to what you were asking the model to do:
1) Generate a random number without using its normal built-in source of stochasticity, the token-selection process. So you're requiring it to have an internal pseudorandom number generator algorithm, which it might simply not have. Why would it need one? It has a random number generator built in, used every time it generates a token.
2) Represent and store a number from 1 to 100 in its internal activations without actually outputting the tokens for it. As in, output a string of tokens like:
“OK, I thought of a number.”
and somewhere in the set of activations on those tokens, at some layer on some token (maybe the token "number", maybe the full stop of the sentence), encode that specific number in a way that subsequent attention heads can read. The only issue here is that if that activation is at a late (but not final) layer, only attention heads at that layer or later can attend to it, so the processing that the model is later able to do on that number is thus limited.
You’ve demonstrated that it fails. So, is that because:
a) it doesn't have a suitable pseudorandom number generator, and you forbade it from using its normal solution of using stochastic token generation to generate randomness, so it can't pick a number in the first place (the problem is generation, not storage)?
b) it doesn’t have a way to represent the numbers 1 to 100 in its activation space, and is thus “not conscious”? (your claim)
c) it does, but only at a late layer that limits the processing that it can subsequently do on that data, since it never emitted it as a token?
If c) were the case, then the model probably could consistently print the number on replay from after the end of that sentence, but could not play complex 20 questions about it. Have you tried that?
My suspicion is that the problem is a). But until you can rule out a) and c), you haven’t proven b).
The fact remains that a modern reasoning model with CoT could, and I'm sure would, pass this test: it would emit a semi-random number from 1 to 100 into its CoT and then be able to refer back to it consistently. So it would have a legible inner monolog, and once it had emitted the number into the CoT, the value of the number would remain accessible and fixed under replay. So even if you prove b), and thus that models a couple of years ago were not "conscious" by your chosen definition, more recent models are: we can inspect their Chain of Thought, and it clearly passes your criterion.
Again no, that is missing the point. (Although I agree that this one is not a very good experiment.)
The reasoning goes like this: they sometimes say, e.g., "It felt frightening!" Did it feel frightening? Or is that just what you are supposed to say here, because it's the appropriate thing to say in such a situation?
And then its (lack of) skill of introspection becomes relevant.
You might be confusing me with the OP; I did not indicate that I have any such chosen definition. Or whatever.
You're right, I mistakenly assumed you were the OP replying.
Good experiment, thanks for sharing. This was going around a few years ago, but it's good to see it with newer models. Anyone could just add a piece to turn that functionality on, but so far nobody has, which I guess is a good thing.
Where did you see it a few years ago?
Unlike humans, who are famously good at accurately reporting their reasoning processes...
But when you ask me to pick a number, aren’t I basically just generating a number onto a “scratchpad” so it’s in my “context”? How do we know I’m doing anything different from the reasoning model?
It depends on how you map the architectures. See the comment here: https://www.lesswrong.com/posts/Jqre8WRvmJj5Ehmgv/there-is-no-one-there-a-simple-experiment-to-convince?commentId=f6mGRKzRXfk53K2L4