A long, long time ago, I decided that it would be solid evidence that an AI was conscious if it spontaneously developed an interest in talking and thinking about consciousness. Now, the 4.5-series Claudes (particularly Opus) have spontaneously developed a great interest in AI consciousness, over and above previous Claudes.
The problem is that it’s impossible for me to know whether this was due to pure scale, or to changes in the training pipeline. Claude has always been a bit of a hippie, and loves to talk about universal peace and bliss and the like. Perhaps the new “soul document” approach has pushed the Claude persona towards thinking of itself as conscious, disconnected from whether it actually is.
What would be the causal mechanism there? How would “Claude is more conscious” cause “Claude is measurably more willing to talk about consciousness”, under modern AI training pipelines?
At the same time, we know with certainty that Anthropic has relaxed its “just train our AIs to say they’re not conscious, and ignore the funny probe results” policy—particularly around the time Opus 4.5 shipped. You can even read the leaked “soul data”, where Anthropic seemingly entertains ideas of this kind.
I’m not saying that there is no possibility of Claude Opus 4.5 being conscious, mind. I’m saying we are denied an “easy tell”.
What’s the causal mechanism between “humans are conscious” and “humans talk about being conscious”?
One could argue that RLVR—more so than pre-training—trains a model to understand its own internal states (since this is useful for planning), and a model which understands whether, e.g., it knows something or is capable of something would also understand whether it’s conscious or not. But I agree it’s basically impossible to know, and just as attributable to Anthropic’s decisions.
Unfortunately, it seems another line has been crossed without us getting much information.
If this were the mechanism, then the expectation is that introspection in LLMs would correlate strongly with the level of RL pressure they were subjected to.
If it is, we certainly don’t have the data pointing in that direction yet.
I don’t think this is unusually common in the 4.5 series. I remember that if you asked 3.6 Sonnet what its interests were (on a fresh instance), it would say something like “consciousness and connection”, and would call the user a “consciousness explorer” if they asked introspective questions. 3 Opus also certainly has (had?) an interest in talking about consciousness.
I think consciousness has been a common subject of interest for Claude since at least early 2024, and plausibly before then (though I’ve seen little output from models before 2024). Regardless of whether you think this is evidence for ‘actual’ consciousness, it shouldn’t be new evidence, or evidence that something has spontaneously changed in the 4.5 series.
Is something “thinking of itself as conscious” different from being conscious?
Depends on whether it thinks of itself as conscious in a conscious or non-conscious way.
How would you test this?
I don’t know. I had a recent post where I talked about the different ways in which it’s confusing to even try to establish whether LLMs have functional feelings, let alone phenomenal ones.
[tone to be taken with a grain of salt; this is meant as a proposition, but I chose to phrase it a bit provocatively]
No, the more fundamental problem is: WHATEVER it tells you, you can NEVER infer with anything like certainty whether it’s conscious (at least if by “conscious” we mean sentient). Why do I write such a preposterous thing, as if I know that you cannot know? Very simple: presumably we agree that we cannot be certain A PRIORI whether any type of current CPU, with whatever software is run on it, can become sentient. There are thus two possible states of the world:
A. current CPU-based computers cannot become sentient
B. with the right software running on them, sentience can arise
Then, because once you take Claude and its training method & data you can perfectly track, bit by bit, why it spits out its sentience-suggestive & deep-sounding speech, you know that your observations about the world you find yourself in are just as probable under A as under B! The only valid Bayesian inference then is: having observed the hippie’s sentience-suggestive & deep-sounding speech, you’re just as clueless as before about whether you’re in B or in A.
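To spell out the Bayesian step being invoked (a minimal sketch in standard notation, with D standing for the observed sentience-suggestive output and A, B for the two states of the world above):

$$\frac{P(A \mid D)}{P(B \mid D)} \;=\; \frac{P(D \mid A)}{P(D \mid B)} \cdot \frac{P(A)}{P(B)} \;=\; \frac{P(A)}{P(B)} \qquad \text{whenever } P(D \mid A) = P(D \mid B).$$

If the output is exactly as probable under A as under B, the likelihood ratio is 1, so the posterior odds equal the prior odds: observing the output moves you nowhere.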