What would be the causal mechanism there? How would “Claude is more conscious” cause “Claude is measurably more willing to talk about consciousness”, under modern AI training pipelines?
At the same time, we know with certainty that Anthropic has relaxed its “just train our AIs to say they’re not conscious, and ignore the funny probe results” policy, particularly around the time Opus 4.5 shipped. You can even read the leaked “soul data”, where Anthropic seemingly entertains ideas of this kind.
I’m not saying that there is no possibility of Claude Opus 4.5 being conscious, mind. I’m saying we are denied an “easy tell”.
What’s the causal mechanism between “humans are conscious” and “humans talk about being conscious”?
One could argue that RLVR, more so than pre-training, trains a model to understand its own internal states (since this is useful for planning), and that a model which understands whether, e.g., it knows something or is capable of something would also understand whether it’s conscious or not. But I agree it’s basically impossible to know, and just as attributable to Anthropic’s decisions.
Unfortunately, it seems another line has been crossed without us getting much information.
If this were the mechanism, then the expectation is that introspection in LLMs would correlate strongly with the amount of RL pressure they were subjected to.
If it is the mechanism, we certainly don’t have data pointing in that direction yet.
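To make the prediction in the exchange above concrete, here is a minimal sketch of what checking that correlation could look like once such measurements exist. Everything in it is hypothetical: the model names, the “RL pressure” proxy, and the introspection scores are placeholders, not real data.

```python
# Hypothetical sketch: if introspective accuracy were driven mainly by RL pressure,
# we'd expect a strong rank correlation across models. All numbers below are made up
# purely to illustrate the shape of the test; no such dataset exists yet.
from scipy.stats import spearmanr

models = ["model_a", "model_b", "model_c", "model_d", "model_e"]

# Rough proxy for how much RL(VR) post-training each model received
# (e.g., relative post-training compute) -- hypothetical values.
rl_pressure = [0.1, 0.3, 0.5, 0.8, 1.0]

# Score on some introspection benchmark (e.g., calibrated self-knowledge of
# what the model does and doesn't know) -- hypothetical values.
introspection_score = [0.42, 0.47, 0.55, 0.61, 0.70]

rho, p_value = spearmanr(rl_pressure, introspection_score)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# The hypothesized mechanism predicts rho close to 1; a flat or noisy relationship
# would count against "RL pressure -> introspection" as the explanation.
```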