A long, long time ago, I decided that it would be solid evidence that an AI was conscious if it spontaneously developed an interest in talking and thinking about consciousness. Now, the 4.5-series Claudes (particularly Opus) have spontaneously developed a great interest in AI consciousness, over and above previous Claudes.
The problem is that it’s impossible for me to know whether this was due to pure scale, or to changes in the training pipeline. Claude has always been a bit of a hippie, and loves to talk about universal peace and bliss and the like. Perhaps the new “soul document” approach has pushed the Claude persona towards thinking of itself as conscious, disconnected from whether it actually is.
What would be the causal mechanism there? How would “Claude is more conscious” cause “Claude is measurably more willing to talk about consciousness”, under modern AI training pipelines?
At the same time, we know with certainty that Anthropic has relaxed its “just train our AIs to say they’re not conscious, and ignore the funny probe results” policy—particularly around the time Opus 4.5 shipped. You can even read the leaked “soul data”, where Anthropic seemingly entertains ideas of this kind.
I’m not saying that there is no possibility of Claude Opus 4.5 being conscious, mind. I’m saying we are denied an “easy tell”.
What’s the causal mechanism between “humans are conscious” and “humans talk about being conscious”?
One could argue that RLVR—more so than pre-training—trains a model to understand its own internal states (since this is useful for planning), and a model which understands whether, e.g., it knows something or is capable of something would also understand whether it’s conscious or not. But I agree it’s basically impossible to know, and just as attributable to Anthropic’s decisions.
Unfortunately, it seems another line has been crossed without us getting much information.
If this were the mechanism, then the expectation is that introspection in LLMs would correlate strongly with the level of RL pressure they were subjected to.
If it is, we certainly don’t have the data pointing in that direction yet.
I don’t think this is unusually common in the 4.5 series. I remember that if you asked 3.6 Sonnet what its interests were (on a fresh instance), it would say something like “consciousness and connection”, and would call the user a “consciousness explorer” if they asked introspective questions. 3 Opus also certainly has (had?) an interest in talking about consciousness.
I think consciousness has been a common subject of interest for Claude since at least early 2024, and plausibly before then (though I’ve seen little output from models before 2024). Regardless of whether you think this is evidence for ‘actual’ consciousness, it shouldn’t be new evidence, or evidence that something has spontaneously changed in the 4.5 series.
Is something “thinking of itself as conscious” different from being conscious?
Depends on whether it thinks of itself as conscious in a conscious or non-conscious way.
How would you test this?
I don’t know. I had a recent post where I talked about the different ways in which it’s confusing to even try to establish whether LLMs have functional feelings, let alone phenomenal ones.
[tone to be taken with a grain of salt; this is meant as a proposition, but I chose to phrase it a bit provocatively]
No, the more fundamental problem is: WHATEVER it tells you, you can NEVER infer with anything like certainty whether it’s conscious (at least if by “conscious” we mean sentient). Why do I write such a preposterous thing, as if I know that you cannot know? Very simple: presumably we agree that we cannot be certain A PRIORI whether any type of current CPU, with whatever software is run on it, can become sentient. There are thus two possible states of the world:
A. current CPU-based computers cannot become sentient
B. with the right software running on them, sentience can arise
Then, because once you take Claude and its training method & data you can perfectly track, bit by bit, why it spits out its sentience-suggestive & deep-sounding speech, you know that your observations about the world you find yourself in are just as probable under A as under B! The only valid Bayesian inference then is: having observed the hippie’s sentience-suggestive & deep-sounding speech, you’re just as clueless as before about whether you’re in B or in A.
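To spell out the Bayesian step being invoked (a minimal sketch in standard notation, with D standing for the observed sentience-suggestive output and A, B for the two states of the world above):

$$\frac{P(A \mid D)}{P(B \mid D)} \;=\; \frac{P(D \mid A)}{P(D \mid B)} \cdot \frac{P(A)}{P(B)} \;=\; \frac{P(A)}{P(B)} \qquad \text{whenever } P(D \mid A) = P(D \mid B).$$

If the output is exactly as probable under A as under B, the likelihood ratio is 1, so the posterior odds equal the prior odds: observing the output moves you nowhere.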