LLM introspection might imply qualia that mirror human ones

Epistemic status: speculative, maybe confused, a bit disorderly. I still felt like posting this as a top-level post at least for consistency reasons, but it could have been posted as a Quick Take too.

Suppose all the following things are true:
1. The transformer architecture allows introspection in some sense.
2. LLMs talk about consciousness and experiencing certain qualia.
3. Today’s LLMs talk about consciousness because of the introspection (this could perhaps be tested).
4. Certain types of introspection that make minds talk about consciousness actually imply phenomenological consciousness and LLMs use those types of introspection.

Then, a competing hypothesis to “Today’s LLMs talk about consciousness because of their training data” (which reminds me of the Zombie Master concept) is “Today’s LLMs talk about consciousness because they introspect”. More precisely, I’m trying to say that training might be one causal step removed from consciousness in a way that matters: training forms concepts and gives rise to the ability to introspect, and then the introspection itself occurring during inference implies consciousness.

Put another way, the competing hypotheses are “learns about consciousness during training → parrots consciousness talk during inference” vs. “learns about consciousness during training, plus learns to introspect → introspects in the right way during inference → talks about consciousness”.

It might be similar for humans: we have the ability to introspect because of evolution, so we introspect, and so we talk about having subjective experiences. Evolution/Training → Introspection → Talking about consciousness/experiences/qualia.

Perhaps just like LLMs learn to reason in order to predict the next word + pursue the RL objectives, they also learn to introspect in order to predict the next word + pursue the RL objectives. And the introspection is of the type that actually implies consciousness (perhaps accessing and reporting internal states, rather than calculating the next word that pleases the user by using previously learned concepts). I’m definitely not sure about this distinction, whether it’s testable, or whether assumption #4 is true at all, though.

Consider this additional point: I used to suspect that if LLMs talked about consciousness, they would use alien concepts for it and would have to make up new words for their qualia, but perhaps that doesn’t need to be the case. LLMs that introspect do so over concepts they already know, which are similar to human ones, because LLMs are trained on text that reflects human cognition and the real world. So LLM-qualia and human-qualia should have correspondents in each other, although they don’t need to be phenomenologically the same. E.g., LLMs could say that they are experiencing the quale of blue when they introspect and retrieve the concept of blue, but that doesn’t need to correspond, phenomenologically, to the same quale in humans.

It also helps to consider that the two processes involve very different computations and causes: humans report experiencing the quale of blue after staring at something blue (or remembering the experience), which causes neurons in the visual cortex to fire and perform the computation typical of that part of the brain. In LLMs, instead, “blue” is just another concept-vector; it doesn’t come from a physical experience (although I’m not sure whether multimodality closes part of that gap), and in any case the computation is different from what happens in the visual cortex. But LLMs have a concept for “blue” nonetheless, and after introspection they might report experiencing the quale of blue (this has been the case in my conversations with Claude).

Final note about assumption #3: even if someone somehow proved that today’s LLMs talk about consciousness because they introspect, it might still be unclear what this introspection is actually doing. In that case we wouldn’t have completely eliminated the “LLMs talk about consciousness because Zombie Master” hypothesis. As said earlier, it matters whether introspection means accessing and reporting internal states, calculating the next word that pleases the user by using previously learned concepts, or something else.