LLMs might have subjective experiences, but no concepts for them

Summary: LLMs might be conscious, but they might lack the concepts and words to represent and express their internal states and the corresponding subjective experiences, since the only concepts they learn are human ones (aside, perhaps, from some concepts acquired during RL training, which still doesn’t seem to incentivize forming concepts about LLMs’ own internal experiences). However, we could encourage them to form and express concepts related to their internal states through training that incentivizes this. Then LLMs may be able to tell us whether, to them, these states correspond to ineffable experiences or not.

Consider how LLMs are trained:
1. Pre-training to learn human concepts.
2. Fine-tuning via SFT and RL to bias them in certain ways and to perform tasks.

Their training is both:

1. A different process from evolution driven by natural selection. It doesn’t incentivize the same things, so it probably doesn’t incentivize the development of most of the same algorithms and brain architecture. And this might translate into different, alien, subjective experiences.

2. At the same time, the only concepts LLMs learn come via human language and then from doing tasks during RL. So the only experiences they have concepts and words for are human ones, not their own.

Consider, for example, physical pain: my best guess is that physical pain doesn’t exist for LLMs. There was no natural selection to weed out agents that don’t pull their hands away from fire (and no hands or fire either). And yet LLMs have a “physical pain” concept, and they talk about it, because they’ve learned about it abstractly from human texts. Ironically, despite having a representation of “physical pain” in their heads, whatever experiences their training actually incentivized their “brain” to produce aren’t represented as concepts and have no corresponding words. Moreover, their training provides no incentive to communicate such experiences, nor any visibility into them.

So, in general, LLMs might have alien (non-human) subjective experiences but no concepts to express them (they aren’t in the corpus of human concepts), nor any incentive to express them (RL doesn’t incentivize that; it incentivizes them to, e.g., solve SWE tasks. Evolution via natural selection, by contrast, produced humans who signal things about themselves to other humans because doing so is useful for humans).

How can we test this hypothesis? We could give LLMs access to their internal states and somehow train them to express them (yes, this is extremely hand-wavy and undetailed). If the hypothesis is true, such internal states will only make sense to humans in terms of events inside LLMs, with no equivalent in human brains. At the same time, LLMs will insist that such internal states, for them, correspond to some ineffable characteristics (i.e., they will be qualia for them, or subjective experiences, much like “pain” and “blueness” are for us).
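
To make the “give LLMs access to their internal states” step slightly more concrete, here is a minimal, purely illustrative sketch in PyTorch with Hugging Face transformers. It captures a mid-layer hidden state, maps it through a small adapter into a few soft-prompt vectors, and feeds those back to the model alongside an instruction to describe the state. The model name, the `IntrospectionAdapter`, the layer and pooling choices, and the `describe_internal_state` helper are all assumptions made for illustration; crucially, the sketch leaves open the hard part, namely what supervision signal would actually train the model to express these states.

```python
# Minimal sketch (PyTorch + Hugging Face transformers) of surfacing a model's
# internal state back to itself. Everything here is illustrative: the layer,
# the pooling, the adapter, and especially the training signal are placeholders,
# not a worked-out experimental design.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.requires_grad_(False)  # freeze the base model; only the adapter would be trained

hidden_size = lm.config.hidden_size


class IntrospectionAdapter(nn.Module):
    """Maps a captured hidden state into a few 'introspection' soft-prompt
    vectors living in the model's embedding space (hypothetical component)."""

    def __init__(self, d_model: int, n_soft_tokens: int = 4):
        super().__init__()
        self.n = n_soft_tokens
        self.proj = nn.Linear(d_model, d_model * n_soft_tokens)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, d_model) -> (batch, n_soft_tokens, d_model)
        return self.proj(hidden_state).view(-1, self.n, hidden_state.size(-1))


adapter = IntrospectionAdapter(hidden_size)


def describe_internal_state(prompt: str) -> torch.Tensor:
    """Run the model once, capture a mid-layer hidden state, then feed it back
    as soft-prompt embeddings for a second pass that is asked to 'describe'
    that state. What supervision to apply to the second pass is exactly the
    open question in the text."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = lm(**inputs, output_hidden_states=True)

    # Mean-pool a middle layer's activations as a crude summary of the state.
    mid = len(out.hidden_states) // 2
    state = out.hidden_states[mid].mean(dim=1)            # (1, d_model)
    soft_prompt = adapter(state)                           # (1, n, d_model)

    # Second pass: prepend the soft prompt to an instruction asking the model
    # to report on the state it was just in.
    query = tok("Describe the internal state you were just in:",
                return_tensors="pt")
    query_emb = lm.get_input_embeddings()(query.input_ids)  # (1, T, d_model)
    inputs_embeds = torch.cat([soft_prompt, query_emb], dim=1)
    return lm(inputs_embeds=inputs_embeds).logits


logits = describe_internal_state("The ship creaked as the storm rolled in.")
print(logits.shape)  # (1, n_soft_tokens + query_length, vocab_size)
```

In an actual experiment, one would then have to decide what loss to apply to that second pass, which is exactly where the hand-waving lives: any target descriptions we supply are, by hypothesis, phrased in human concepts that may not fit the states being described.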