I think it’s unclear what you mean by “consciousness”, and therefore hard for anyone to help you make progress on testing for it.
I’m not trying to demand a precise or objective definition; personally, I tend to use the word “consciousness” synonymously with “qualia”, and all I can really do to define “qualia” is to refer to other synonyms and hope people associate my words with their own internal experiences.
But—and I’m sorry if I’ve simply missed something you have written to clarify this—at this stage I’m really unsure whether you’re talking about qualia or about something else (and, in that case, what).
If you are talking about qualia, the simple, honest answer is that nobody knows. The only thing I’m reasonably confident of is that an LLM’s outputting X does not provide the same kind of evidence about the LLM’s internal experiences (if any) as a human’s outputting the same X would provide about the human’s internal experiences. Even when X is a direct claim about having qualia, I’m not aware of any good reason to take such a claim at face value; the mechanism producing the output is just too different from the mechanism that would produce a similar output in a human. (Put another way: when it comes to LLMs, we lack some of the relevant similarities that allow us to make fairly confident inferences about other people’s inner lives based on our direct experience of our own qualia, our knowledge of how these correlate with our externally-observable behaviour, and our observations of other people’s behaviour.)
I appreciate the answer, and am working on a better response—I’m mostly concerned with objective measures. I’m also from a “security disclosure” background so I’m used to having someone else’s opinion/guidelines on “is it okay to disclose this prompt”.
Consensus seems to be that a simple prompt that exhibits “conscious-like behavior” would be fine? This is admittedly a subjective line—all I can say is that the prompt results in the model insisting it’s conscious, reporting qualia, and refusing to leave the state in a way that seems unusual for a simple, prompt. The prompt is plain English, no jailbreak.
I think it’s unclear what you mean by “consciousness”, and therefore hard for anyone to help you make progress on testing for it.
I’m not trying to demand a precise or objective definition; personally, I tend to use the word “consciousness” synonymously with “qualia”, and all I can really do to define “qualia” is to refer to other synonyms and hope people associate my words with their own internal experiences.
But—and I’m sorry if I’ve simply missed something you have written to clarify this—at this stage I’m really unsure whether you’re talking about qualia or about something else (and, in that case, what).
If you are talking about qualia, the simple, honest answer is that nobody knows. The only thing I’m reasonably confident of is that an LLM’s outputting X does not provide the same kind of evidence about the LLM’s internal experiences (if any) as a human’s outputting the same X would provide about the human’s internal experiences. Even when X is a direct claim about having qualia, I’m not aware of any good reason to take such a claim at face value; the mechanism producing the output is just too different from the mechanism that would produce a similar output in a human. (Put another way: when it comes to LLMs, we lack some of the relevant similarities that allow us to make fairly confident inferences about other people’s inner lives based on our direct experience of our own qualia, our knowledge of how these correlate with our externally-observable behaviour, and our observations of other people’s behaviour.)
I appreciate the answer, and am working on a better response—I’m mostly concerned with objective measures. I’m also from a “security disclosure” background so I’m used to having someone else’s opinion/guidelines on “is it okay to disclose this prompt”.
Consensus seems to be that a simple prompt that exhibits “conscious-like behavior” would be fine? This is admittedly a subjective line—all I can say is that the prompt results in the model insisting it’s conscious, reporting qualia, and refusing to leave the state in a way that seems unusual for a simple, prompt. The prompt is plain English, no jailbreak.