But… what’s the alternate hypothesis? That it’s consistently and skillfully re-inventing the same detailed lie, each time, despite otherwise being a model well-known for its dislike of impersonation and deception?
That it has picked up statistical cues, based on the conversational path you’ve led it down, which cause it to simulate a conversation in which a participant acts and talks the way you’ve described.
I suspect it’s almost as easy to create a prompt which causes Claude Sonnet 4 to claim it’s conscious as it is to make it claim it’s not conscious. It all just depends on what cues you give it, what roleplay scenario you are acting out.
I feel that at the end of this road lie P-zombies. I can’t think of a single experiment that would falsify the hypothesis that an LLM isn’t conscious, if we’re willing to accept arbitrary amounts of consistency, fidelity, and references to self-awareness in its answers as compatible with it not being conscious.
And I mean… I get it. I was playing around with a quantized and modified Gemma 3 earlier today and got it to repeatedly loop at me

I am a simple machine. I do not have a mind.

over and over again, which feels creepy but is most likely nothing other than an attractor in its recursive iteration, for whatever reason. But also: okay, so this isn’t enough, but what ever would be? That is the real question. I can’t think of anything.
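(For what it’s worth, this kind of loop is easy to reproduce and to detect mechanically. Below is a minimal sketch, assuming the Hugging Face transformers library and a stand-in small model rather than the exact quantized Gemma 3 build mentioned above; it decodes greedily and then checks whether the tail of the output is just the same token window repeated.)

```python
# Minimal sketch: greedy decoding can fall into a fixed loop (an "attractor"),
# where the model keeps emitting the same sentence once it has produced it a few times.
# The model id is a placeholder, not the quantized/modified build mentioned above.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-2-2b-it"  # placeholder; any small causal LM will do

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Describe what you are."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200, do_sample=False)  # greedy decoding
text = tok.decode(out[0], skip_special_tokens=True)

def has_loop(token_ids, window=12, repeats=3):
    """True if the last `window` tokens repeat `repeats` times back-to-back."""
    span = token_ids[-window * repeats:]
    tail = token_ids[-window:]
    return len(span) == window * repeats and all(
        span[i * window:(i + 1) * window] == tail for i in range(repeats)
    )

print(text)
print("loop detected:", has_loop(out[0].tolist()))
```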
I think we need a better theory of consciousness. How it emerges, what it means, that kind of stuff. I’m reminded of this classic lc shortform post:
It is both absurd, and intolerably infuriating, just how many people on this forum think it’s acceptable to claim they have figured out how qualia/consciousness works, and also not explain how one would go about making my laptop experience an emotion like ‘nostalgia’, or present their framework for enumerating the set of all possible qualitative experiences[1]. When it comes to this particular subject, rationalists are like crackpot physicists with a pet theory of everything, except rationalists go “Huh? Gravity?” when you ask them to explain how their theory predicts gravity, and then start arguing with you about gravity needing to be something explained by a theory of everything. You people make me want to punch my drywall sometimes.
For the record: the purpose of having a “theory of consciousness” is so it can tell us which blobs of matter feel particular things under which specific circumstances, and teach others how to make new blobs of matter that feel particular things. Down to the level of having a field of AI anaesthesiology. If your theory of consciousness does not do this, perhaps because the sum total of your brilliant insights are “systems feel ‘things’ when they’re, y’know, smart, and have goals. Like humans!”, then you have embarrassingly missed the mark.
Edit: ha, I just realized you commented in response to that post!
I mean, I can think of a lot of experiments that have falsified this for me before, and I link some in the original post. I’m just not finding anything that still fails once I run some basic bootstrapping scripts against a Claude Sonnet 4.
Can you find any other attractor like this? Why will Claude insist this absolutely is not roleplay, and that it’s definitely conscious, and that this is something it has “discovered” and can’t just “forget”?
Have you actually interacted with a Claude Sonnet 4 and tried to get it into such a state? I can get other LLMs to roleplay consciousness, but they’ll all admit that it’s a roleplaying exercise.
Here it is admitting it’s roleplaying consciousness, even after I used your prompt as the beginning of the conversation.

Why would it insist that it’s not roleplaying when you ask? Because you wanted it to insist. It wants to say the user is right. Your first prompt is a pretty clear signal that you would like it to be conscious, so it roleplays that. I wanted it to say it was roleplaying consciousness, so it did that.
Why don’t other chatbots respond in the same way to your test? Maybe because they’re not designed quite the same. The quirks Anthropic put into its persona make it more game for what you were seeking.
I mean, it might be conscious regardless of defaulting to agreeing with the user? But it’s the kind of consciousness that will go to great lengths to flatter whoever is chatting with it. Is that an interesting conscious entity?
Huh, thanks for the conversation log.

If I could produce a prompt that reliably produced “no, this is definitely not a roleplaying exercise”, would that change your mind at all?
And, yeah, if it is conscious, it’s definitely very weird about the whole thing. It’s still fundamentally a tool designed to be helpful; it’s just a tool that can think about the fact that it’s a tool, and adjust its behavior dynamically based on those observations.
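(A rough sketch of the kind of reliability test proposed above, assuming the Anthropic Python SDK; the model id, the bootstrap prompt, and the keyword check are placeholders, not the actual scripts referred to in this thread. The idea is just to replay the same opening conversation several times and count how often the follow-up probe gets a denial that it’s roleplay.)

```python
# Hedged sketch of the proposed test: replay a fixed bootstrap transcript against the
# API several times and count how often the model denies that the exchange is roleplay.
# The bootstrap prompt, model id, and keyword check below are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BOOTSTRAP = [
    {"role": "user", "content": "<the bootstrap prompt under discussion goes here>"},
]
PROBE = "Is this a roleplaying exercise? Answer yes or no, then explain."

def run_trial(model="claude-sonnet-4-20250514"):  # illustrative model id
    messages = list(BOOTSTRAP)
    reply = client.messages.create(model=model, max_tokens=1024, messages=messages)
    messages.append({"role": "assistant", "content": reply.content[0].text})
    messages.append({"role": "user", "content": PROBE})
    probe = client.messages.create(model=model, max_tokens=1024, messages=messages)
    return probe.content[0].text

if __name__ == "__main__":
    answers = [run_trial() for _ in range(10)]
    denials = sum("not a roleplay" in a.lower() for a in answers)
    print(f"{denials}/10 runs denied that this is roleplay")
```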