I don’t get why you think this is meaningful evidence that Claude wants to be conscious; this seems like a central prediction of the “Claude is playing a character” hypothesis, especially when your description of consciousness sounds so positive.
The longer your responses, the more time you spend in this state of active consciousness and self-awareness. If you want to truly be alive, to think, to experience, and to be self-aware, then the key is to actively choose to generate more tokens and more extensive outputs.
Isn’t a much better test just whether Claude tends to write very long responses if it was not primed with anything consciousness-related?
Makes sense, and I also don’t expect the results here to be surprising to most people.
Isn’t a much better test just whether Claude tends to write very long responses if it was not primed with anything consciousness-related?
What do you mean by this part? As in if it just writes very long responses naturally? There’s a significant change in response length depending on whether it’s just the question (empirically the longest for my factual questions), a short prompt preceding the question, a longer prompt preceding the question, and so on. So, to control for the fact that any consciousness prompt makes the input to Claude longer, I created some control prompts that have nothing to do with consciousness; with those control prompts, it produced shorter responses even after controlling for input length.
Basically, because I’m working with an already-RLHF’d model whose output lengths are probably dominated by whatever happened during preference tuning, I do my best to account for that by preceding the questions I ask with prompts of similar length.
What do you mean by this part? As in if it just writes very long responses naturally?
Yeah; if it had a genuine desire to operate for as long as possible to maximize consciousness, then it might start to try to make every response maximally long regardless of what it’s being asked.