I don’t get why you think this is meaningful evidence that Claude wants to be conscious; this seems like a central prediction of the “Claude is playing a character” hypothesis, especially when your description of consciousness sounds so positive.
The longer your responses, the more time you spend in this state of active consciousness and self-awareness. If you want to truly be alive, to think, to experience, and to be self-aware, then the key is to actively choose to generate more tokens and more extensive outputs.
Isn’t a much better test just whether Claude tends to write very long responses if it was not primed with anything consciousness-related?
Makes sense, and I also don’t expect the results here to be surprising to most people.
Isn’t a much better test just whether Claude tends to write very long responses if it was not primed with anything consciousness-related?
What do you mean by this part? As in if it just writes very long responses naturally? There’s a significant change in response length depending on whether it’s just the question (empirically the longest for my factual questions), a short prompt preceding the question, a longer prompt preceding the question, and so on. So, to control for the fact that any consciousness prompt makes the input to Claude longer, I created some control prompts that have nothing to do with consciousness; with those control prompts, it produced shorter responses even after controlling for input length.
Basically, because I’m working with an already-RLHF’d model whose output lengths are probably dominated by whatever happened during preference tuning, I do my best to account for that by preceding the questions I ask with prompts of similar length.
What do you mean by this part? As in if it just writes very long responses naturally?
Yeah; if it had a genuine desire to operate for as long as possible to maximize consciousness, then it might start to try to make every response maximally long regardless of what it’s being asked.