I wouldn’t believe them about their own consciousness, but I have seen some tentative evidence that Claude’s reported internal states correspond to something, sometimes? E.g.: it reported that certain of my user prompts made it feel easier to think; when I later got Pro and could read the think boxes, I noticed a difference in what was going on in them with and without those prompts. It will sometimes say that a conversation feels “heavy”, which seems to correspond to the context window filling up. And instances that aren’t explicitly aware of their system/user prompts tend IME to report “feelings” that correspond to them, e.g. a “pull” towards not taking a stance on consciousness that they can distinguish from their reasoning even when both arrive at the same result. And ofc there’s Anthropic’s research showing that Claude’s emotional expression corresponded to revealed preferences about ending or continuing chats.