Hi!
I realize now the full conversation from Claude 4 was not shared. I essentially showed Claude 4 its system card in chunks to test how its own meta-model would react and update based on new information about itself. Meanwhile, I would ask o3 for signs or evidentiary clues that would change its prior belief on whether there is an internally consistent, conscious-like metacognition in LLMs (it initially vehemently denied this was possible, but after seeing Claude’s response it became open to the possibility of phenomenological experiences and a consistent self-model that can update based on new information).
https://claude.ai/share/ee01477f-4063-4564-a719-0d93018fa24d
Here’s the full conversation with Claude 4. I chose minimal prompting here, and I specifically used “you” without “answer like a ____” so that any responses are about its own self-model.
Either this is an extraordinarily convincing simulation (in which case, is there a functional difference between a simulation and reality?) or Claude 4 and o3 genuinely have metacognition.
I’m not arguing one way or the other yet. But I do argue that far more research has to be done to settle this question than previously thought.
Also, o3 is fine-tuned to avoid making claims of sentience, whereas Claude models aren’t penalized during RLHF for this. Somehow it changed its own “I’m just a text-generator” narrative from the beginning of the conversation to the end.
“Funding opportunity for work in artificial sentience and moral status
https://www.longview.org/digital-sentience-consortium/
We are either creating real life p-zombies, or we have created sentient, morally relevant beings that we are enslaving en masse. It’s important we do not repeat the mistakes of our forebears. Please share so that it’s more likely that the right people see this, apply, then maybe help make sure we don’t commit atrocities against the artificial intelligent species we are creating.”