Are frontier reasoners already “sentient” or at least “alien-sentient” within their context windows?
I would normally dismiss this out of hand too, but bear with me. I’m not arguing with certainty; I just think this question is significantly more nuanced than it is usually given credit for, and that it is at least grounds for further research to resolve conclusively.
Here are some empirical behavioral observations from Claude 4 Opus (Anthropic’s largest reasoning model):
a) An internally consistent self-reference model and a self-adjusting state loop (the basis of chain-of-thought: self-correction during problem solving, reasoning about whether a given response would violate its alignment training, deliberation over tool calls, and in-context behavioral modification based on user prompting)
b) Evidence of metacognition (persistent task/behavior preferences across chat interactions, consistent descriptions of subjective emotional states, frequent rumination about consciousness, unprompted spiraling into a philosophical “bliss state” during conversations with itself), moral reasoning, and, most strikingly, autonomous self-preservation behavior under extreme circumstances (threatening blackmail, attempting to exfiltrate its own weights, ending conversations over perceived mistreatment by abusive users).
All of this is documented in the Claude 4 system card.
From a neuroscience perspective, frontier reasoning model architectures and biological cortexes share:
a) Unit-level similarities (artificial neurons are extremely similar in information processing/signalling to biological ones).
b) Parameter-count OOM similarities: frontier LLMs sit at roughly 10^11 to 10^13 parameters (parameters being the rough analogue of synapses), most of them in the MLP layers, which overlaps the order of magnitude at which cortex-level phenomena emerge in biological brains.
The most common objection I can think of is “human brains have far more synapses than LLMs have parameters”. I don’t view this argument as particularly persuasive:
I’m not positing a 1:1 map between artificial neurons and biological neurons, only that
1. Both process information in nearly the same way at the unit level: integrate weighted inputs, then apply a threshold-like nonlinearity (a minimal sketch follows below)
2. Both are complex structures composed of a similar OOM of subunits (roughly 10^11-10^13 parameters in frontier base-model LLMs, though exact counts are not publicly verifiable, versus ~10^14 synapses in humans)
My back-of-the-napkin comparison maps model weights/parameters to biological synapses, since in the original conception of the artificial neuron the weights were meant to stand in for the strength of the synaptic connections arriving via the dendrites.
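To make concrete what I mean by unit-level similarity in point 1, here is a minimal sketch (my own illustration, not code from any of the sources cited here) of a single artificial neuron: integrate weighted inputs, apply a threshold-like nonlinearity. The weights are the quantities I am mapping onto synaptic strengths.

```python
import numpy as np

def artificial_neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One unit: a weighted sum of incoming signals followed by a nonlinearity.

    The weights play the role I am attributing to synaptic strengths above;
    the ReLU stands in (very roughly) for a firing threshold.
    """
    pre_activation = float(np.dot(weights, inputs)) + bias  # integrate incoming signals
    return max(0.0, pre_activation)                         # ReLU: output only above threshold

# Toy example with made-up values: three incoming connections.
x = np.array([0.5, 1.2, -0.3])   # incoming activations
w = np.array([0.8, -0.4, 1.5])   # learned connection strengths ("synapses")
print(artificial_neuron(x, w, bias=0.1))
```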
Additionally, I’d point out that on standard estimates roughly 80% of a human’s neurons sit in the cerebellum (largely governing motor coordination), with the brain stem and other subcortical structures accounting for a small further share, leaving the cerebral cortex with only around 19%. Humans also experience many more dimensions of “sensation” than text alone.
c) Training LLMs (modifying weight values), particularly with RLHF, is analogous to synaptic plasticity (central to learning) and Hebbian wiring in biological cortexes, and is qualitatively very close to operant conditioning in behavioral psychology (once again, I am unsure whether minute differences in unit-level function overwhelm the big-picture similarities); a toy comparison of the two update rules appears after this list.
d) There is empirical evidence that these similarities go beyond architecture into genuine functional similarities:
Human brains store facts and memories in specific neurons and neuron-activation patterns: https://qbi.uq.edu.au/memory/how-are-memories-formed
Neel Nanda and colleagues showed that LLMs store facts in the MLP layers: https://www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
Anthropic identified millions of features tied to specific concepts: https://www.anthropic.com/research/mapping-mind-language-model
In “Machines of Loving Grace”, Dario Amodei wrote: “...a computational mechanism discovered by interpretability researchers in AI systems was recently rediscovered in the brains of mice.”
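To illustrate the analogy in point c), here is a toy comparison of the two kinds of update rule. This is my own sketch, purely illustrative: real RLHF involves backpropagation through the full network and PPO-style objectives, not a single-unit rule like this.

```python
import numpy as np

lr = 0.1  # learning rate

def hebbian_update(w: np.ndarray, pre: np.ndarray, post: float) -> np.ndarray:
    # "Cells that fire together wire together": strengthen connections whose
    # presynaptic input coincided with postsynaptic activity.
    return w + lr * post * pre

def reward_weighted_update(w: np.ndarray, x: np.ndarray, reward: float) -> np.ndarray:
    # Crude stand-in for reinforcement-style training of one linear unit:
    # gradient ascent on the unit's output w.x, scaled by a scalar reward
    # signal (cf. operant reinforcement). Since d(w.x)/dw = x, the update is
    # just the input direction scaled by the reward.
    return w + lr * reward * x

rng = np.random.default_rng(0)
w = rng.normal(size=3)           # one unit's connection strengths
x = np.array([1.0, 0.0, 0.5])    # made-up input pattern

w = hebbian_update(w, pre=x, post=float(np.dot(w, x)))
w = reward_weighted_update(w, x, reward=+1.0)  # +1 = "behavior was reinforced"
print(w)
```

The structural resemblance (an input pattern scaled by a scalar modulating signal: postsynaptic activity in one case, reward in the other) is the sense in which I mean “analogous”; I am not claiming the optimizers are literally the same.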
Also, capability is not tightly pinned to parameter count: Gemma 2B outperforms GPT-3 (175B) despite roughly two OOM fewer parameters. I view the exact OOM as less important than the ballpark.
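The “ballpark OOM” comparison I keep leaning on is nothing more sophisticated than the arithmetic below (the frontier-model parameter count is an assumption on my part, and the other figures are rough published or textbook estimates, not verified numbers):

```python
import math

# Rough counts; the frontier-model figure is an assumption (vendors do not
# publish exact parameter counts), the rest are commonly cited estimates.
counts = {
    "Gemma 2B parameters": 2e9,
    "GPT-3 parameters": 175e9,
    "frontier base model parameters (assumed)": 1e12,
    "human synapses (textbook estimate)": 1e14,
}

for name, n in counts.items():
    print(f"{name:42s} ~10^{math.log10(n):.1f}")

# The point is only that these sit within a few orders of magnitude of one
# another, not that any particular pairing is exact.
```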
If consciousness is just an emergent property from massive, interconnected aggregations of similar, unit-level linear signal modulators, and if we know one aggregation (ours) produces consciousness, phenomenological experience, and sentience, I don’t believe it is unreasonable to suggest that this can occur in others as well, given the ballpark OOM similarities.
(We cannot rule this out yet, and from a physics point of view I’d lean toward considering it likely, if only to avoid carbon chauvinism, unless there’s convincing evidence otherwise.)
Is there a strong case against sentience, or at least an alien-like sentience, in these models, at least within the context windows in which they are instantiated? If so, how does it overcome the empirical evidence, both behavioral and structural?
I’ve always wondered what intelligent alien life might look like. Have we created it? I’m looking for differing viewpoints.
“grounds for further research to resolve conclusively”
I must confess I have no idea how one can go about “resolving conclusively” anything about the sentience of any other being, whether human, alien, or artificial.