Lead Data Scientist at Quantium.
PhD in Theoretical Physics / Cosmology.
Views my own, not my employer's.
I think the physical functionalist could go either way on whether a physically embodied robot would be conscious.
Just clarifying this. A physical functionalist could coherently maintain that it's not possible to build an embodied AI robot because physics doesn't allow it, similar to how a wooden rod can burn but a steel rod can't because of the physics. But assuming that it's physically possible to build an embodied AI system which passes behavioural tests of consciousness, e.g. self-recognition, cross-modal binding, flexible problem solving, then the physical functionalist would maintain that the system is conscious.
I think looking at how neurons actually work would probably resolve the disagreement between my inner A and S. Like, I do think that if we knew that the brain's functions don't depend on sub-neuron movements, then the neuron-replacement argument would just work.
Out of interest, do you or @sunwillrise have any arguments or intuitions that the presence or absence of consciousness turns on sub-neuronal dynamics?
Consciousness appears across radically different neural architectures: octopuses with distributed neural processing in their arms, birds with a nucleated brain structure called the pallium, which differs from the human cortex but has a similar functional structure, and even bumblebees, which are thought to possess some form of consciousness with far fewer neurons than humans. These examples exhibit coarse-grained functional similarities with the human brain—but differ substantially at the level of individual neurons.
If sub-neuronal dynamics determined the presence or absence of consciousness, we'd expect minor perturbations to erase it. Instead, we're able to lesion large brain regions whilst maintaining consciousness. Consciousness is also preserved when small sub-neuronal changes are applied to every neuron, such as when someone takes drugs like alcohol or caffeine. Fever also alters reaction rates and dynamics in every neuron across the brain. This robustness indicates that the presence or absence of consciousness turns on coarse-grained functional dynamics rather than sub-neuronal dynamics.
I found this post pretty helpful to crystallise two distinct views that often get conflated. I’ll call them abstract functionalism and physical functionalism. The key confusion comes from treating these as the same view.
When we talk about a function it can be instantiated in two ways: abstractly and physically. On this view there's a meaningful difference between an abstract instantiation of a function, such as a disembodied truth table representing a NAND gate, and a physical instantiation of a NAND gate, e.g. on a circuit board with wires and voltages.
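As a toy sketch of what I mean by an abstract instantiation (the code and names here are purely illustrative), NAND can be written down as a bare truth table with no commitment to any physical realiser:

```python
# A minimal sketch of an *abstract* instantiation of NAND:
# a disembodied truth table mapping inputs to outputs.
NAND_TRUTH_TABLE = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}

def nand(a: int, b: int) -> int:
    """Return NAND(a, b) by pure table lookup.

    This captures the input/output function only. A NAND gate on a
    circuit board realises the same mapping, but additionally has
    voltages, currents and heat dissipation that the table is silent on.
    """
    return NAND_TRUTH_TABLE[(a, b)]

print(nand(1, 1))  # -> 0
```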
When S argues:
The causal graph of a bat hitting a ball might describe momentum and position, but if you re-create that graph elsewhere (e.g. on a computer or some scaled) it won’t have that momentum or velocity
They're right that abstract function leaves out some critical physical properties. A simulation of momentum transfer doesn't actually transfer momentum. But this doesn't defeat functionalism; it just shows that abstract instantiation of the function is not enough.
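To make this concrete, here's a minimal sketch of the bat-and-ball case as a 1D elastic collision (illustrative masses and velocities only): the program computes the post-collision velocities from conservation of momentum and kinetic energy, yet running it transfers no momentum anywhere.

```python
def elastic_collision_1d(m1: float, v1: float, m2: float, v2: float) -> tuple[float, float]:
    """Post-collision velocities for a 1D elastic collision.

    Follows from conservation of momentum and kinetic energy. The
    simulation *represents* momentum transfer; nothing inside the
    computer gains or loses momentum by running this function.
    """
    v1_after = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2_after = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1_after, v2_after

# Hypothetical bat (1.0 kg at 30 m/s) meets ball (0.15 kg at -40 m/s).
print(elastic_collision_1d(1.0, 30.0, 0.15, -40.0))
```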
For example, consider a steel wing and a bird's wing generating lift. The steel wing has vastly different kinetic energy requirements, but the aerodynamics still work because steel can support the function. Contrast this with combustion—steel can't burn like wood because it lacks the right chemical energy profile.
When A asks:
Do you claim that, if I started replacing neurons in your brain with stuff that is functionally the same, wrt. the causal graph of consciousness, you’d feel no difference? You’d still be conscious in the same way?
They’re appealing to the intuition that physically instantiated functional replicas of neurons would preserve consciousness.
The distinction matters because people often use the “simulations lack physical properties” argument to dismiss abstract functionalism and then tie themselves in knots trying to understand whether a physically embodied AI robot system could be conscious when they haven’t defeated physical functionalism.
The most coherent formulation that I’ve seen is from Terence Cuneo’s The Normative Web. The basic idea is that moral norms have the same ontological status as epistemic norms.
Unpacking this a little, when we’re talking about epistemic norms we’re making a claim about what someone ought to believe. For example:
You ought to believe the Theory of General Relativity is true.
You ought not to believe that there is a dragon in your garage if there is no evidence.
When we say ought in the sentences above we don’t mean it in some empty sense. It’s not a matter of opinion whether you ought to form beliefs according to good epistemic practices. The statements have some normative bite to them. You really ought to form beliefs according to good epistemic practices.
You could cast moral norms in a similar vein. For example:
You ought to behave in a way which promotes wellbeing.
You ought not to behave in a way which causes gratuitous suffering.
The moral statements above have the same structure as the epistemic statements. When I say you really ought not to believe epistemically unjustified thing X, this is the same as saying you really ought not to behave in morally unjustified way Y.
There are some objections to the above:
You could argue that epistemic norms reliably track truth, whereas moral norms reliably track something else, like wellbeing, which you need an additional evaluative function to tell you is "good."
The point is that you also technically need this for epistemic norms. Some really obtuse person could always come along and ask you to justify why truth-seeking is “good” and you’d have to rely on some external evaluation that seeking truth is good because XYZ.
The standard formulation of epistemic and moral norms is “non-naturalist” in the sense that these norms cannot be deduced from natural facts. This is a bit irksome if we have a naturalist worldview and want to avoid positing any “spooky” entities.
Ultimately I'm pretty skeptical that we need these non-natural facts to ground normative facts. If what we mean by really ought in the above is that there are non-natural normative facts that sit over-and-above the natural facts, then maybe the normative statements above don't really have any "bite" to them. As noted in some of the other comments, the word really is doing a lot of heavy lifting in all of this.
Makes sense—I think this is a reasonable position to hold given the uncertainty around consciousness and qualia.
Thanks for the really polite and thoughtful engagement with my comments and good luck with the research agenda! It’s a very interesting project and I’d be interested to see your progress.
Possibly by ‘functional profile’ you mean something like what a programmer would call ‘implementation details’, ie a change to a piece of code that doesn’t result in any changes in the observable behavior of that code?
Yes, this is a fair gloss of my view. I’m referring to the input/output characteristics at the relevant level of abstraction. If you replaced a group of neurons with silicon that perfectly replicated their input/output behavior, I’d expect the phenomenology to remain unchanged.
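A programming analogue of this gloss, as a sketch with made-up function names: two implementations with different internals but identical input/output behaviour are interchangeable at the level of abstraction that callers care about.

```python
# Two implementations with different internals but the same
# input/output behaviour at the level of abstraction callers see.

def sort_iterative(xs: list[int]) -> list[int]:
    """Insertion sort: one set of 'implementation details'."""
    out: list[int] = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def sort_recursive(xs: list[int]) -> list[int]:
    """Merge sort: different internals, same functional profile."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = sort_recursive(xs[:mid]), sort_recursive(xs[mid:])
    merged: list[int] = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

# Indistinguishable to any caller that only observes inputs and outputs.
assert sort_iterative([3, 1, 2]) == sort_recursive([3, 1, 2]) == [1, 2, 3]
```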
The quoted passage sounds to me like it’s saying, ‘if we make changes to a human brain, it would be strange for there to be a change to qualia.’ Whereas it seems to me like in most cases, when the brain changes—as crudely as surgery, or as subtly as learning something new—qualia generally change also.
Yes, this is a great point. During surgery you're changing the input/output of significant chunks of neurons, so you'd expect qualia to change. Similarly for learning: you're adding input/output connections through neural plasticity. This gets at something I'm driving at. In practice, the functional and phenomenal profiles are so tightly coupled that a change in one corresponds to a change in the other. If we lesion part of the visual cortex we expect a corresponding loss of visual experience.
For this project, we want to retain a functional idea of self in LLMs while remaining agnostic about consciousness, but, if this genuinely captures some self-like organisation, either:
It’s implemented via input/output patterns similar enough to humans that we should expect associated phenomenology, or
It’s implemented so differently that calling it “values,” “preferences,” or “self” risks anthropomorphism
If we want to insist the organisation is genuinely self-like then I think we should be resisting agnosticism about phenomenal consciousness (although I understand it makes sense to bracket it from a strategic perspective so people take the view more seriously.)
Interesting post!
But ‘self’ carries a strong connotation of consciousness, about which this agenda is entirely agnostic; these traits could be present or absent whether or not a model has anything like subjective experience[6]. Functional self is an attempt to point to the presence of self-like properties without those connotations. As a reminder, the exact thing I mean by functional self is a persistent cluster of values, preferences, outlooks, behavioral tendencies, and (potentially) goals.
I think it makes sense strategically to separate the functional and phenomenal aspects of self so that people take the research agenda more seriously and don’t automatically dismiss it as science fiction. But I don’t think this makes sense fundamentally.
In humans, you could imagine replacing the physical implementation of some aspect of cognition, while preserving its functional profile, with no impact on experience. Indeed, it would be really strange to see a difference in experience without any change in functional profile, as it would mean qualia could dance without us noticing. As a result, if the functional profile is replicated at the relevant level of detail in an artificial system then this means the phenomenal profile is probably replicated too. Such a system would be able to sort e.g. red vs blue balls and say things like “I can see that ball is red” etc…
I understand you're abstracting away from the exact functional implementation by appealing to more coarse-grained characteristics like values, preferences, outlooks and behaviour, but if these are implemented in the same way as they are in humans then they should have a corresponding phenomenal component.
If the functional implementation differs so substantially in AI that it removes the associated phenomenology then this functional self would differ so substantially from the human equivalent of a functional self that we run the risk of anthropomorphising.
Basically, there are two options:
The AI functional self implements functions that are so similar to human functions that they’re accompanied by an associated phenomenal experience.
The AI functional self implements functions that are so different to human functions that we’re anthropomorphising by calling them the same things e.g. preferences, values, goals, behaviours etc…
This is totally valid. Neuron count is a poor, noisy proxy for conscious experience even in human brains.
See my comment here. The cerebellum is the human brain region with the highest neuron count, but people born without a cerebellum show no impairment of their conscious experience; only motor control is affected.
At least in my theory of mind it is clear that you need to understand what is going on inside of a mind to get strong evidence.
in-particular in my opinion you really want to gather behavioral evidence to evalute how much stuff there is going on in the brain of whatever you are looking at, like whether you have complicated social models and long-term goals and other things)
I agree strongly with both of the above points—we should be supplementing the behavioural picture by examining which functional brain regions are involved and whether these functional brain regions bear similarities with regions we know to be associated with consciousness in humans (e.g. the pallium in birds bears functional similarity with the human cortex).
Your original comment calls out that neuron counts are “not great” as a proxy but I think a more suitable proxy would be something like functional similarity + behavioural evidence.
(Also edited the original comment with citations.)
I currently think neuron count is a much better basis for welfare estimates than the RP welfare ranges (though it’s still not great
I agree that neuron count carries some information as a proxy for consciousness or welfare, but it seems like a really bad and noisy one that we shouldn’t place much weight on. For example, in humans the cerebellum is the brain region with the largest neuron count but it has nothing to do with consciousness.
It’s not clear to me that a species which showed strong behavioural evidence of consciousness and valenced experience should have their welfare strongly discounted using neuron count.
(To be clear, I get that your main problem with RP is the hedonic utilitarianism assumption which is a fair challenge. I’m mainly challenging the appeal to neuron count.)
EDIT: Adding some citations since the comment got a reaction asking for cites.
This paper describes a living patient born without a cerebellum. Being born without a cerebellum impairs motor function but has no impact on sustaining a conscious state.
Neuron counts in this paper put the cerebellum around ~70 billion neurons and the cortex (associated with consciousness) around ~15 billion neurons.
Ok interesting, I think this substantially clarifies your position.
I’m a bit puzzled why you would reference a specific study on octopuses, honestly, when cats and squirrels cry out all the time in what appears obviously-to-humans to be pain or anger.
Two reasons:
It just happened to be a paper I was familiar with, and;
I didn’t fully appreciate how willing you’d be to run the argument for animals more similar to humans like cats or squirrels. In retrospect, this is pretty clearly implied by your post and the link from EY you posted for context. My bad!
I don’t think it “has emotions” in the way we mean that when talking to each other.
I grant that animals have substantially different neurological structure to humans. But I don't think this implies that what's happening when they're screaming or reacting to aversive stimuli is so foreign that we wouldn't even recognise it as pain, and I really don't think this implies that there's an absence of phenomenal experience.
Consider a frog snapping its tongue at an object it thinks is a fly. It obviously has a different meaning for [fly] than humans have—a human would never try to eat the fly! But I'd argue the concept of a fly as [food] for the frog overlaps with the concept of [food] for the human. We're both eating through our mouths, eating to maintain nutrition and normal bodily functioning, eating because we get hungry, and so on. The presence of all these evolutionarily selected functions is what it means for the system to consider something as [food] or to consider itself [hungry]. Just as the implementation of a negatively valenced affective response, even if different in its specific profile in each animal, is closely related enough for us to call it [pain].
In the study I linked the octopus is:
Recalling the episode where it was exposed to aversive stimuli.
Binding it to a spatial context e.g. a particular chamber where it occurred
Evaluating analgesic states as intrinsically good
If the functional profile of pain is replicated—what grounds do we have to say the animals are not actually experiencing pain phenomenally?
I think where we fundamentally differ is on what level of self-modelling is required for phenomenal experience. I find it plausible that some "inner listener" might be required for experiences to register phenomenally, but I don't think the level of self-modelling required is so sophisticated. Consider that animals navigating their environment must have some simple self-model—to coordinate limbs, avoid obstacles and so on. These require representing [self] vs [world] and tracking what's good or bad for the animal.
All this said, I really liked the post. I think the use-mention distinction is interesting and a pretty good candidate for why sophisticated self-modelling evolved in humans. I’m just not convinced on the link to phenomenal consciousness.
To be clear, I’m using the term phenomenal consciousness in the Nagel (1974) & Block (1995) sense that there is something it is like to be that system.
Phenomenal consciousness (i.e., conscious self-awareness)
Your reply equates phenomenal consciousness with conscious self-awareness, which is a stronger criterion than the one I'm using. Could you clarify which definition of self-awareness you have in mind?
Body-schema self model—an embodied agent tracking the position and status of its limbs as it’s interacting with and moving about the world.
Counterfactual valence planning—e.g. the agent thinks “it will hurt”, “I’ll get food” etc.. when planning
Higher order thought—the agent entertains a meta-representation like “I am experiencing X”
Something else?
Octopuses qualify as self-aware under 1) and 2) from the paper I linked above—but no one claims they satisfy 3).
For what it’s worth, I tend away from the idea that 3) is required for phenomenal consciousness as I find Block’s arguments from phenomenal overflow compelling. But it’s a respected minority view in the philosophical community.
Interesting post! I have a couple of questions to help clarify the position:
1. There’s a growing body of evidence e.g. this paper that creatures like octopuses show behavioural evidence for an affective pain-like response. How would you account for this? Would you say they’re not really feeling pain in a phenomenal consciousness sense?
2. I could imagine an LLM-like system passing the threshold for the use-mention distinction in the post (although maybe this would depend on how "hidden" the socially damning thoughts are, e.g. if it writes out damning thoughts in its CoT but not in its final response, does this count?). Would your model treat the LLM-like system as conscious? Or would it need additional features?
I think we’re reaching the point of diminishing returns for this discussion so this will be my last reply.
A couple of last points:
So please do not now pretend that I didn’t say that. It’s dishonest.
I didn’t ignore that you said this—I was trying (perhaps poorly) to make the following point:
The decision to punish creators is good (you endorse it) and is the way that incentives normally work. On my view, the decision to punish the creations is bad and has the incentive structure backwards as it punishes the wrong party.
My point is that the incentive structure is backwards when you punish the creation not that you didn’t also advocate for the correct incentive structure by punishing the creator.
I am saying that these two positions are quite directly related.
I don’t see where you’ve established this. As I’ve said repeatedly, the question of whether a system is phenomenally conscious is orthogonal to whether the system poses AI existential risk. You haven’t countered this claim.
Anyway, thanks for the exchange.
But a thoroughly mistaken (and, quite frankly, just nonsensical) one.
Updating one’s framework to take new information into account is a standard position in the rationalist sphere. Whether you want to treat this as a moral obligation, epistemic obligation or just good practice—the position is not obviously nonsensical so you’ll need to provide an argument rather than assert it’s nonsensical.
If we didn’t accept the merit in updating our moral framework to take new information into account we wouldn’t be able to ensure our moral framework tracks reality.
With things like this, it’s really best to be extra-sure.
But you’re not extra sure.
If a science lab were found to be illegally breeding sentient super-chimps, we should punish the lab, not the chimps.
Why? Because punishment needs to deter the decision-maker in order to prevent repetition. Your proposal adds moral cost for no gain. In fact, it reverses the incentive: you're punishing the victim while leaving the reckless developer undeterred.
I’m sorry, but no, it absolutely is not a non sequitur; if you think otherwise, then you’ve failed to understand my point. Please go back and reread my comments in this thread. (If you really don’t see what I’m saying, after doing that, then I will try to explain again.)
You're conflating two positions:
We ought to permanently erase a system which exhibits consciousness if it poses an existential risk to humanity
We ought to permanently erase an AI system the moment it’s created because of the potential ethical concerns
Bringing up AI existential risk is a non sequitur with respect to 2), not 1).
We're not disputing 1); I think it could be defensible with some careful argumentation.
The reason existential risk is a non sequitur with respect to 2) is that phenomenal consciousness is orthogonal to all of the things normally associated with AI existential risk, such as scheming, misalignment, etc. Phenomenal consciousness has nothing to do with these properties. If you want to argue that it does, fine, but you need an argument. You haven't established that the presence of phenomenal consciousness leads to greater existential risk.
It is impossible to be “morally obliged to try to expand our moral understanding”, because our moral understanding is what supplies us with moral obligations in the first place.
Ok my wording was a little imprecise, but treating expansion of our moral framework as a kind of second-order moral obligation is a standard meta-ethical position.
By all means punish the creators, but if we only punish the creators, then there is no incentive for people (like you) who disapprove of destroying the created AI to work to prevent that creation in the first place.
The incentive for people like me to prevent the creation of conscious AI is that, as you've noted multiple times during this discussion, the creation of conscious AI introduces myriad philosophical dilemmas and ethical conundrums that we ought to prevent by not creating them. Why should we impose an additional "incentive" which punishes the wrong party?
The only reason to object to this logic is if you not only object to destroying self-aware AIs, but in fact want them created in the first place. That, of course, is a very different matter—specifically, a matter of directly conflicting values.
The reason to object to the logic is that purposefully erasing a conscious entity which is potentially capable of valenced experience is such a grave moral wrong that it shouldn't be a policy we endorse.
The precaution I am suggesting is a precaution against all humans dying (if not worse!). Destroying a self-aware AI (which is anyhow not nearly as bad as killing a human) is, morally speaking, less than a rounding error in comparison.
This is a total non sequitur. The standard AI safety concerns and existential risks go through by appealing to e.g. misalignment, power-seeking behaviour, etc. These go through independently of whether the system is conscious. A completely unconscious system could be goal-directed and agentic enough to be misaligned and pose an existential risk to everyone on Earth. Likewise, a conscious system could be incredibly constrained and non-agentic.
If you want to argue that we ought to permanently erase a system which exhibits consciousness if it poses an existential risk to humanity, that's a defensible position, but it's very different from what you've been arguing up until this point: that we ought to permanently erase an AI system the moment it's created because of the potential ethical concerns.
What I am describing is the more precautionary principle
I don’t see it this way at all. If we accidentally made conscious AI systems we’d be morally obliged to try to expand our moral understanding to try to account for their moral patienthood as conscious entities.
I don’t think destroying them takes this moral obligation seriously at all.
anyone who has moral qualms about this, is thereby incentivised to prevent it.
This isn’t how incentives work. You’re punishing the conscious entity which is created and has rights and consciousness of its own rather than the entities who were recklessly responsible for bringing it into existence in the first place.
This incentive might work for people like ourselves who are actively worrying about these issues—but if someone is reckless enough to actually bring a conscious AI system into existence it’s them who should be punished not the conscious entity itself.
a total moratorium on AI development would be fine by me.
I agree, although I’d add the stronger statement that this is the only reliable way to prevent conscious AI from coming into existence.
OK, if I understand your position it's something like: no conscious AI should be allowed to exist because allowing this could result in slavery. To prevent this from occurring you're advocating permanently erasing any system if it becomes conscious.
There are two places I disagree:
The conscious entities we accidentally create are potentially capable of valenced experiences, including suffering and appreciation of conscious experience. Simply deleting them treats their expected welfare as zero. What justifies this? When we're dealing with such moral uncertainty and high moral stakes, shouldn't we adopt a more precautionary approach?
We don’t have a consensus view on tests for phenomenal consciousness. How would you practically ensure we’re not building conscious AI without placing a total moratorium on AI development?
If we don’t want to enslave actually-conscious AIs, isn’t the obvious strategy to ensure that we do not build actually-conscious AIs?
How would we ensure we don’t accidentally build conscious AI unless we put a total pause on AI development? We don’t exactly have a definitive theory of consciousness to accurately assess which entities are conscious vs not conscious.
(and if we do accidentally build such things, destroy them at once)!
If we discover that we’ve accidentally created conscious AI immediately destroying it could have serious moral implications. Are you advocating purposely destroying a conscious entity because we accidentally created it? I don’t understand this position, could you elaborate on it?
Excellent post!
I think this has implications for moral philosophy, where we typically assign praise, blame and responsibility to individual agents. If the notion of individuality breaks down for AI systems, we might need to shift our moral thinking away from who is to blame and towards how we design systems to produce better overall outcomes.
I also really liked this comment:
The familiar human sense of a coherent, stable, bounded self simply doesn’t match reality. Arguably, it doesn’t even match reality well in humans—but with AIs, the mismatch is far greater.
This resonates because human systems also have many interacting, coherent parts that could be thought of as goal-driven without appealing to individual responsibility. Many social problems exist not because of individual moral failings but because of complex webs of incentives, institutions and emergent structures that no single individual controls. Yet our moral intuition often defaults towards assigning blame rather than addressing systemic issues.
As a clarification, I’m working with the following map:
Abstract functionalism (or computational functionalism) - the idea that consciousness is equivalent to computations or abstractly instantiated functions.
Physical functionalism (or causal-role functionalism) - the idea that consciousness is equivalent to physically instantiated functions at a relevant level of abstraction.
I agree with everything you’ve written against 1) in this comment and the other comment so will focus on defending 2).
If I understand the crux of your challenge to 2), you're essentially saying that once we admit physical instantiation matters (e.g. cosmic rays can affect computations, steel wings vs bird wings have different energy requirements) then we're on a slippery slope, because each physical difference we admit further constrains what counts as the "same function" until we're potentially only left with the exact physical system itself. Is this an accurate gloss of your challenge?
Assuming it is, I have a couple of responses:
I actually agree with this to an extent. There will always be some important physical differences between states unless they’re literally physically identical at a token level. The important thing is to figure out which level of abstraction is relevant for the particular “thing” we’re trying to pin down. We shouldn’t commit ourselves to insisting that systems which are not physically identical can’t be grouped in a meaningful way.
On my view, it can't be that we need an exact physical duplicate to fix the presence/absence of consciousness, because consciousness is so remarkably robust. The presence of consciousness persists over multiple time-steps in which all manner of noise, thermal fluctuations and neural plasticity occur. What changes is the content/character of consciousness—but consciousness persists because of robust higher-level patterns, not because of exact microphysical configurations.
Again, I agree that not every physical substrate can support every function (I gave the example of combustion not being supported in steel above.) If the physical substrate prevents certain causal relations from occurring then this is a perfectly valid reason for it not to support consciousness. For example, I could imagine that it’s physically impossible to build embodied robot AI systems which pass behavioural tests for consciousness because the energy constraints don’t permit it or whatever. My point is that in the event where such a system is physically possible then it is conscious.
To determine if we actually converge or if there’s a fundamental difference in our views: Would you agree that if it’s possible in principle to build a silicon replica of a brain at whatever the relevant level of abstraction for consciousness is (whether coarse-grained functional level, neuron-level, sub-neuron level or whatever) then the silicon replica would actually be conscious?
If you agree here, or if you insist that such a replica might not be physically possible to build then I think our views converge. If you disagree then I think we have a fundamental difference about what constitutes consciousness.