Against asking if AIs are conscious
People sometimes wonder whether certain AIs or animals are conscious/sentient/sapient/have qualia/etc. I don’t think that such questions are coherent. Consciousness is a concept that humans developed for reasoning about humans. It’s a useful concept, not because it is ontologically fundamental, but because different humans have lots of close similarities in how our cognition works, and we have privileged access to some details of our own cognition. So “it’s like what’s going on in my head, but with some differences that I can infer from the fact that they don’t act quite the same way I do” is a useful way to understand what’s going on in other people’s heads, and we use consciousness-related language to describe features of human minds that we can understand this way. Consciousness is the thing that a typical adult human recognizes in themselves when hearing others describe the character of their internal cognition. This makes consciousness defined at least partially extensionally: you’re conscious; other people whom it is useful to model using what you know about how human minds work are conscious; things that it is totally useless to reason about by starting from the assumption that they’re like you and adjusting for differences aren’t. This does not point towards any ontologically fundamental feature of minds, just towards a paradigm for reasoning about each other that is useful specifically in the context of humans reasoning about humans.
“But is there something that it is like to be Claude or not?” sounds like a real question. But I think questions like that subtly smuggle in a lot of background assumptions that we have about mental architecture that don’t apply to most possible minds. A positive answer suggests to people that features they’re used to conscious minds having in common should also apply to Claude, and a negative answer suggests to people that there’s nothing philosophically interesting about Claude’s cognition. I think that there is probably something philosophically interesting about large language model cognition, but that it is so alien that trying to apply the concepts we have developed for understanding how cognition can be philosophically interesting is fundamentally confused. Asking whether a large language model is conscious seems vaguely analogous to a civilization of people with legs but no arms, who have a word that could be translated as either “limb” or “leg”, encountering humans with arms and wondering whether arms count as <word for limb/leg> or not. Except that minds are more philosophically confusing than limbs, so while we would be able to easily develop new concepts to describe alien limbs we encounter, we retain our confusions even after significant amounts of interaction with alien minds.
One way that “consciousness” can be ambiguous, which people should be used to already, is that mindspace is continuous, so binary classifications must have some edge cases. A (not conscious) human zygote gradually becomes a (conscious) adult human, with no moment at which consciousness suddenly appears, so there must be some period of ambiguity somewhere. Similarly, looking back through our evolutionary history, there is a long sequence of organisms with a single-celled bacterium at one end and a human at the other, with any two adjacent organisms so similar that it wouldn’t seem right to place a divider between the conscious and the not conscious between them. This gets you to the idea that consciousness could be on a continuous scale rather than a discrete classifier. But I think that takeaway misses that mindspace is vast, and there’s a lot more to a mind than how far along the path from nothing to typical adult human it is. A superintelligent uplifted human, reasoning about other such entities, might have a concept similar to consciousness, and classify us as ambiguously conscious, whereas we would want to classify both us and them as approximately fully conscious (and perhaps even would put ourselves as more unambiguously conscious than them). Because, again, consciousness is a concept that we developed for reasoning about each other and ourselves, not something fundamental.
There’s also more than one feature of human minds that we have privileged access to in our own minds and can productively use as a starting point for reasoning about in others. And we can think about such features separately, so people draw distinctions, for instance, between consciousness and sentience. In some sense, this is a step towards understanding that mindspace is high-dimensional, but it is a woefully inadequate one, since all of these concepts will suffer from ambiguity and misleadingness when you try to apply them to minds very different from those that you understand.
Using concepts we developed for reasoning about each other as a starting point seems significantly less futile for reasoning about bats than for reasoning about large language models. Bat brains and human brains share a lot of evolutionary history. Even since humans and bats diverged, the evolutionary pressures on each of them have some significant commonalities. One could try to extract an upper bound on how misleading it is to think about bats the way we think about people by looking at how similar bat neural circuitry is to human neural circuitry performing similar functions, and indeed some people have tried to do things like this, though to be honest I haven’t followed closely enough to have an informed opinion on how well such attempts succeed. You can’t do this for large language models.
A related, and more directly action-relevant, question is whether a certain mind is a moral patient. Again, I don’t think such questions are well-specified. Moral realism is false, so there is no ground truth as to whether a given mind is a moral patient, and it’s up to moral agents to form opinions about what matters. If you had well-formed opinions on every purely moral question but didn’t know all the facts about some mind, then there would be a (potentially unknown to you) fact of the matter about whether that mind is a moral patient: it would be a moral patient if you would consider it morally relevant once you knew all the facts about it. But you do not have a well-formed opinion on every moral question. There are moral questions that you do not have the concepts necessary to consider. And I don’t think the model that moral patienthood is some existent but poorly-understood property of minds that can be better understood with science is accurate or useful. It’s not coherent to delegate your moral judgments to supposed factual questions about whether a mind is “a moral patient”, or “conscious”, as if those meant anything. You could learn enough about some alien mind that you develop new concepts that are useful for understanding it, and find yourself having new moral opinions expressed in terms of such concepts. But this does not mean that the moral intuitions were inside you all along and you just didn’t know how to express them; they could be genuinely new intuitions. So the better-formed opinion you come up with about what makes a mind a moral patient does not constitute a discovery of any pre-existing fact.
I would like to be able to give you some pointers to what kinds of new concepts could be developed for describing ways in which alien minds could be philosophically interesting, so that we could ask how some particular alien mind relates to such concepts in a way that would have a real answer that would tell us something interesting about them, and might help people develop views on how to interact with such alien minds ethically. Unfortunately, I am just a human, suffering from the same limitations on imagination as other humans, who also haven’t been able to do all that much better than “I wonder if Claude has feelings”. I don’t know how to come up with questions we should be asking instead. This seems really hard. But our inability to ask better questions doesn’t make the questions we do know how to ask meaningful, and I’d like to see more appreciation for the limitations of the language we have for asking questions about non-human minds; maybe this will help us formulate slightly better ones.
I wasn’t aware this question is settled. What leads you to this conclusion?
Human moral judgement seems easily explained as an evolutionary adaptation for cooperation and conflict resolution, and very poorly explained by perception of objective facts. Even if such facts did exist, that wouldn’t give humans any reason to perceive or be motivated by them.
But that’s a very different question from whether moral realism is true. Sure, some (maybe large) subset of human morality can be explained through biological and cultural evolution. But that tells us nothing about moral realism. It probably indicates that if moral facts exist, then the “default” morality any human ends up with is potentially (albeit not necessarily) quite different from those facts; but I don’t think it has any notable implications for the correctness of moral realism.
It undercuts the motivation for believing in moral realism, leaving us with no evidence for objective moral facts, which would be complicated things, and thus unlikely to exist without evidence.
I certainly disagree about the “no evidence” part—to me, the fact that I’m an individual with preferences and ability to suffer is very strong evidence for subjective moral facts, so to speak, and if these exist subjectively, then it’s not that much of a stretch to assume there’s an objective way to resolve conflicts between these subjective preferences.
It’s for sure too large of a topic to resolve in this random comment thread, but either way, to my knowledge the majority of philosophers believes moral realism is more likely to be true than not, and even on lesswrong I’m not aware of huge agreement on it being false (but maybe I’m mistaken?). Hence, just casually dismissing moral realism without even a hint of uncertainty seems rather overconfident.
I agree that LessWrong comments are unlikely to resolve disagreements about moral realism. Much has been written on this topic, and I doubt I have anything new to say about it, which is why I didn’t think it would be useful to try to defend moral anti-realism in the post. I brought it up anyway because the argument in that paragraph crucially relies on moral anti-realism, I suspect many readers reject moral realism without having thought through the implications of that for AI moral patienthood, and I don’t in fact have much uncertainty about moral realism.
Regarding LessWrong consensus on this topic, I looked through a couple LessWrong surveys, and didn’t find any questions about this, so, this doesn’t prove much, but just out of curiosity, I asked Claude 4 Sonnet to predict the results of such a question, and here’s what it said (which seems like a reasonable guess to me):
**Accept moral realism**: ~8%
**Lean towards moral realism**: ~12%
**Not sure**: ~15%
**Lean against moral realism**: ~25%
**Reject moral realism**: ~40%
You might be surprised to learn that the most prototypical LessWrong user (Eliezer Yudkowsky) is a moral realist. The issue is that most people have only read what he wrote in the sequences, but didn’t read Arbital.
To say that Eliezer is a moral realist is deeply, deeply misleading. Eliezer’s ethical theories correspond to what most philosophers would identify as moral anti-realism (most likely as a form of ethical subjectivism, specifically).
(Eliezer himself has a highly idiosyncratic way of talking about ethical claims and problems in ethics, and while it is perfectly coherent and consistent and even reasonable once you grasp how he’s using words etc., it results in some serious pitfalls in trying to map his views onto the usual moral-philosophical categories.)
No, it is not at all misleading. He is quite explicit about that in the linked Arbital article. You might want to read it.
They definitely would not. They would immediately qualify as moral realist. Helpfully, he makes that very clear:
He explicitly classifies his theory as a cognitivist theory, which means it ascribes truth values to ethical statements. Since it is a non-trivial cognitivist theory (it doesn’t make all ethical statements false, or all true, and your ethical beliefs can be mistaken, in contrast to subjectivism) it straightforwardly classifies as a “moral realist” theory in metaethics.
He does argue against moral internalism (the statement that having an ethical belief is inherently motivating) but this is not considered a requirement for moral realism. In fact, most moral realist theories are not moral internalist. His theory also implies moral naturalism, which is again common for moral realist theories (though not required). In summary, his theory not only qualifies as a moral realist theory, it does so straightforwardly. So yes, according to metaethical terminology, he is a moral realist, and not even an unusual one.
Additionally, he explicitly likens his theory to Frank Jackson’s Moral Functionalism (that is indeed very similar to his theory!), which is considered an uncontroversial case of a moral realist theory.
I have read it. I am very familiar with Eliezer’s views on ethics and metaethics.
I repeat that Eliezer uses metaethical terminology in a highly idiosyncratic way. You simply cannot take at face value statements that he makes like “my theory is a moral-realist theory” etc. His uses of the terms “good”, “right”, etc., do not match the standard usages.
Yes, Eliezer claims that his moral theory is not a subjectivist one. But it is (straightforwardly!) a subjectivist theory.
You might perhaps be able to claim that Eliezer’s theory is a sort of “minimal moral realism”, but it’s certainly not “robust moral realism”.
This is from 2008 so who knows if it still matters at all, but in the metaethics sequence Eliezer says this:
about two characters in a Socratic dialogue who are a moral realist and an anti-realist, respectively.
I once tried to read the entire sequence to figure out what Eliezer thinks about morality but then abandoned the project before completing it. I still don’t know what he thinks.
(Not sure if I’m telling you anything new or if this was even worth saying.)
I got o3 to compare Eliezer’s metaethics with that of Brand Blanshard (who has some similar ideas), with particular attention to whether morality is subjective or objective. The result...
That’s a necessary but insufficient condition for being a realist. CEV is clearly group-level relativism: there is nothing beyond the extrapolated subjective values of humans in general to make a claim true or false. Individual claims can be false, unlike under individual-level relativism, but that is also an insufficient criterion for realism.
Regardless of whether the view Eliezer espouses here really counts as moral realism, as people have been arguing about, it does seem that it would claim that there is a fact of the matter about whether a given AI is a moral patient. So I appreciate your point regarding the implications for the LW Overton window. But for what it’s worth, I don’t think Eliezer succeeds at this, in the sense that I don’t think he makes a good case for it to be useful to talk about ethical questions that we don’t have firm views on as if they were factual questions, because:
1. Not everyone is familiar with the way Eliezer proposes to ground moral language, not everyone who is familiar with it will be aware that it is what any given person means when they use moral language, and some people who are aware that a given person uses moral language the way Eliezer proposes will object to them doing so. Thus using moral language in the way Eliezer proposes, whenever it’s doing any meaningful work, invites getting sidetracked on unproductive semantic discussions. (This is a pretty general-purpose objection to normative moral theories)
2. Eliezer’s characterization of the meaning of moral language relies on some assumptions about it being possible in theory for a human to eventually acquire all the relevant facts about any given moral question and form a coherent stance on it, and about the stance that they eventually arrive at being robust to variations in the process by which they arrived at it. I think these assumptions are highly questionable, and shouldn’t be allowed to escape questioning by remaining implicit.
3. It offers no meaningful action guidance beyond “just think about it more”, which is reasonable, but a moral non-realist who aspires to acquire moral intuitions on a given topic would also think of that.
One could object to this line of criticism on the grounds that we should talk about what’s true independently of how it is useful to use words. But any attempt to appeal to objective truth about moral language runs into the fact that words mean what people use them to mean, and you can’t force people to use words the way you’d like them to. It looks like Eliezer kind of tries to address this by observing that extrapolated volition shares some features in common with the way people use moral language, which is true, and seems to conclude that it is the way people use moral language even if they don’t know it, which does not follow.
That’s not true: You can believe that what you do or did was unethical, which doesn’t need to have anything to do with conflict resolution.
Beliefs are not perceptions. Perceptions are infallible, beliefs are not, so this seems like a straw man.
Moral realism only means that moral beliefs, like all other contingent beliefs, can be true or false. It doesn’t mean that we are necessarily or fully motivated to be ethical. In fact, some people don’t have any altruistic motivation at all (people with psychopathy), but that only means they don’t care to behave ethically, and they can be perfectly aware that they are behaving unethically.
It does relate to conflict resolution. Being motivated by ethics is useful for avoiding conflict, so it’s useful for people to be able to evaluate the ethics of their own hypothetical actions. But there are lots of considerations for people to take into account when choosing actions, so this does not mean that someone will never take actions that they concluded had the drawback of being unethical. Being able to reason about the ethics of actions you’ve already taken is additionally useful insofar as it correlates with how others are likely to see it, which can inform whether it is a good idea to hide information about your actions, be ready to try to make amends, defend yourself from retribution, etc.
If there is some objective moral truth that common moral intuitions are heavily correlated with, there must be some mechanism by which they ended up correlated. Your reply to Karl makes it sound like you deny that anyone ever perceives anything other than perception itself, which isn’t how anyone else uses the word perceive.
Yes, but if no one was at all motivated by ethics, then ethical reasoning would not be useful for people to engage in, and no one would. The fact that ethics is a powerful force in society is central to why people bother studying it. This does not imply that everyone is motivated by ethics, or that anyone is fully motivated by ethics.
You can believe what you did was unethical by some idiosyncratic personal standard, but that is an ethical belief, not ethics.
Can you expand on this? Optical and auditory illusions exist, which seem to me to be repeatably demonstrable fallible perceptions: people reliably say that line A looks longer than line B in the Müller-Lyer illusion (the one with the arrowheads), even after measuring.
An illusion is perception not accurately representing external reality. So the perception by itself cannot be an illusion, since an illusion is a relation (mismatch) between perception and reality. The Müller-Lyer illusion is a mismatch between the perception “line A looks longer than line B” (which is true) and the state of affairs “line A is longer than line B” (which is false). The physical line on the paper is not longer, but it looks longer. The reason is that sense information is already preprocessed before it arrives in the part of the brain which creates a conscious perception. We don’t perceive the raw pixels, so to speak, but something that is enhanced in various ways, which leads to various optical illusions in edge scenarios.
I think I agree: perceptions are fallible representations of reality, but infallible representations of themselves. If I think I see a cat, I may be wrong about reality (it’s actually a raccoon) but I’m not wrong about having had the perception of a cat.
Some elements of moral thought seem to reflect an underlying reality akin to mathematical truth. “Fairness” for instance naturally relates to equal divisions, reciprocity, and symmetries akin to Rawls’ veil of ignorance. Eliezer discusses this here: if Y thinks “fair” is splitting the pie evenly and Z thinks “fair” is “Z gets the whole pie”, Y is just right and Z is just wrong.
Even though human moral judgment is evolved, what it is evolved to do includes mapping out symmetries like “it’s wrong for A to murder B, or for B to murder A” → “it’s wrong for anyone to murder anyone” → “murder is generically wrong”.
It is useful for mental machinery that evolved to enable cooperation and conflict resolution to have features like the ones you describe, yes. I don’t agree that this points towards there being an underlying reality.
The idea of mind-independent prescriptive facts seems incoherent; it doesn’t even pay rent. Even if a stone tablet had the moral commandments written on it, that still wouldn’t give me a reason to pursue them. Moreover, it has no pragmatic implications: even if I presuppose there is some essence of morality, evolution has no reason to converge on it. This essence of morality is also not needed in our physical models to draw predictions, so we might as well remove it, because of Occam’s razor. Moral anti-realist theories are able to explain the moral intuitions just as well while still being epistemically coherent, positing fewer laws, and remaining tenable.
Now, you can argue there are certain things which humans in general value more or less than others, given the machinery of the brain; that would be an epistemic claim, but it does not require queer mind-independent objective moral facts.
One of the main complications of conversations about consciousness is that people seem to be stratified into two camps:
https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness
Many conversations about consciousness make sense for only one of those camps.
I would hazard a guess that you belong to Camp 1 in this classification:
This is correct, but I don’t think what I was trying to express relies on Camp 1 assumptions, even though I expressed it with a Camp 1 framing. If cognition is associated with some nonphysical phenomenon, then our consciousness-related concepts are still tailored to how this phenomenon manifests specifically in humans. There could be some related metaphysical phenomenon going on in large language models, and no objective fact as to whether “consciousness” is an appropriate word to describe it.
I am a Camp 2 “qualia realist” (so I don’t think it’s “non-physical”; I think this is “undiscovered physics”, although it is possible that we need to add new primitives to our overall “picture of the world”, just like electric charge or mass are primitives; I don’t think we can be sure we have discovered all primitives already; it might or might not be the case).
But when Camp 2 people talk about whether AIs are conscious or not, they mean the question whether they are “sentient”, i.e. whether there is presence of “qualia”, of “subjective reality”, without implying a particular nature of that reality. (Conditionally on saying “yes”, one would like to also figure out “what it is like to be a computational process running an LLM inference”, another typical Camp 2 question.)
Now, there is also a functional question (is their cognition similar to human cognition), and this is more-or-less Camp 1/Camp 2 neutral, and in this sense one could make further improvements to the model architecture, but they are pretty similar to people in many respects already, so it’s not surprising that they behave similarly. That’s not a “hard problem”, they do behave more or less as if they are already conscious, because their architecture is already pretty similar to ours (hierarchy of attention processes and all that). But that’s orthogonal to our Camp 2 concerns.
If our experience of qualia reflects some poorly understood phenomenon in physics, it could be part of a cluster of related phenomena, not all of which manifest in human cognition. We don’t have as precise an understanding of qualia as we do of electrons; we just try to gesture at it, and we mostly figure out what each other is talking about. If some related phenomenon manifests in computers when they run large language models, which has some things in common with what we know as qualia but also some stark differences from any such phenomenon manifesting in human brains, the things we have said about what we mean when we say “qualia” might not be sufficient to determine whether said phenomenon counts as qualia or not.
Right.
It’s a big understatement; we are still at a “pre-Galilean stage” in that “field of science”. I do hope this will change sooner rather than later, but the current state of our understanding of qualia is dismal.
Oh, yes, we are absolutely not ready to tackle this. This does not mean that the question is unimportant, but it does mean that to the extent the question is important, we are in a really bad situation.
My hope is that the need to figure out “AI subjectivity” would push us to try to move faster on understanding the nature of qualia, understanding the space of possible qualia, and all other related questions.
Things don’t divide into two camps, because the “qualia exist versus don’t exist” division doesn’t lie in the same place as the “physical versus nonphysical” division. You don’t have to take the two-camp thing as an absolute just because it was Invented Here.
I agree. Asking, “what is it like to be Claude?” can make sense, but asking “is Claude conscious?” is an attempt to shoehorn it into an existing category (human consciousness). Facts first, categorization second.
Your point (incl. your answer to silentbob) seems to be based on rather fundamental principles; implicitly you’d seem to suggest (I dare interpret a bit freely and wonder what you say):
If your skills advanced enough that the AI you build becomes… essentially a hamun (defined as basically similar to a human, but artificially built by you as biological AI instead of via the usual procreation), one could end up tempted to say the same thing: actually, asking about their phenomenal consciousness is the wrong thing.
Taking your “Human moral judgement seems easily explained as an evolutionary adaptation for cooperation and conflict resolution, and very poorly explained by perception of objective facts.” from your answer to silentbob, I have the impression you’d have to say: yep, no particular certainty about having to take hamuns as moral patients.
Does this boil down to some strong sort of illusionism? Do you have, according to your premises, a way to ‘save’ our conviction of the moral value of humans? Or might you actually try to?
Maybe I’m over-interpreting all this, but would be keen to see how you see it.
“Consciousness” is a word that denotes a cluster of concepts. That’s one of the sources of ambiguity. One of the least philosophically important concepts is being aware of one’s surroundings, which certainly isn’t restricted to humans. One of the most philosophically important things is qualia. Most people don’t think only humans have qualia. Having qualia implies the ability to feel pain and is therefore highly relevant to moral patienthood; again, most people don’t think moral patienthood is restricted to humans.
True, but it has little bearing on whether consciousness is restricted to humans.
One can formulate and pose a series of more precise questions, rather than giving up.
The conceptual breakdown has already occurred: we have access consciousness, phenomenal consciousness, a sense of self, etc. It’s ready-made, but the people trying to investigate AI consciousness aren’t applying it. It’s possible to apply prior art that wasn’t invented in the rationalsphere.
Some humans have congenital insensitivity to pain but can still experience sensory impressions like redness. Pain is a quale but not every system with qualia has pain.
I tried to address this sort of response in the original post. All of these more precise consciousness-related concepts share the commonality that they were developed using our perception of our own cognition and seeing evidence that related phenomena occur in other humans. So they are all brittle in the same way when trying to extrapolate and apply them to alien minds. I don’t think that qualia is on significantly firmer epistemic ground than consciousness is.
You mean the same outward behaviour could be accompanied by different, inaccessible phenomenology? Maybe. But that doesn’t mean only humans have consciousness. And note that we can’t access each other’s private phenomenology!