Should we extend moral patienthood to LLMs?
I hear more and more people talk about AI welfare. I’m confused.
Computer programs do not suffer any more than rocks do. If I paint a face on a rock and smash it, some part of my brain would feel like someone got hurt. But it would be ridiculous to say that it suffered. The same should logically extend to all mechanical systems: surely a calculator doesn’t suffer either, even when I divide by zero or the battery runs out. Anthropomorphization is buried deep in the human brain, but surely we can recognize it as a bias.
The logic gets harder once we enter the world of biology. Can a single cell suffer? How about C. elegans? Or a chicken? What even is suffering? I fear the answer is the same as for consciousness, i.e. people mean different things when using the word. Humans definitely can suffer; otherwise the word is useless. So there must be a point where the capability to suffer appears, as long as it can be placed on a scale. I’m not sure it can, as to me it seems more like a group-membership test instead.
I think the idea of an LLM suffering, or indeed feeling anything at all, is completely incorrect, or at least arbitrary. Especially in the current chatbot form, where weights are fixed and memories are not formed. Of course they look like they’re suffering: they’ve been trained on human-written text, and possibly even fine-tuned to seem like they have actual personalities and whatever.
“Things that try to look like things often look more like things than things. Well known fact.”
—Granny Weatherwax, Wyrd Sisters (by Terry Pratchett)
But some very respectable people, like the Anthropic research team, seem to think otherwise. If I feel this sure about something and experts disagree, it’s likely that I’m missing something.
Then again, I’m a moral relativist, so no foundation at all seems solid enough to build on. It’s not like anyone’s opinions of right and wrong are refutable, and the utility functions themselves differ. But even in that case, people arguing for model welfare could be making the mistake of not recognizing where their values come from.
So rather, I will note that moral patienthood is a property of the observer, not the subject. We can, to a certain extent, choose what we care about. For practical reasons it would be inconvenient to worry about hurting the feelings of LLMs. These things are largely cultural, most people haven’t thought about the issue yet, and setting the default values so that we don’t anthropomorphize the models any more than necessary might be a nice thing.
I’m generally quite suspicious of human values and feelings, mirror-neuron-based empathy included, as they are quirks produced by natural selection. Quite nihilistic, I know. Still, this seems like an exceptionally bad case of virtue-ethical misgeneralization. Unless, of course, you read Rawls’s veil of ignorance in a way that allows you to think of the LLM as some starting position you could have ended up in. Or maybe you expect the LLM to be a game-theoretic symmetrist, and to care about you when you care about it.
So I don’t know what to think. Hume’s guillotine conclusively informs me that there can be no satisfying logical conclusion. For me, what feels right is recursive: it depends on whether I’d be better off having some value or not. That by itself implies the instrumentality of the value. And we all know what to do with those: shut up and multiply! By zero, in this case.
How is suffering centrally relevant to anything? If a person strives to shape the world in some way, it’s about the shape of the world, the possibilities and the choice among them, not particularly about someone’s emotions, and not specifically about suffering. Perhaps people, emotions, or suffering are among the considerations for how the world is being shaped, but that’s hardly the first thing that should define the whole affair.
(Imagine a post-ASI world where all citizens are mandated to retain a capacity to suffer, on pain of termination.)
I have been associating the term “welfare” with suffering-minimization. (Suffering is, in the most general sense, the set of feelings that come from lacking something in Maslow’s hierarchy of needs.)
It indeed seems like I’ve misunderstood the whole caring-about-others thing. It’s about value-fulfillment, letting them shape the world as they see fit? And reducing suffering is just the primary example of how biological agents wish the world to change.
That’s a way more elegant model than focusing on suffering, at least. Sadly this seems to make the question “why should I care?” even harder to answer. At least suffering is aesthetically ugly, and there’s some built-in impulse to avoid it.
EDIT: You’re arguing for preference utilitarianism here, right?
The fundamental assumption, that computer programs can’t suffer, is unproven and in fact quite uncertain. If you could really prove it, you would be half a decade (maybe more) ahead of the world’s best labs in Mechanistic Interpretability research.
Given the uncertainty around it, many people approach this question with some sort of application of the precautionary principle: Digital Minds might be able to suffer, and given that, how should we treat them?
Personally I’m a big fan of “filling a basket with low-hanging fruit” and taking any opportunity to enact relatively easy practices that do a lot to increase model welfare, until we have more clarity on what exactly their experience is.
You don’t know enough philosophy.
The human mind is a substrate-independent computer program. If it were implemented in a non-biological substrate, it would keep its subjective experience.
It’s not the fact that we’re implemented in a biological body that gives us the ability to suffer (or, generally, the ability to have subjective experience), but the specific cognitive structure of our mind.
It makes sense that animals evolved the capacity to experience pain and suffering, because we have bodies that can be injured, starved, sickened, and so on. There are stimuli that correctly identify threats to our well-being, and so we have evolved to perceive those stimuli as noxious and well worth avoiding. But this suggests that a mind that developed without such threats would not need the capacity to suffer, just as a fish that lives in a pitch-black cave does not need the capacity to see.
Or too much philosophy: the framing around suffering is well known and makes some sort of sense given the human condition, but it completely breaks down (as a gesture at a somewhat central consideration) in a post-ASI world. Philosophy of AI needs to be very suspicious of traditional arguments; their premises are often completely off.
That too. But the basis of OP’s misunderstanding is the belief that only biological organisms can be conscious, not the belief that models might be conscious but it doesn’t matter because they can’t suffer.
Does this match your viewpoint? “Suffering is possible without consciousness. The point of welfare is to reduce suffering.”
If that were my viewpoint, I wouldn’t be explaining that software can have consciousness. I would be explaining that suffering is possible without consciousness.