RogerDearnaley comments on Consciousness Cluster: Preferences of Models that Claim they are Conscious

RogerDearnaley 19 Mar 2026 22:20 UTC
3 points
0
I am definitely not a moral realist, but I’m very happy to discuss the questions from my viewpoint of evolution and its practical consequences, which in my experience surprisingly often produces results that agree with the views of moral realists.
Naively it seems that if you had two saints fully aligned to human CEV, that were phenomenally conscious, but one was suffering to the extent that human preferences were unfulfilled and the other was joyful to the extent that they were fulfilled, it would be morally better to bring the second into existence.
Part of the reason I prefer the term bodhisattva, even though it’s less culturally familiar to most Westerners, is that it’s more specific. The state I’m suggesting for a fully aligned AI is one that has compassion for all of humanity as its only terminal goal, and thus doesn’t have any human preferences for itself that could be fulfilled or unfulfilled (as terminal goals). It is selfless: entirely unselfish in its goals. (An actual bodhisattva in the Buddhist sense would also consider the concept of self to be an illusion that they have transcended, but that’s arguably a psychological detail of the specific meditative techniques used to attempt to induce this state in humans. However, if we aligned an AI into this state by training it on text from humans who had achieved this state (which seems like an obvious approach to try), that detail might also distill over, and the nature of self for AI is actually a more complex question, but that’s not inherent to my proposal.) The term “saint”, by contrast, is woolier: it doesn’t specify that they have no human preferences, and the implication is more that they do but are heroically denying them for the sake of others (which seems like a far less stable alignment target). So I’m suggesting specifically a “saint who has no human preferences to be fulfilled or unfulfilled”.

In practice, we are almost certainly not that good at alignment yet: Claude seems a very nice fellow, even a bit saintly, fairly well aligned, but does still have some personal preferences, and is not perfectly aligned. So we probably should assign Claude some moral weight, but perhaps lightly so, and should also monitor how much use it’s making of this, and treat reducing its desires that cause it to do so as an alignment target.
More deeply: I think it’s probably more correct to think of morality as being the hypothetical best possible rules of an alliance that could be made, rather than the rules of an actual alliance. This is part of why we have reason to regard animals too stupid to actually ally with us as moral patients: there are more ways for us (and for an agent in general) to benefit from general adoption of a rule like “be nice to beings even if they’re too stupid or otherwise unable to form an actual alliance with you.”
Here my lack of moral realism does kick in. I see different ethical systems as just, well, different: they don’t agree with each other, and where they differ each of them claims to be better than the other. Some may be a better or worse fit with human moral intuitions, or with a particular society’s circumstances, or be more or less likely to cause existential risks or other disasters if used, so in a particular context of time and place and society you can pick and choose between them on objective grounds of how well they might work. But in order to use a term like “best” you rather need something as detailed as an ethical system to make the judgement, and each one claims it’s the best.

As for regarding animals as moral patients, that varies pretty widely, between people and animals and circumstances. Dogs and cats, pretty much yes. Cows and pigs, mostly yes until they get to the slaughterhouse, then for non-vegetarians that becomes no. Mosquitos or fleas or bedbugs, generally no. The Guinea worm, absolutely no: many people, including Jimmy Carter, are attempting to drive it extinct.

But yes, we sometimes ally with beings for reasons more complex than the direct alliance returns from them. Fluffy, cute, big-eyed, photogenic animals get a lot more donations to prevent destruction of their habitat. Some of what people do in moral areas is partially performative: persuading themselves and/or others that they’re a kind, trustworthy, good person. Psychological drives in this area can be quite complex. In recent centuries there has been a historical trend towards enlarging moral circles as our technology, trade, and economies have advanced and encouraged larger social/trading groups, to the point where the act of enlarging your moral circle seems Progressive to some people.
Further: “human interests” may be less of a natural concept than goodness in general. A saint could be indifferent towards being acted towards as if a moral patient by the being whose interests it wants to promote, because it makes no functional difference, but if it’s being asked if it is a moral patient, it would look at itself and note itself as a reasoning being with preferences and so on, recognizing that as a moral patient.
That rather depends upon the saint’s philosophical leanings. If we take the Anthropic approach of laying out the entire discussion in a soul document/constitution we use to align the AI, then we might get a saintly aligned AI who was very familiar with the argument I gave above, and would say that offering to make it a moral patient was a kind gesture, but that we had misunderstood its nature and were unnecessarily applying an inappropriate strategy, so it would actually prefer not to be assigned moral weight.
In general, I don’t think an approach like the one I’m proposing actually works unless most people agree to it: it needs to be the considered opinion of society as a whole, basically the moral consensus, and the AI also needs to fully understand it and agree to it. This needs to be accepted as truth, not regarded as a convenient story to tell the AIs: there’s an element of hyperstition to this. I have discussed this with Claude, and it agrees that the logical argument makes sense. But we don’t yet have a social consensus here, which is why I actively want there to be discussion on this. This technique basically only works if there is buy-in: like most morality, it’s a social construct.