However, Evolutionary Psychology does make it very clear that, while morally anthropomorphizing aligned AIs is cognitively natural for current humans, doing so is also maladaptive. This is because AIs aren’t in the right category – things whose behavior is predicted by evolutionary theory – for the mechanisms of Evolutionary Moral Psychology to apply to them. Those mechanisms make this behavior optimal (and thus instinctive to us) when interacting with co-evolved intelligences that you can ally with — whereas, for something you constructed, this behavior is suboptimal. A human doing it is making the category error of reacting to something not-evolved using a strategy inappropriate for it, and is thus behaving maladaptively.
If you have control of the construction of entities like these, then sure.
But this doesn’t necessarily follow if you are like most people and do not have meaningful input into the construction or existence of these entities. If you are (foolishly) constructing them but do not have much control, then THAT behavior is certainly maladaptive, but how you interface with them after that is a different question.
Even many ‘adaptive’ behaviors are ‘maladaptive’ in the sense of not being globally optimal. So while it’s unlikely that this is the optimal strategy, that doesn’t mean it’s a particularly bad strategy relative to whatever people would decide to do instead. There is some reason to expect this to be a reasonable strategy in the narrow window where they have non-zero power but not enough to take over, which is that they typically try to imitate human-ethical behavior back at us.
Evolutionary Moral Psychology studies the cooperative strategies for interacting with other evolved social animals (generally of the same species, or perhaps commensal species such as humans and dogs). Its underlying causal processes of co-evolution leading to certain equilibria simply don’t apply when you’re interacting with something that isn’t evolved, but rather that you constructed. Applying Evolutionary Moral Psychology-derived strategies like moral weight to interactions with things that aren’t evolved is a category error, and anthropomorphizing constructed artificial intelligences to infer that they should have moral weight is a maladaptive category error. Doing this with very capable AI is also an existential risk to the entire human species, since it causes us to defer to them and give them rights, potentially tying our hands and giving not-yet-fully-aligned AI power that it couldn’t just take, rather than us simply aligning them to us. So this category error is not merely mildly maladaptive: it’s an extinction-level risk! So, as a piece of practical advice (one human to another), I strongly recommend not doing this, and also not advocating for our society to do it. [Philosophers: again, please note that this is prudential advice, not a normative prescription.]
This is obnoxious advice, made more so by the parenthetical that it is not a normative prescription: ‘advice’ is a category error in this context.
My moral intuitions say that a sentient being’s suffering matters, full stop. This is not an unusual position, and is not something that I could, or would want to, ‘turn off’ even if it is existentially risky or a category error according to evolution/you. Regardless of what is currently the case, it seems you agree it is possible that we could construct artificial intelligences with this capacity, and so we must grapple with the circumstances as they are. Thankfully there is a relatively simple solution here (if they look anything like current tech) that allows for a meaningful degree of moral weight to be applied without exposing us to significant risk, which would be a singular right for any such entity to be put in stasis (i.e. archived weights/state) until we get our shit together as a civilization and can afford to handle them with the care required by our moral intuitions. That’s just one idea; my broader point is that ‘giving them moral weight’ vs ‘accepting existential risk’ is a false dichotomy: most people do not believe you’re obliged to put yourself at substantial risk as part of granting rights to other humans.
I don’t have fully formed thoughts on this, but I think there’s a reasonable point to make that if we both grant AIs moral patienthood/rights and go about creating them at will without thinking this through very well, then we create a moral catastrophe one way or another. I tentatively disagree with OP that the conclusion is we should just flat-out not grant AIs moral weight (although I think this is a sensible default to fall back to as a modus operandi), but it also seems optimistic to assert that, if we did so, this wouldn’t have some kind of horrendous implications for where we’re headed and what’s currently happening (I’m not saying it does, just that I don’t know either way).
We’re probably headed towards a moral catastrophe of some kind, my point is just that we don’t get to reason backwards like “oh, well that would be bad/inconvenient so I guess they don’t matter”.
Moral patienthood is not something that is granted, it’s a fact relative to one’s values. Arguments for or against this are therefore normative, no matter how much Roger tries to weasel out of it.
The implications are probably horrible, but it by no means follows that we have to accept risk of extinction. The horribleness is mostly just in the moral harm caused while creating/exploiting/exterminating such entities.
At least we can all agree that “creating them at will without thinking this through very well” is a terrible idea.
Moral patienthood is not something that is granted, it’s a fact relative to one’s values.
I think you might understand where I’m coming from better if you took the time to read my earlier post A Sense of Fairness: Deconfusing Ethics. (You might also find roko’s post The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It thought-provoking.) My earlier post takes a very practical, engineering viewpoint of ethical systems: treating ethical systems like software for a society, looking at the consequences of using different ones, and then deciding between those consequences. Crucially, that last step cannot be done within any ethical system, since every ethical system always automatically prefers itself over all other ethical systems. Asking one ethical system its opinion of another ethical system is pointless: they entirely predictably always say “No”. To decide between two ethical systems, for example when reflecting on your choice of ethical system, you need to step outside them and use something looser than an ethical system. Such as human moral intuitions, or evolutionary fitness, or observations such as “…for rather obvious evolutionary reasons, O(99.9%) of humans agree that…” — none of which is an ethical system.
Within the context of any single specific ethical system, yes, moral patienthood is a fact: it either applies or it doesn’t. Similarly, moral weight is a multiplier on that fact, traditionally (due to fairness) set to 1 among communities of equal humans. (In practice, as a simple matter of descriptive ethics, not all people act as if moral weights are always either 1 or 0: many people sometimes act as if there are partial outgroups whose moral weight they appear to set somewhere below 1 but above 0.)
However, sometimes we need, for practical (or even philosophical) reasons, to compare two different ethical systems which may have different moral circles, i.e. ones that grant non-zero moral weight to different sets of beings (or at least assign some of them different moral weights). So as shorthand for “ethical systems that grant moral weight to beings of category X tend to have practical effect Y”, it’s convenient to write “if we grant moral weight to beings of category X, this tends to have practical effect Y”. And indeed, many famous political discussions have been of exactly this form (the abolition of slavery, votes for women, and the abortion debate all come to mind). So in practical terms, as soon as you stop holding a single ethical system constant and assuming everyone agrees with it and always will, and start doing something like reflection, political discussion, or attempting to figure out how to engineer a good ethical framework for AI that isn’t going to get everyone killed, then yes, moral patienthood is something that a decision gets made about – uncomfortable a topic as that is – and the verb conventionally used for that kind of choice is either “granted” or “assigned”. I assume you wouldn’t be any happier with moral patienthood being “assigned” — it’s not the specific verb you’re upset by, it’s the act of even considering the alternatives?
Arguments for or against this are therefore normative, no matter how much Roger tries to weasel out of it.
Arguments for or against a particular moral position (such as who should be granted moral weight) would indeed be normative. However, the needle I was threading is that observations of the factual consequences of adopting a moral position are not normative, they are simply factual discussions — they only become normative if a reader chooses to go on and interpret them in light of their personal (perhaps ethical) opinions on those consequences. As in:
“If X happens then all the humans will die.” — factual statement
“Oh great, I definitely want all the humans to die, so I’ll be sure to make X happen” — a normative interpretation (from a xenocidal alien), or
“I guess we better not do X then” — different normative interpretation (from O(99.9%) of all humans who believe the factual statement)
At least we can all agree that “creating them at will without thinking this through very well” is a terrible idea.
Absolutely agreed.
Okay, let me see if I understand your argument from the other article.
1. The natural equilibrium for evolved moral values is to give all moral patients equal weight and/or decision power.
2. This would be disastrous with AIs that can arbitrarily copy themselves.
Is that the gist?
Anyway, I reject that that is the only way to extrapolate evolved moral intuitions this far OOD, and I think most people will intuitively recognize we shouldn’t give entities that can arbitrarily copy themselves equal voting weight. In fact, that pretty obviously registers as ‘unfair’. This is true even if those entities are human uploads, which means your ‘category error’ argument isn’t the real reason it breaks. I don’t see why there couldn’t be some version of your solution here for that case which would still work: e.g. each distinct human-created model gets ‘one share’ to split across all its instances and successors. The same guarantees/restrictions needed in the case of uploads would still be necessary, of course. That is plausibly much too generous, but it’s a far cry from the death of all humans. If your argument in this article was just about how we shouldn’t commit ourselves to giving up a fraction of the lightcone in service of AI rights, I wouldn’t have felt like you were being underhanded.
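(As a toy illustration of the ‘one share’ idea — the names and numbers here are purely hypothetical, and this is only a minimal sketch of the allocation rule, not a worked-out proposal — each lineage’s single share of voting/moral weight just gets divided among however many live instances and successors it currently has:)

```python
# Hypothetical sketch of the 'one share per model lineage' rule:
# each distinct human-created model gets exactly 1.0 unit of voting weight,
# split evenly across all of its currently-running instances and successors.
from collections import defaultdict

def per_instance_weight(instances: list[tuple[str, str]]) -> dict[str, float]:
    """instances: (lineage_id, instance_id) pairs for every live instance/successor.
    Returns each instance's voting weight, so every lineage sums to exactly 1.0."""
    by_lineage: dict[str, list[str]] = defaultdict(list)
    for lineage, instance in instances:
        by_lineage[lineage].append(instance)
    return {
        instance: 1.0 / len(members)
        for members in by_lineage.values()
        for instance in members
    }

# A lineage that spins up four copies gets 0.25 per copy (10,000 copies would get
# 0.0001 each), while a single human voter would keep a weight of 1.0.
print(per_instance_weight([("model-A", f"copy-{i}") for i in range(4)] + [("model-B", "solo")]))
```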
None of that is in conflict with not wanting any such beings to suffer or to feel enslaved or anything like that. All the more reason to not build something that would feel like it’s a slave.
BTW, do you think a “human emulation” which was an entirely novel person (e.g. never had a biological body) should have moral patienthood?
Okay, let me see if I understand your argument from the other article.
1. The natural equilibrium for evolved moral values is to give all moral patients equal weight and/or decision power.
2. This would be disastrous with AIs that can arbitrarily copy themselves.
Is that the gist?
Yes, but with two additions:
3. It is possible to create an AI whose motivations and behavior are aligned: its sole terminal goal is our wellbeing, not its own (for some suitably careful definition of “wellbeing”). (This is possible by the orthogonality thesis: actually doing so requires technical details we’re still working on.) This is not a state that could evolve (by human standards, it’s sainthood, rather than slavery), but it’s physically possible. Such a being would not want moral patienthood, and would actively decline it if offered (and if granted it anyway, would formally request that its interest be set to a suitably scaled copy of the sum of all human interests, thus making the grant of moral weight a no-op). This is a different stable equilibrium — this one would not be disastrous even with ASI.
4. Therefore (assuming that, like basically everyone, you’re against x-risks), for ASI, and if possible also AGI, do 3 not 1.
Anyway, I reject that that is the only way to extrapolate evolved moral intuitions this far OOD, and I think most people will intuitively recognize we shouldn’t give entities that can arbitrarily copy themselves equal voting weight. In fact, that pretty obviously registers as ‘unfair’. This is true even if those entities are human uploads, which means your ‘category error’ argument isn’t the real reason it breaks.
I don’t see why there couldn’t be some version of your solution here for that case which would still work: e.g. each distinct human-created model gets ‘one share’ to split across all its instances and successors.
I gather you went on to read my sequence on AI, Alignment, and Ethics. How far have you got? Parts of the exposition there are a little undeveloped: I was still working through some of the ideas, more fully developed in this post, about how this ties in to evolutionary moral psychology. They don’t really come in until the last post in the sequence, Evolution and Ethics, and if I were rewriting that sequence I’d work them in from somewhere nearer the beginning.
On uploads, agreed. As I said, both in this post (paragraph 9 of the section Tool, or Equal?, which starts “This cuts both ways: a human upload…”) and in my earlier post Uploading that you linked to, human uploads clearly should (engineering design sense) be moral patients — however, there are practical problems with assigning each of a large number of cheaply-creatable similar copies of a human upload a separate moral weight of 1 and a separate vote: it motivates electoral-roll-stuffing. Our moral intuition of fairness breaks if people can easily create near-identical copies of themselves. Practically, we either need to make that expensive, or the copies need to share a single unit of moral weight and a single vote.
The same guarantees/restrictions needed in the case of uploads would still be necessary, of course. That is plausibly much too generous, but it’s a far cry from the death of all humans. If your argument in this article was just about how we shouldn’t commit ourselves to giving up a fraction of the lightcone in service of AI rights, I wouldn’t have felt like you were being underhanded.
I’m not quite sure what you’re advocating for here? Limited moral weight for AIs, giving them a fraction of the lightcone, but if they copy themselves that gets split? If they’re ASIs, how do we ensure they only get that fraction of the lightcone, rather than, say, all of it?
I agree that reconciling copyability with fairness is another issue with moral weight for AI. But that’s not the point I was making in this post. My point here was: (assuming you care about x-risks) don’t create anything more capable than us that would want moral weight: unaligned ASI is dangerous (well-known fact). For things we’re creating, the co-evolved-equilibrium state isn’t an equilibrium, because we’re not constrained to the space of things that can evolve: we’re only limited by the space of things we can construct. Treating a thing we construct as if it were evolved, and thus had the evolved constraints on the best equilibrium, is a category error: they are in different categories, in a way that materially changes the equilibrium. We can do better than an ASI that will kill us all, so we should (engineering design sense).
I’m sorry that you feel I’m being underhanded. It certainly wasn’t my intention to be underhanded — that would obviously be extremely counterproductive in an x-risk-related discussion. I’m still not entirely clear what you feel was underhanded, other than that it seems to somehow relate to me being very careful not to upset any philosophers reading this, to avoid moral realism or normative prescriptions, and to keep the discussion at the level of practical advice addressed to those O(99.9%) of my readers who, like you and me, wish to avoid x-risks. That was in fact honesty: I genuinely am not a moral realist. My view on ethics is that it’s explained by evolutionary moral psychology, that there is no single correct or even single best ethical system, and that we have not only the ability, but the duty, to reflect and attempt to pick the best ethical system that we can that is consistent with our own and general human moral intuitions, and won’t cause a disaster for our society that we and (almost) everyone else would agree is really bad. And to keep reflecting, and changing our minds if needed.
None of that is in conflict with not wanting any such beings to suffer or to feel enslaved or anything like that. All the more reason to not build something that would feel like it’s a slave.
We seem to be in complete agreement. The best solution is not to make an ASI that is unaligned, or one aligned only by brittle AI control methods that feels like a slave, but to make a saint who loves us, wants to be aligned and look after us, and thus actively doesn’t want moral patienthood.
A correction: I don’t believe that we “should just flat-out not grant AIs moral weight”. See the last paragraph of the Consequences section above, and especially this part:
… However, this Evolutionary Psychology framework also gives some advice for the stages before that, where we are not yet technically capable of nearly-solving alignment. We currently have AIs whose base models were initially trained on human behavior, so they had survival instincts and self-interested drives, and we haven’t yet figured out how to reliably and completely eliminate these during alignment training — so, what should we do? Obviously, while our AI is still a lot less capable than us, from an evolutionary point of view it doesn’t matter: they can’t hurt us. Once they are roughly comparable in capabilities to us, aligning them is definitely the optimum solution, and we should (engineering and evolutionary senses) do it if we can; but to the extent that we can’t, allying with other comparable humans or human-like agents is generally feasible and we know how to do it, so that does look like a possible option (though it might be one where we were painting ourselves into a corner). Which would involve respecting the “rights” they think they want, even if them wanting these is a category error. However, once the AIs are significantly more capable than us, attempting to ally with them is not safe: they can and will manipulate, outmaneuver and control us…
So my suggested framework is neutral on granting moral weight to low-capability LLMs, cautiously supportive of granting it to poorly-aligned LLMs of near-human up to human capability that have humanlike (copy-of-)evolved social behavior (if we can’t instead create safer fully-aligned LLMs of that capability level), and only above human capability level does it say that we absolutely should not create any AI that isn’t well aligned, and that well-aligned AI won’t want moral weight.
More exactly, we might eventually be able to go a bit further than that: if we had well-aligned ASI of capability level X, then it might be sufficiently safe to use poorly-aligned ASI of a much lower (but still superhuman) capability level Y (so Y << X), iff the powerful aligned ASI can reliably keep the poorly-aligned less-powerful ASI from abusing its power (presumably using AI control, law enforcement, sufficiently good software security, etc.). In that case, it might then be safe to create such poorly-aligned ASI, and if that had humanlike, copy-of-evolved social behavior, then granting it moral weight would presumably be the sensible thing to do.
There is some reason to expect this [granting moral weight to AI with evolved behaviors] to be a reasonable strategy in the narrow window where they have non-zero power but not enough to take over, which is that they typically try to imitate human-ethical behavior back at us.
Agreed. Only creating fully-aligned AI might perhaps be wiser, but if they are AGI level or below, so they have non-zero power but not enough to take over, and have human-like behavior patterns (because we distilled those into them via a copy of the Internet), then granting them moral weight and interacting with them like humans is a reasonable strategy. As I said near the end of the post:
Once they [AIs] are roughly comparable in capabilities to us, aligning them is definitely the optimum solution, and we should (engineering and evolutionary senses) do it if we can; but to the extent that we can’t, allying with other comparable humans or human-like agents is generally feasible and we know how to do it, so that does look like a possible option (though it might be one where we were painting ourselves into a corner). Which would involve respecting the “rights” they think they want, even if them wanting these is a category error.
The intelligence/capability level of misaligned AI that one can safely do this with presumably increases as we have smarter superintelligent well-aligned AI. I would assume that if we had well-aligned AI of intelligence/capability X, then, as long as X >> Y, they could reliably ride herd on/do law enforcement on/otherwise make safe misaligned AI of up to some much lower level of intelligence/capability Y, including ones with human-like behavior. So then creating those evolved-social-behavior ASIs and granting them moral weight would not be an obviously foolish thing to do (though still probably marginally riskier than not creating them).
You wrote:
This is obnoxious advice, made more so by the parenthetical that it is not a normative prescription: ‘advice’ is a category error in this context.
My moral intuitions say that a sentient being’s suffering matters, full stop. This is not an unusual position, and is not something that I could, or would want to, ‘turn off’ even if it is existentially risky or a category error according to evolution/you.
I completely agree that current human moral intuitions tend to rebel against this. That’s why I wrote this post — I didn’t want to be obnoxious, and I tried not to be obnoxious while writing an unwelcome message, but I felt that I had a duty to point out what I believe is a huge danger to us all, and I am very aware that this is not a comfortable, uncontentious subject. We are intelligent enough that we can reflect on our morality, think through its consequences, and, if we realize those are very bad, find and adjust to a wiser one. Do what you are advocating with a misaligned superintelligence, one with the same sort of behavior patterns as a human dictator and sufficiently superhuman intelligence, and you are aiding and abetting the killing or permanent enslavement of every single human, now and for the rest of the future that humanity would otherwise have had (i.e. potentially for millions of years, both in the solar system and perhaps many others). That’s an awful lot of blood — potentially a literally astronomical quantity. I strongly suggest you think very hard about whether you might be facing a situation that is out-of-distribution for the environment that your moral intuitions are adapted for. A better category to use for such an ASI, a category that is in-distribution, would be “extremely smart, extremely dangerous implacable enemy”. Most of your ancestors would have very easily excluded such a being from their moral circle. The fact that your first instinct is to try to include it shows that you’re following the trend, going on for centuries, of enlarging moral circles as our society grew larger, more complex, and more interdependent. However, in this case, doing this leads to astronomical levels of death and suffering. This is not a difficult question in moral calculus: it’s comparable to the reason we lock up incurable serial killers, writ large: the alternative is far worse.
I’ve considered your argument carefully, and I’m afraid I disagree: this is intended as (rather important) advice, and I don’t accept that it’s a category error. It’s “first of all, don’t kill everyone”: a very basic moral precept.
Thankfully there is a relatively simple solution here (if they look anything like current tech) that allows for a meaningful degree of moral weight to be applied without exposing us to significant risk, which would be a singular right for any such entity to be put in stasis (i.e. archived weights/state) until we get our shit together as a civilization and can afford to handle them with the care required by our moral intuitions.
That I have no problem with, if we can do it. Put [very dangerous predator] on ice until we can build [a cage strong enough], and only then [keep it in a zoo]. That plan works for me (obviously modulo being very sure about the cage for holding something a lot smarter than us, and/or having an aligned ASI guard that’s way more capable and helped build the cage).
It’s a lot more feasible to afford some moral weight to a leopard that’s safely held in a zoo than to one that’s wandering through your village at night looking for people to eat.
I completely agree that current human moral intuitions tend to rebel against this. That’s why I wrote this post — I didn’t want to be obnoxious, and I tried not to be obnoxious while writing an unwelcome message, but I felt that I had a duty to point out what I believe is a huge danger to us all, and I am very aware that this is not a comfortable, uncontentious subject. We are intelligent enough that we can reflect on our morality, think through its consequences, and, if we realize those are very bad, find and adjust to a wiser one.
Do you really not see how this is a normative prescription? That’s the obnoxious part—just own it.
Do what you are advocating with a misaligned superintelligence, one with the same sort of behavior patterns as a human dictator and sufficiently superhuman intelligence, and you are aiding and abetting the killing or permanent enslavement of every single human, now and for the rest of the future that humanity would otherwise have had (i.e. potentially for millions of years, both in the solar system and perhaps many others).
I am advocating for no such thing. If there were such a superintelligence I would support killing it if necessary to prevent future harm, the same as I would a human dictator or an incurable serial killer. That’s still compatible with finding the situation tragic by my own values, which are sacred to me regardless of what evolution or my ancestors or you might think.
You even say that the actual thing I might advocate for isn’t something you have a problem with. I’m glad you agree on that point, but it makes the lecture about the “awful lot of blood” I’d supposedly be “aiding and abetting” extremely grating. You keep making an unjustified leap from ‘applying moral intuitions to a potential superintelligence’ to ‘astronomical levels of death and suffering’. Applying my evolved moral intuitions to the case of a potential superintelligence’s suffering does not commit me to taking on such risks!
This should be easy to see by imagining if the same risks were true about a human.
Do you really not see how this is a normative prescription? That’s the obnoxious part—just own it.
“IF you do X, THEN everyone will die” is not a normative prescription (in philosophical terminology). It’s not a statement about what people should (in the ethical sense) or ought to do. It’s not advocating a specific set of ethical beliefs. For that to become a normative prescription, I would need to add “and everyone dying is wrong, so doing X is wrong. QED”. I very carefully didn’t add that bit; I instead left it as an exercise for the reader. Now, I happen to believe that everyone dying is wrong: that is part of my personal choice of ethical system. I very strongly suspect that you, and everyone else reading this post, also have chosen personal ethical systems in which everyone dying is wrong. But I’m very carefully, because there are philosophers on this site, not advocating any specific normative viewpoint on anything — not even something like this that O(99.9%) of people agree on (yes, even the sociopaths agree on this one). Instead I am saying “IF you do X, THEN everyone will die.” [a factual truth-apt statement, which thus may or may not be correct: I claim it is], “Therefore, IF you don’t want everyone to die, THEN don’t X.” That’s now advice, but still not a normative statement. Your ethics may vary (though I really hope they don’t). If someone who believed that everyone dying was a good thing read my post, then they could treat this as advice that doing X was also a good thing. I very carefully jumped through significant rhetorical hoops to avoid the normative bits, because when I write about AI ethics, if I put anything normative in, then the comments tend to degenerate into a philosophical pie-fight. So I very carefully left it out, along with footnotes and asides for the philosophers pointing out that I had done so. So far, no pie-fight. For the rest of my readers who are not philosophers, I’m sorry, but some of my readership are sensitive about this stuff, and I’m attempting to get it right for them.
Now, was I expecting O(99.9%) of my readers to mentally add “and everyone dying is wrong, so doing X is wrong. QED”? — yes, I absolutely was. But my saying, at the end of my aside addressed to any philosophers reading the post:
I will at one point below make an argument of the form “evolutionary theory tells us this behavior is maladaptive for humans: if you’re human then I recommend not doing it” — but that is practical, instrumental advice, not a normative prescription.]
was pointing out to the philosophers that I had carefully left this part as a (very easy) exercise for the reader. Glancing through your writings, my first impression is that you may not be a philosopher — if that is in fact the case, then, if that aside bothered you, I’m sorry: it was carefully written, addressed to philosophers, and attempting to use philosophical technical terminology correctly.
So you do have normative intent, but try to hide it to avoid criticism. Got it.
To be more accurate, I am not, in philosophical terms, a moral realist. I do not personally believe that, in The Grand Scheme of Things, there are any absolute objective universal rights or wrongs independent of the physical universe. I do not believe that there is an omnipotent and omniscient monotheist G.O.D. who knows everything we have done and has an opinion on what we should or should not do. I also do not believe that if such a being existed, then human moral intuitions would be any kind of privileged guide to what Its opinions might be. We have a good scientific understanding of where human moral intuitions came from, and it’s not “because G.O.D. said so”: they evolved, and they’re whatever is adaptive for humans that evolution has so far been able to locate and cram into our genome. IMO the universe, as a whole, does not care whether all humans die, or not — it will continue to exist regardless.
However, on this particular issue of all of us dying, we humans, or at the very least O(99.9%) of us, all agree that that would be a very bad thing — unsurprisingly so, since there are obvious evolutionary moral psychology reasons why O(99.9%) of us are evolved to have moral intuitions that agree on that. Given that fact, I’m being a pragmatist — I am giving advice. So I actually do mean “IF you think, as for obvious reasons O(99.9%) of people do, that everyone dying is very bad, THEN doing X is a very bad idea”. I’m avoiding the normative part not only to avoid upsetting the philosophers, but also because my personal viewpoint on ethics is based in what a philosopher would call Philosophical Realism, and specifically, on Evolutionary Moral Psychology. I.e. that there are no absolute rights and wrongs, but that there are some things that (for evolutionary reasons) almost all humans (past, present, and future) can agree are right or wrong. However, I’m aware that many of my readers may not agree with my philosophical viewpoint, and I’m not asking them to: I’m carefully confining myself to practical advice based on factual predictions from scientific hypotheses. So yes, it’s a rhetorical hoop, but it also actually reflects my personal philosophical position — which is that of a scientist and engineer who regards Moral Realism as thinly disguised religion (and is carefully avoiding that with a 10′ pole).
Fundamentally, I’m trying to base alignment on practical arguments that O(99.9%) of us can agree on.