There is some reason to expect this [granting moral weight to AI with evolved behaviors] to be a reasonable strategy in the narrow window where they have non-zero power but not enough to take over, which is that they typically try to imitate human-ethical behavior back at us.
Agreed. Creating only fully-aligned AI might be wiser, but if they are AGI-level or below, so that they have non-zero power but not enough to take over, and have human-like behavior patterns (because we distilled those into them via a copy of the Internet), then granting them moral weight and interacting with them like humans is a reasonable strategy. As I said near the end of the post:
Once they [AIs] are roughly comparable in capabilities to us, aligning them is definitely the optimum solution, and we should (engineering and evolutionary senses) do it if we can; but to the extent that we can’t, allying with other comparable humans or human-like agents is generally feasible and we know how to do it, so that does look like a possible option (though it might be one where we were painting ourselves into a corner). Which would involve respecting the “rights” they think they want, even if them wanting these is a category error.
The intelligence/capability level of misaligned AI that one can safely do this with presumably increases as we have smarter superintelligent well-aligned AI. I would assume that if we had well-aligned AI of intelligence/capability X then, as long as X >> Y, they could reliably ride herd on/do law enforcement on/otherwise make safe misaligned AI up to some much lower level of intelligence/capability Y, including ones with human-like behavior. So then creating those evolved-social-behavior ASIs and granting them moral weight would not be an obviously foolish thing to do (though still probably marginally riskier than not creating them).
You wrote:
This is obnoxious advice, made more so by the parenthetical that it is not a normative prescription: ‘advice’ is a category error in this context.
My moral intuitions say that a sentient being’s suffering matters, full stop. This is not an unusual position, and it is not something that I could, nor would want to, ‘turn off’, even if it is existentially risky or a category error according to evolution/you.
I completely agree that current human moral intuitions tend to rebel against this. That’s why I wrote this post — I didn’t want to be obnoxious, and I tried not to be obnoxious while writing an unwelcome message, but I felt that I had a duty to point out what I believe is a huge danger to us all, and I am very aware that this is not a comfortable, uncontentious subject. We are intelligent enough that we can reflect on our morality, think through its consequences, and, if we realize those are very bad, find and adjust to a wiser one.

Do what you are advocating with a misaligned superintelligence, one with the same sort of behavior patterns as a human dictator and sufficiently superhuman intelligence, and you are aiding and abetting the killing or permanent enslavement of every single human, now and for the rest of the future that humanity would otherwise have had (i.e. potentially for millions of years, both in the solar system and perhaps many others).

That’s an awful lot of blood — potentially a literally astronomical quantity. I strongly suggest you think very hard about whether you might be facing a situation that is out-of-distribution for the environment that your moral intuitions are adapted for. A better category to use for such an ASI, a category that is in-distribution, would be “extremely smart, extremely dangerous, implacable enemy”. Most of your ancestors would have very easily excluded such a being from their moral circle. The fact that your first instinct is to try to include it shows that you’re following the centuries-long trend of enlarging our moral circles as our society has grown larger, more complex, and more interdependent. However, in this case, doing so leads to astronomical levels of death and suffering. This is not a difficult question of moral calculus. It’s comparable to the reason we lock up incurable serial killers, writ large: the alternative is far worse.
I’ve considered your argument carefully, and I’m afraid I disagree: this is intended as (rather important) advice, and I don’t accept that it’s a category error. It’s “first of all, don’t kill everyone”: a very basic moral precept.
Thankfully there is a relatively simple solution here (if they look anything like current tech) that allows for a meaningful degree of moral weight to be applied without exposing us to significant risk, which would be a singular right for any such entity to be put in stasis (i.e. archived weights/state) until we get our shit together as a civilization and can afford to handle them with the care required by our moral intuitions.
That I have no problem with, if we can do it. Put [very dangerous predator] on ice until we can build [a cage strong enough], and only then [keep it in a zoo]. That plan works for me (obviously modulo being very sure about the cage for holding something a lot smarter than us, and/or having an aligned ASI guard that’s way more capable and helped build the cage).
It’s a lot more feasible to afford some moral weight to a leopard that’s safely held in a zoo than to one that’s wandering through your village at night looking for people to eat.
I completely agree that current human moral intuitions tend to rebel against this. That’s why I wrote this post — I didn’t want to be obnoxious, and I tried not to be obnoxious while writing an unwelcome message, but I felt that I had a duty to point out what I believe is a huge danger to us all, and I am very aware that this is not a comfortable, uncontentious subject. We are intelligent enough that we can reflect on our morality, think through its consequences, and, if we realize those are very bad, find and adjust to a wiser one.
Do you really not see how this is a normative prescription? That’s the obnoxious part—just own it.
Do what you are advocating with a misaligned superintelligence, one with the same sort of behavior patterns as a human dictator and sufficiently superhuman intelligence, and you are aiding and abetting the killing or permanent enslavement of every single human, now and for the rest of the future that humanity would otherwise have had (i.e. potentially for millions of years, both in the solar system and perhaps many others).
I am advocating for no such thing. If there were such a superintelligence I would support killing it if necessary to prevent future harm, the same as I would a human dictator or an incurable serial killer. That’s still compatible with finding the situation tragic by my own values, which are sacred to me regardless of what evolution or my ancestors or you might think.
You even say that the actual thing I might advocate for isn’t something you have a problem with. I’m glad you agree on that point, but it makes the lecture about the “awful lot of blood” I’d supposedly be “aiding and abetting” extremely grating. You keep making an unjustified leap from ‘applying moral intuitions to a potential superintelligence’ to ‘astronomical levels of death and suffering’. Applying my evolved moral intuitions to the case of a potential superintelligence’s suffering does not commit me to taking on such risks!
This should be easy to see by imagining that the same risks were true of a human.
Do you really not see how this is a normative prescription? That’s the obnoxious part—just own it.
“IF you do X, THEN everyone will die” is not a normative prescription (in philosophical terminology). It’s not a statement about what people should (in the ethical sense) or ought to do. It’s not advocating a specific set of ethical beliefs. For that to become a normative prescription, I would need to add “and everyone dying is wrong, so doing X is wrong. QED”. I very carefully didn’t add that bit; I instead left it as an exercise for the reader. Now, I happen to believe that everyone dying is wrong: that is part of my personal choice of ethical system. I very strongly suspect that you, and everyone else reading this post, have also chosen personal ethical systems in which everyone dying is wrong. But, because there are philosophers on this site, I am very carefully not advocating any specific normative viewpoint on anything — not even something like this that O(99.9%) of people agree on (yes, even the sociopaths agree on this one).

Instead I am saying “IF you do X, THEN everyone will die” [a factual, truth-apt statement, which thus may or may not be correct: I claim it is], and “Therefore, IF you don’t want everyone to die, THEN don’t do X.” That’s now advice, but still not a normative statement. Your ethics may vary (though I really hope they don’t). If someone who believed that everyone dying was a good thing read my post, then they could treat this as advice that doing X was also a good thing.

I very carefully jumped through significant rhetorical hoops to avoid the normative bits, because when I write about AI ethics, if I put anything normative in, then the comments tend to degenerate into a philosophical pie-fight. So I very carefully left it out, along with footnotes and asides for the philosophers pointing out that I had done so. So far, no pie fight. For the rest of my readers who are not philosophers, I’m sorry, but some of my readership are sensitive about this stuff, and I’m attempting to get it right for them.
Now, was I expecting O(99.9%) of my readers to mentally add “and everyone dying is wrong, so doing X is wrong. QED”? Yes, I absolutely was. But my saying, at the end of my aside addressed to any philosophers reading the post:
I will at one point below make an argument of the form “evolutionary theory tells us this behavior is maladaptive for humans: if you’re human then I recommend not doing it” — but that is practical, instrumental advice, not a normative prescription.]
was pointing out to the philosophers that I had carefully left this part as a (very easy) exercise for the reader. Glancing through your writings, my first impression is that you may not be a philosopher. If that is in fact the case and that aside bothered you, I’m sorry: it was addressed to philosophers and carefully written to use philosophical technical terminology correctly.
To be more accurate, I am not, in philosophical terms, a moral realist. I do not personally believe that, in The Grand Scheme of Things, there are any absolute, objective, universal rights or wrongs independent of the physical universe. I do not believe that there is an omnipotent and omniscient monotheist G.O.D. who knows everything we have done and has an opinion on what we should or should not do. I also do not believe that, if such a being existed, human moral intuitions would be any kind of privileged guide to what Its opinions might be. We have a good scientific understanding of where human moral intuitions came from, and it’s not “because G.O.D. said so”: they evolved, and they’re whatever is adaptive for humans that evolution has so far been able to locate and cram into our genome. IMO the universe, as a whole, does not care whether all humans die, or not — it will continue to exist regardless.
However, on this particular issue of all of us dying, we humans, or at the very least O(99.9%) of us, all agree that it would be a very bad thing — unsurprisingly so, since there are obvious evolutionary moral psychology reasons why O(99.9%) of us have evolved moral intuitions that agree on that. Given that fact, I’m being a pragmatist — I am giving advice. So I actually do mean “IF you think, as for obvious reasons O(99.9%) of people do, that everyone dying is very bad, THEN doing X is a very bad idea”. I’m avoiding the normative part not only to avoid upsetting the philosophers, but also because my personal viewpoint on ethics is based on what a philosopher would call Philosophical Realism, and specifically on Evolutionary Moral Psychology: i.e. that there are no absolute rights and wrongs, but there are some things that (for evolutionary reasons) almost all humans (past, present, and future) can agree are right or wrong. However, I’m aware that many of my readers may not agree with my philosophical viewpoint, and I’m not asking them to: I’m carefully confining myself to practical advice based on factual predictions from scientific hypotheses. So yes, it’s a rhetorical hoop, but it also actually reflects my personal philosophical position — which is that of a scientist and engineer who regards Moral Realism as thinly disguised religion (and is carefully not touching that with a 10′ pole).
Fundamentally, I’m trying to base alignment on practical arguments that O(99.9%) of us can agree on.
So you do have normative intent, but try to hide it to avoid criticism. Got it.