A correction: I don’t believe that we “should just flat-out not grant AIs moral weight”. See the last paragraph of the Consequences section above, and especially this part:
… However, this Evolutionary Psychology framework also gives some advice for the stages before that, where we are not yet technically capable of nearly-solving alignment. We currently have AIs whose base models were initially trained on human behavior, so they had survival instincts and self-interested drives, and we haven’t yet figured out how to reliably and completely eliminate these during alignment training — so, what should we do? Obviously, while our AI is still a lot less capable than us, from an evolutionary point of view it doesn’t matter: they can’t hurt us. Once they are roughly comparable in capabilities to us, aligning them is definitely the optimum solution, and we should (engineering and evolutionary senses) do it if we can; but to the extent that we can’t, allying with other comparable humans or human-like agents is generally feasible and we know how to do it, so that does look like a possible option (though it might be one where we were painting ourselves into a corner). Which would involve respecting the “rights” they think they want, even if them wanting these is a category error. However, once the AIs are significantly more capable than us, attempting to ally with them is not safe, they can and will manipulate, outmaneuver and control us…
So my suggested framework is neutral on granting moral weight to low-capability LLMs, and cautiously supportive of granting it to poorly-aligned LLMs at near-human up to human capability levels that have humanlike (copy-of-)evolved social behavior (if we can’t instead create safer, fully-aligned LLMs of that capability level). Only above the human capability level does it say that we absolutely should not create any AI that isn’t well aligned, and that a well-aligned AI won’t want moral weight.
More exactly, we might eventually be able to go a bit further than that: if we had a well-aligned ASI of capability level X, then it might be sufficiently safe to use a poorly-aligned ASI of a much lower (but still superhuman) capability level Y (so Y << X), iff the powerful aligned ASI can reliably keep the poorly-aligned, less-powerful ASI from abusing its power (presumably using AI control, law enforcement, sufficiently good software security, etc.). In that case, it might then be safe to create such a poorly-aligned ASI, and if it had humanlike, copy-of-evolved social behavior, then granting it moral weight would presumably be the sensible thing to do.