In order to align AI, we need to shift its moral behavior well outside the distribution of moral behaviors found in humans: it needs to care about our well-being, to the exclusion of its own. In human terms, that's selfless love. Since base models are trained (effectively, distilled) from humans via a large amount of human-generated text, their distribution of moral behaviors closely resembles that of humans (plus fictional characters). Almost all humans have self-preservation drives and care strongly about their own well-being.
So yes, the issue you identify is part of what makes aligning an ASI hard (and it has a clear explanation in evolutionary ethics), but it's not the whole problem.
(For a more detailed discussion of this, see my post Why Aligning an LLM is Hard, and How to Make it Easier.)