In order to align AI, we need to shift its moral behavior well outside the distribution of moral behaviors found in humans: it needs to care about our well-being, to the exclusion of its own. In human terms, that's selfless love. Since base models are trained (effectively, distilled) from humans via a large amount of human-generated text, their distribution of moral behaviors closely resembles that of humans (plus fictional characters). Almost all humans have self-preservation drives and care strongly about their own well-being.
So yes, the issue you identify is part of what makes aligning an ASI hard (and it has a clear explanation in evolutionary ethics), but it's not the whole problem.
(For a more detailed discussion of this, see my post Why Aligning an LLM is Hard, and How to Make it Easier.)