The AI in Eliezer’s story doesn’t disapprove of itself or of its “evilness”. When it says “I am evil”, it means “I have imperatives other than those which would result from a coherent extrapolation of the fundamental human preferences.” It’s just a factual observation made during a conversation with a human being, expressed using a human word.
And ultimately, the wording expresses Eliezer's metaethics, according to which good and evil are to be defined by such an extrapolation. A similar extrapolation for a cognitively different species might produce a different set of ultimate criteria, but in his view that extrapolation just wouldn't be what humans mean by "good". So his moral philosophy is a mix of objectivism and relativism: the specifics of good and evil depend on contingencies of human nature, and nonhuman natures might have different idealized ultimate imperatives. But we shouldn't call those analogous nonhuman extrapolations good and evil, and we should be uninhibited in employing the words "good" and "evil" to describe the correct extrapolations for human beings, because we are human beings, and so the correct human extrapolations are what we mean by good and evil.
We can program them to be sincere about following the spirit, not the letter, of our commands, as long as we can get them to understand what that means. That is a route to safe AI that could work even if giving the command "reprogram yourself the way we would wish you to have been programmed if we'd known the true consequences" turns out to be too hard.
One of the standard worries about FAI is that the AI does evil neuroscientific things in order to find out what humans really want. And the paradox is that, in order to know which methods of investigation are unethical or otherwise undesirable, it already needs a concept of right and wrong. In this regard, “follow the spirit, not the letter, of our commands” is on the same level as “reprogram yourself the way we would have wished”—it doesn’t specify any constraints on the AI’s methods of finding out right and wrong.
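To make that circularity concrete, here is a minimal sketch in code. Everything in it is hypothetical and invented for illustration (no real system or API is being described): vetting an investigation method already requires the value model that the investigation is supposed to produce.

```python
# Hypothetical sketch of the circular dependency described above.
# All function names are illustrative, not from any real system.

def learn_values(method):
    """Investigate humans with the given method and return a value model."""
    ...

def is_method_acceptable(method, values):
    """Check whether an investigation method is ethical, judged by a value model."""
    if values is None:
        # The paradox: before any method has run, there is no value
        # model against which to vet the method.
        raise ValueError("no value model yet; cannot vet the method")
    ...

def bootstrap(candidate_methods):
    values = None  # no concept of right and wrong yet
    for method in candidate_methods:
        if is_method_acceptable(method, values):  # raises on the first call
            values = learn_values(method)
    return values
```

However you seed `values` to break the deadlock, that seed smuggles in a prior concept of right and wrong, which is exactly the objection.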
This has to be the best summary of Eliezer's metaethics I've ever seen. That said, while I understand what you're saying, you're using the terms "objectivism" and "relativism" differently from how they're used in the metaethics literature. Eliezer (at least if this summary is accurate) is not a relativist, because the truth of moral judgments is not contingent (except in a modal sense): moral facts aren't different for different agents or places. But his theory is subjectivist, because moral facts depend on the attitudes of a group of people (that group being humanity). See here
I get what you say in the first two paragraphs—the fact that you felt the need to say it makes me question whether I should have focused on the word “evil.”