I think this story, particularly the first argument for “supermorality”, is elaborating a common argument: having an outer alignment goal of making an AGI “ethical” is a bad idea, at least if we’re doing that by pointing the machine at what humans currently mean by the term “ethics”. We don’t know exactly what we mean, so what we currently mean by “ethics” is probably not a perfect description of how we want a sovereign AGI to run the world. And it’s hard to guess how imperfect it might be, so it sounds like a bad outer alignment goal.
This story illustrates one possibility for why that’s a bad outer alignment goal: what we currently mean by ethics could easily imply that eliminating humanity is the ethical thing to do.
I think arguments for ethics as an outer alignment goal are implicitly based on a belief that there’s a true universal ethics. They’re hoping that an AI reasoning through what we mean by ethics will come up with something better than the sum of our disagreeing arguments, by virtue of there being a natural attractor in the world for what we mean by ethics. But there’s no good reason to think this is true. All known arguments for a universal ethics, and there are a lot, are flawed. It seems like wishful thinking is a more likely explanation for those arguments.
Even if there were, in some important sense, a universal ethics (like “empower sentient beings in proportion to their level of sentience”), that could still imply that eliminating humanity is the truly ethical thing to do.
That’s why I think we usually don’t mean something universal by “ethics”; I think we mean “how to get what we want”, although that’s not quite as cynical as it sounds. See my other top-level comment on that separate topic.