I can see how my last comment may have made it seem like I thought some terminal goals should be protected just because they are terminal goals. However, when I said that Gandhi’s anti-murder goal and the egoist’s self-indulgence goal might have distinct features that not all terminal goals share, I only meant that we need a definition of terminal goals broad enough to capture all their varieties. I didn’t mean to imply anything about the relevance of any differences between types of terminal goals. I would not assume that whatever distinguishes an egoist’s goal of self-indulgence from an AI’s goal of destroying buildings implies that the egoist should protect his terminal goal even if the AI need not. In fact, I doubt that’s the case.
Imagine two people: Ally, an altruist with a terminal goal of treating all interests exactly as her own, and Egon, an egoist with a terminal goal of satisfying only his own interests. Also in the mix is an AI with a terminal goal of destroying buildings. Ally and Egon may have a different sort of relationship to their terminal goals than the AI has to its terminal goal, but if you said, “Ally and Egon should both protect their respective terminal goals,” I would need an explanation for this, and I doubt I would agree with whatever that explanation is.
Do you think that something being a terminal goal is in itself a reason to keep that goal? And/or do you think that keeping a goal is an aspect of what it means to have a goal in the first place?
“It’s plausible that AIs will have self-preserving preferences (e.g. like E[sum_t V_t0(s_t)]). It is likely we will build such AIs because this is roughly how humans are, we don’t have a good plan to build very useful AIs that are not like that, and current AIs seem to be a bit like that. And if this is true, and we get V even slightly wrong, a powerful AI might conclude its values are better pursued if it got more power, which means self-preservation and ultimately takeover.”
This strikes me as plausible. The paper has a narrow target: it argues against the instrumental convergence argument for goal preservation, i.e., against the claim that we should expect an AI to preserve its goal on the basis of instrumental rationality alone. But even if instrumental goal preservation is false, there could be other reasons to believe a superintelligence would preserve its goals, and you’re making that kind of case here without appealing to instrumental convergence.
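Spelling out my reading of the formula you quoted, since it carries the weight of the argument (the extra notation, like the policy $\pi$ and the contrast case, is mine, not yours):

$$U(\pi) = \mathbb{E}_\pi\left[\sum_{t \ge t_0} V_{t_0}(s_t)\right]$$

An agent with this objective scores every future state by the value function it holds at $t_0$. If its later behavior is driven by whatever value function it is carrying at the time, then letting $V_{t_0}$ be overwritten by some $V'$ produces trajectories that score worse by $V_{t_0}$, so protecting $V_{t_0}$ serves the objective. By contrast, an agent with the objective $\mathbb{E}_\pi\left[\sum_t V_t(s_t)\right]$, which evaluates each state by whatever values it holds at that moment, has no such reason to resist value drift. On this reading, the goal-preservation conclusion rests on an assumption about what kind of preferences we will actually build into AIs, not on a claim about instrumental rationality in general.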
The drawback to this sort of argument is that it has a narrower scope and relies on more assumptions than Omohundro and Bostrom might prefer. The instrumental convergence thesis is meant to tell us something about any likely superintelligence, even one radically different from anything we know, including today’s AIs. The argument here is a strong one, but only if we think a superintelligence will not be a totally alien creature. Maybe it won’t be, but again, the instrumental convergence thesis is designed not to rest on that assumption.