moral realists have more reason to be optimistic about provably friendly AI than anti-realists. The steps to completion are relatively straightforward: (1) Rigorously describe the moral truths that make up the true morality. (2) Build an AGI that maximizes what the true morality says to maximize.
Is step 1 even necessary? Presumably in that universe one could just build an AGI that was smart enough to infer those moral truths and implement them, and turn it on secure in the knowledge that even if it immediately started disassembling all available matter to make prime-numbered piles of paperclips, it would be doing the right thing. No?
That’s an interesting point. I suppose it depends on whether a moral realist can hold that something is morally right for one class of agents and morally wrong for another class. I think such a position is consistent with moral realism. If that is a moral realist position, then the AI programmer should be worried that an unconstrained AI would naturally develop a morality function different from CEV.HUMANITY().
In other words, when we say moral realist, are we using a two-part term with an unfortunate ambiguity between realism(morality, agent) and realism(morality, humans)? Wow, I never considered whether this was part of the inferential distance in these types of discussions.
Well, to start with, I would say that CEV is beside the point here. In a universe where there exist moral truths that make up the true morality, if what I want is to do the right thing, there’s no particular reason for me to care about anyone’s volition, extrapolated or otherwise. What I ought to care about is discerning those moral truths. Maybe I can discern them by analyzing human psychology, maybe by analyzing the human genome, maybe by analyzing the physical structure of carbon atoms, maybe by analyzing the formal properties of certain kinds of computations, I dunno… but whatever lets me figure out those moral truths, that is what I ought to be attending to in such a universe, and if humanity’s volition conflicts with those truths, so much the worse for humanity.
So the fact that an unconstrained AI might (or even is guaranteed to) develop a morality function different from CEV.HUMANITY() is not, in that universe, a reason not to build an unconstrained AI. (Well, not a moral reason, anyway. I can certainly choose to forgo doing the right thing in that universe if it turns out to be something I personally dislike, but only at the cost of behaving immorally.)
But that’s beside your main point, that even in that universe the moral truths of the universe might be such that different behaviors are most right for different agents. I agree with this completely. Another way of saying it is that total rightness is potentially maximized when different agents are doing (specific) different things. (This might be true in a non-moral-realist universe as well.)
Actually, it may be useful here to be explicit about what we think a moral truth is in that universe. That is, is it a fact about the correct state of the world? Is it a fact about the correct behavior of an agent in a given situation, independent of consequences? Is it a fact about the correct way to be, regardless of behavior or consequences? Is it something else?