Humans are most of the way to human-friendly. A human given absolute power might use it to accumulate wealth at the expense of others, or punish people who displease her in cruel ways, or even utterly annihilate large groups of people based on something silly like nationality or skin color. But a human wouldn’t misunderstand human values. There is no chance the human would, if she decided to make everyone as happy as possible, kill everyone to use their atoms to tile the universe with pictures of smiley faces (to use a familiar example).
That is not at all clear to me.
I mean, sure, I agree with the example: a well-meaning human would not kill everyone to tile the universe with pictures of smiley faces. There’s a reason that example is familiar: it was chosen by humans precisely because humans instinctively agree it is the wrong answer, while a nonhuman optimizer might not.
But generalizing from this to the idea that humans wouldn’t misunderstand human values, or that a well-meaning human granted superhuman optimization abilities wouldn’t inadvertently destroy the things humans value most, seems unjustified.
Well, there’s the problem of getting the human to be sufficiently well-meaning, as opposed to using Earth as The Sims 2100 before moving on to bigger and better galaxies. But if Friendliness is a coherent concept to begin with, why wouldn’t the well-meaning superhuman figure it out after spending some time thinking about it?
Edit: What I’m saying is that if the candidate Friendly AI is actually a superhuman, then we don’t have to worry about Step 1 of Friendliness: explaining the problem. Step 2 is convincing the superhuman to care about the problem, and I don’t know how likely that is. Finally, Step 3 is figuring out the solution, and assuming the human is sufficiently super, that wouldn’t be difficult (all that step requires is intelligence, which is what we’re giving the human to begin with).
Agreed that a sufficiently intelligent human would be no less capable of understanding human values, given data and time, than an equally intelligent nonhuman.
No one is seriously worried that an AGI will misunderstand human values. The worry is that an AGI will understand human values perfectly well, and go on to optimize what it was built to optimize.
Right, so I’m still thinking about it from the “what it was built to optimize” step. You want to build the AGI to optimize for human values, right? So you do your best to explain to it what you mean by human values. But then your explanation falls short, and it starts optimizing something else instead.
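To make that failure mode concrete, here’s a toy sketch (the action names and numbers are made up for illustration, not anyone’s actual design): the optimizer faithfully maximizes the objective you managed to write down, which is not the same as the intention behind it.

```python
# Toy illustration of objective misspecification (all names and numbers made up).

# Hypothetical outcomes the optimizer could bring about.
ACTIONS = {
    "improve_medicine":      {"smiles_visible": 8,     "humans_alive": True},
    "broadcast_comedy":      {"smiles_visible": 10,    "humans_alive": True},
    "tile_with_smiley_pics": {"smiles_visible": 10**9, "humans_alive": False},
}

def written_objective(outcome):
    """The objective we actually wrote down: count things that look like smiles."""
    return outcome["smiles_visible"]

def intended_objective(outcome):
    """What we meant: smiles only count if living humans are doing the smiling."""
    return outcome["smiles_visible"] if outcome["humans_alive"] else float("-inf")

# The optimizer does exactly its job -- on the written objective.
best = max(ACTIONS, key=lambda a: written_objective(ACTIONS[a]))
print(best)                               # tile_with_smiley_pics
print(intended_objective(ACTIONS[best]))  # -inf: catastrophic by the intended measure
```

Nothing in the maximization step checks intent; the gap between the written objective and the intended one has to be closed before the objective is handed over.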
But suppose the AGI is a super-intelligent human. Now you can just ask it to “optimize for human values” in those exact words (although you probably want to explain it a bit better, just to be on the safe side).