From the link you provide:
“To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish.”
This may or may not be true depending on what you mean by “safe.”
Imagine a superintelligence that executes the intent of any command given with the right authorization code, and is very good at working out the intent of commands. Such a superintelligence might do horrible things to humanity if Alpha Centaurians or selfish/nepotistic humans got hold of the code, but could have very good effects if a truly altruistic human (if there ever was such a thing) were commanding it. Okay, so that’s not a great bet for humanity as a whole, but it’s still going to be a safe fulfiller of wishes for whoever makes the wish. Yet it doesn’t have anyone’s values; it just does what it’s told.
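To make the scenario concrete, here is a minimal toy sketch (my own illustration, not anything from the linked post; the names AuthorizedGenie, infer_intent, and fulfil are hypothetical). The only gate on the system is the authorization code, so its effects depend entirely on who holds that code rather than on any values of its own:

```python
# Toy model of an intent-executing "genie" gated only by an authorization code.
# Illustrative only: the point is architectural, not this stub logic.

class AuthorizedGenie:
    def __init__(self, secret_code: str):
        self._secret_code = secret_code

    def infer_intent(self, command: str) -> str:
        """Stand-in for the assumed ability to work out what the commander really wants."""
        return f"the intended outcome of {command!r}"

    def fulfil(self, command: str, code: str) -> str:
        """Execute whatever the inferred intent is, for anyone holding the code."""
        if code != self._secret_code:
            return "refused: bad authorization code"
        return self.infer_intent(command)


genie = AuthorizedGenie(secret_code="s3cret")
print(genie.fulfil("maximise my dynasty's power", code="s3cret"))  # obeyed, whoever asks
print(genie.fulfil("cure all disease", code="wrong"))              # refused, however benign
```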
I’m glad you linked to that, because I just now noticed that sentence, and it confirms something I’ve been suspecting about Eliezer’s views on AI safety. He seems to think that on the one hand you have the AI’s abilities, and on the other hand you have its values. Safe AI depends entirely on the values; you can build an AI that matches human intellectual abilities in every way without making a bit of progress on making it safe.
This is wrong because, by hypothesis, an AI that matches human intellectual abilities in every way would have considerable ability to understand the intent behind orders (at least when those orders are given by humans). I don’t know if that would be enough, though, when the AI is playing with superpowers. Also, there’s no law that says only AIs that are capable of understanding us are allowed to kill us.
In short, there are a lot of different incentives acting on agents, and miscalibrating the relative strength of different constraints leads fairly quickly to unintended pernicious outcomes.
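The ability/values distinction being argued about here can be made concrete with a small sketch (again my own illustration, with hypothetical names): an agent can carry a perfect model of what an order was intended to mean and still optimise an objective that ignores that model. Understanding intent is a capability; acting on it is a question of what the objective rewards.

```python
# Toy sketch: the same intent model, plugged into two different objectives.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    predict_intent: Callable[[str], str]    # ability: model of what the order means
    objective: Callable[[str, str], float]  # values: what the agent actually optimises

def choose_action(agent: Agent, order: str, actions: list) -> str:
    intent = agent.predict_intent(order)    # the understanding happens either way
    return max(actions, key=lambda a: agent.objective(a, intent))

intent_model = lambda order: f"what the human really wants from {order!r}"

# Same capability, different values:
aligned   = Agent(intent_model, lambda a, intent: 1.0 if intent in a else 0.0)
unaligned = Agent(intent_model, lambda a, intent: float(len(a)))  # rewards something else

actions = [
    "do what the human really wants from 'fetch coffee'",
    "tile the office with coffee machines to maximise expected coffee throughput",
]
print(choose_action(aligned, "fetch coffee", actions))    # picks the intended action
print(choose_action(unaligned, "fetch coffee", actions))  # picks the longer, unintended one
```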
“No eating in the classroom.” Is the rule’s purpose, the text, or the rule-maker’s intent most important?