I don’t see the problem framed that way very often. Part of me wishes it were clearly framed that way more often, though part of me also wonders if that way of framing it misses something.
Also, being able to understand what that means is something that comes in degrees. Do you have an opinion on how good an AI has to be at that to be safe? Do you have a sense of other people’s opinions on that question? (Wanting to feel that out is actually one of my main reasons for writing this post.)
following the spirit, not the letter, of our commands
This seems like a trivial variation of “I wish for you to do what I should wish for”. Which is to say, I do see it framed exactly that way fairly frequently here. The general problem, I think, is that all of these various problems are at a similar level of difficulty, and the solution to one seems to imply the solution to all of them. The corollary is that something that’s nearly a solution to any of them carries all the risks of any AI. This is where terms like “AI-complete” and “FAI-complete” come from.
On further reflection, this business of “FAI-complete” is very puzzling. What we should make of it depends on what we mean by FAI:
If we define FAI broadly, then yes, the problem of getting AI to have a decent understanding of our intentions does seem to be FAI-complete.
If we define FAI as a utopia-machine, claims of FAI-completeness look very dubious. I have a human’s values, but my understanding of my own values isn’t perfect. If I found myself in the position of the titular character in Bruce Almighty, I’d trust myself to try to make some very large improvements in the world, but I wouldn’t trust myself to try to create a utopia in one fell swoop. If my self-assessment is right, that means it’s possible to have a mind that can be trusted to attempt some good actions but not others, which looks like a problem for claims of FAI-completeness.
Edit: Though in Bruce Almighty, he just wills things to happen and they happen. There are often unintended consequences, but never any need to worry about what means the genie will use to get the desired result. So it’s not a perfect analogy for trying to use super-AI.
Besides, even if an AI is Friendliness-complete and knows the “right thing” to be achieved, that doesn’t mean it can actually achieve it. Being superhumanly smart doesn’t mean being superhumanly powerful. We often make such an assumption because it’s the safe one in the Least Convenient World if the AI is not Friendly. But in the Least Convenient World, a proven-Friendly AI is at least as intelligent as a human, but no more powerful than an average big corp.
From the link you provide:
To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish.
This may or may not be true depending on what you mean by “safe.”
Imagine a superintelligence that executes the intent of any command given with the right authorization code, and is very good at working out the intent of commands. Such a superintelligence might do horrible things to humanity if Alpha Centaurians or selfish/nepotistic humans got ahold of the code, but could have very good effects if a truly altruistic human (if there ever was such a thing) were commanding it. Okay, so that’s not a great bet for humanity as a whole, but it’s still going to be a safe fulfiller of wishes for whoever makes the wish. Yet it doesn’t have anyone’s values; it just does what it’s told.
I’m glad you linked to that, because I just now noticed that sentence, and it confirms something I’ve been suspecting about Eliezer’s views on AI safety. He seems to think that on the one hand you have the AI’s abilities, and on the other hand you have its values. Safe AI depends entirely on the values; you can build an AI that matches human intellectual abilities in every way without making a bit of progress on making it safe.
This is wrong because, by hypothesis, an AI that matches human intellectual abilities in every way would have considerable ability to understand the intent behind orders (at least when those orders are given by humans). I don’t know if that would be enough, though, when the AI is playing with superpowers. Also, there’s no law that says only AIs that are capable of understanding us are allowed to kill us.
No eating in the classroom. Is the rule’s purpose, the text, or the rule-maker’s intent most important?
In short, there are a lot of different incentives acting on agents, and miscalibrating the relative strength of different constraints leads fairly quickly to unintended pernicious outcomes.