That’s a solution a human would come up with implicitly, using human understanding of what is appropriate.
The best solution to the uFAI, in the AI’s mind, might be creating a small amount of antimatter in the uFAI’s lab. The AI is 99.99% confident that it only needs half of Earth to achieve its goal of becoming Friendly.
The problem is explaining why that’s a bad thing in terms that will allow the AI to rewrite its source code. It has no way on its own of determining whether any of the steps it thinks are okay aren’t actually horrible things, because it knows it wasn’t given a reliable way of determining what is horrible.
Any rule like “Don’t do any big drastic acts until you’re friendly” requires an understanding of what we would consider important vs. unimportant.
You’re right, it would imply that the programmers were quite close to having created a FAI.
Not to mention the meaning of “Friendly”. Could an unFriendly AI know what was meant by Friendly? Wouldn’t being able to understand what was meant by Friendly require an AI to be Friendly?
EDITED TO ADD: I goofed in framing the problem. I was thinking about the process of being Friendly, which is what I interpreted the original post to be talking about. What I wrote is obviously wrong: an unFriendly AI could know and understand the intended results of Friendliness.
Yes.
No.
The answer to that depends on what you mean by Friendly :-)
Presumably the foolish AI-creators in this story don’t have a working FAI theory. So they can’t mean the AI to be Friendly, because they don’t know precisely what that is.
But they can certainly want the AI to be Friendly in the same sense that we want all future AIs to be Friendly, even though we have no FAI theory yet, nor even a proof that a FAI is strictly possible. They can want the AI not to do things that they, the creators, would forbid if they fully understood what the AI was doing. And the AI can want the same thing, on their behalf.
I wonder how things would work out if you programmed an AI to be ‘Friendly, as Eliezer Yudkowsky would want you to be’. If an AI can derive most of our physics from seeing one frame with a bent blade of grass, then it could quite probably glean a lot from scanning Eliezer’s work. 10,000 words are worth a picture, after all!
Unfortunately, the hard part is getting to that stage through recursive self-improvement without messing up the utility function; messing it up along the way is what would doom us.