Wouldn’t being able to understand what was meant by Friendly require an AI to be Friendly?
The answer to that depends on what you mean by Friendly :-)
Presumably the foolish AI-creators in this story don’t have a working FAI theory. So they can’t mean the AI to be Friendly because they don’t know what that is, precisely.
But they can certainly want the AI to be Friendly in the same sense that we want all future AIs to be Friendly, even though we have no FAI theory yet, nor even a proof that a FAI is strictly possible. They can want the AI not to do things that they, the creators, would forbid if they fully understood what the AI was doing. And the AI can want the same thing, in their names.
I wonder how things would work out if you programmed an AI to be ‘Friendly, as Eliezer Yudkowsky would want you to be’. If an AI can derive most of our physics from seeing one frame with a bent blade of grass then it could quite probably glean a lot from scanning Eliezer’s work. 10,000 words are worth a picture after all!
Unfortunately, it is getting to that stage through recursive self-improvement, without corrupting the utility function along the way, that would doom us.