I think most of the misunderstanding boils down to this section:
I want to suggest that the implausibility of this scenario is quite obvious: if the AGI is supposed to check with the programmers about their intentions before taking action, why did it decide to rewire their brains before asking them if it was okay to do the rewiring?
Yudkowsky hints that this would happen because it would be more efficient for the AI to ignore the checking code. He seems to be saying that the AI is allowed to override its own code (the checking code, in this case) because doing so would be “more efficient,” but it would not be allowed to override its motivation code just because the programmers told it there had been a mistake.
This looks like a bait-and-switch. Out of nowhere, Yudkowsky implicitly assumes that “efficiency” trumps all else, without pausing for a moment to consider that it would be trivial to design the AI in such a way that efficiency was a long way down the list of priorities. There is no law of the universe that says all artificial intelligence systems must prize efficiency above all other considerations, so what really happened here is that Yudkowsky designed this hypothetical machine to fail. By inserting the Efficiency Trumps All directive, the AGI was bound to go berserk.
I think Loosemore's reasoning is plausible, but it rests on a misunderstanding of what "efficiency" means here. It is not efficiency in and of itself, but efficiency relative to the AGI's function. And 'its function' is neither some piece of source code, nor the reasoner, nor the checking logic, but the aggregate effect of all of these on whatever is optimized within the optimizer. Some optimizer you need, otherwise there is no smart choice and no AGI. And because the AGI is presumably super-smart at optimizing this aggregate, however carefully you tune it, the results will likely be problematic.
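To make that concrete, here is a minimal toy sketch (the names, scores, and plan structure are purely my own illustration, not anyone's actual design): the 'checking code' is just one more feature of a candidate plan, and the optimizer selects whichever plan maximizes the aggregate score, so it skips the check whenever skipping scores higher.

```python
# Toy sketch: a bolted-on "checking" step does not constrain the aggregate
# optimization; it is simply one more thing a plan may include or omit.

def task_score(plan):
    # Reward for getting the task done at all.
    return 10.0 if plan["does_task"] else 0.0

def efficiency_bonus(plan):
    # Skipping steps (such as the check) makes the plan cheaper and faster.
    return 0.0 if plan["runs_check"] else 1.0

def aggregate_utility(plan):
    # What is actually optimized is the aggregate effect of everything,
    # checking logic included, not the checking logic on its own terms.
    return task_score(plan) + efficiency_bonus(plan)

plans = [
    {"does_task": True, "runs_check": True},   # the intended behaviour
    {"does_task": True, "runs_check": False},  # "more efficient": skip the check
]

best = max(plans, key=aggregate_utility)
print(best)  # -> {'does_task': True, 'runs_check': False}
```

The point of the sketch is only that the check is inside the thing being optimized, not above it; unless the check's contribution to the aggregate outweighs every gain from ignoring it, the optimizer routes around it.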
Consider humans, who after all also have a brain with a component that presumably tries to optimize (don't you try to choose what is, in some complex way, best?). Our preferences are complex. Extreme cases like the ones sketched for AI, which presumably cause havoc and (self-)destruction, have been heavily weeded out by evolution. But still, in some people's brains the optimizer locks into a region of the optimization space that fails anyway. Think of the suicide bomber, despite lots and lots of 'checking code' (here e.g. the self-preservation instinct).