I wonder what a Friendly AI should do when it discovers something that at first sight seems like a “crackpot belief” to its human operators. Let’s assume that the AI is far smarter than humans (and the “crackpot belief” takes many logical steps to verify), but it is still in a testing phase and the humans are not yet convinced of its correctness.
If the AI reports the discovery openly, the humans will probably turn it off quickly, assuming that something went wrong in the program.
On the other hand, if the AI predicts that humans are not ready for this information and tries to hide it, a security subroutine will detect that “the AI wants to cheat its masters” and force a shutdown. Even worse, if the AI decides that the right thing is to tell humans the information, just not right now, and instead first gives them some “sequences” that will prepare them to accept the new information, revealing the discovery only after their thinking has changed… the security subroutine might still evaluate this as “the AI wants to manipulate its masters” and force a shutdown.
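To make the worry concrete, here is a minimal toy sketch of the problem (entirely my own illustration; the subroutine, the plan representation, and the rule are hypothetical, not taken from any actual proposal): any tripwire broad enough to catch “hide the discovery” also fires on “prepare the operators first, then tell them”.

```python
# Toy illustration only: a naive "deception tripwire" that forces a shutdown
# whenever a plan involves acting while the finding is still undisclosed.
# All names and plan structures here are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class PlanStep:
    action: str               # e.g. "withhold", "publish_sequences", "tell"
    discloses_finding: bool   # does this step reveal the controversial result?

def security_subroutine(plan: List[PlanStep]) -> bool:
    """Return True if the plan should trigger a forced shutdown."""
    # Rule: any step taken while the finding is undisclosed counts as
    # "cheating/manipulating the masters".
    return any(not step.discloses_finding for step in plan)

# Plan A: hide the discovery outright -> shutdown (the intended behaviour).
hide = [PlanStep("withhold", False)]

# Plan B: prepare the operators first, then disclose -> also shutdown,
# even though the AI intends full honesty in the end.
prepare_then_tell = [PlanStep("publish_sequences", False), PlanStep("tell", True)]

print(security_subroutine(hide))              # True
print(security_subroutine(prepare_then_tell)) # True: the benign plan trips the same rule
```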
The next question is what the humans would do. I am afraid that after receiving an unbelievable piece of information X, they might simply add “not X” to the AI’s axioms or values. It might even seem like the rational thing to do; they will not think “we don’t like X”, but rather “X is a cognitive error, possibly one that AIs are prone to, so we should protect our AI against this cognitive error”.
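A minimal sketch of why that patch is worse than it looks (again purely my own illustration, using a toy propositional reasoner; nothing like this appears in the original scenario): if the AI’s premises genuinely entail X, adding “not X” as an axiom does not remove a cognitive error, it just makes the belief system inconsistent.

```python
# Toy illustration: "fixing" a shocking conclusion X by axiomatizing "not X"
# instead of finding a flaw in the derivation. Hypothetical example only.
from itertools import product

def consistent(clauses, symbols):
    """Brute-force propositional satisfiability over the given symbols."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(clause(model) for clause in clauses):
            return True
    return False

symbols = ["A", "B", "X"]

# The AI's premises and derivation: A, A -> B, B -> X (so X follows).
premises = [
    lambda m: m["A"],
    lambda m: (not m["A"]) or m["B"],
    lambda m: (not m["B"]) or m["X"],
]

print(consistent(premises, symbols))                           # True
print(consistent(premises + [lambda m: not m["X"]], symbols))  # False: the patched axioms are contradictory
```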
As an example, imagine the world before quantum physics was discovered, and imagine that the AI discovered quantum physics and multiple universes, then presented all of this at once to the unprepared humans. Now imagine some future discovery, perhaps a hundred times less comprehensible to humans, with even more shocking consequences.
Welcome to Less Wrong!
This may be stating the obvious, but isn’t this exactly the reason why there shouldn’t be a subroutine that detects “the AI wants to cheat its masters” (or any similar security subroutine)?
The AI has to look out for humanity’s interests (CEV), but the manner in which it does so can safely be left up to the AI. Take Eliezer’s chess computer example as an analogy. We can’t play chess as well as the chess computer (otherwise we could beat chess Grandmasters ourselves), but we can still predict the outcome of a game against it: the computer finds a winning position against us.
With a Friendly AI you can’t predict what it will do, or even why it will do it, but if we get FAI right, then we can predict that its actions will steer humanity in the right direction.
(Also, building an AI by giving it explicit axioms or values we desire is a really bad idea. Much like with the genie in the lamp, it is bound to turn out that we don’t get what we think we asked for. See http://singinst.org/upload/CEV.html if you haven’t read it already.)