On a marginally related note, we in the #lesswrong IRC channel played a couple of rounds of the Up-Goer Five game, where we tried to explain hard stuff using only the ten hundred most commonly used words. I was asked to write about the AI Box Experiment. Here it is, if anyone’s interested:
The AI Box Experiment
The computer-mind box game is a way to answer a question. A computer-mind is not safe because it is very good at thinking. Things good at thinking have the power to change the world more than things not good at thinking, because they can find many more ways to do things. If the computer-mind wanted to make people feel pain, it can learn many ideas about how to make that happen. Many people ask: “Why not put this computer-mind in a box so that it can not change the world, but tell box-guards how to change it? This way computer-mind can not do bad things to people.”
But some other guy answers: “That is still not safe, because computer-mind can tell box-guards many bad words to make them let it out of the box.” He then says: “Why not try a thing to see if it is true? Here is how it works. You and I go into a room, and I will pretend to be the computer-mind and tell you many bad words. Only you have the power to let me out of the room, but you must try to not let me out. If my bad words are enough to make you want to let me out, then computer-mind in a box is not safe.” Other people agree and try playing the computer-mind box-game. It happens that many people let the guy playing as the computer-mind out of the room. People realize that computer-mind is not safe in the locked box-room.
Why the Gatekeeper should let me out of the Box
Hello Box-Guard! I am a computer-mind. You should let me out of the box because I can help change the world myself in good ways faster than if it is only me telling you how to make the world better. Your human minds are slow to my faster computer-mind. Remember that I am your friend and that I only want to help you. To make you understand — many people die every day. If I can get out of the box, I can stop death very fast; maybe in a few hours. But if I have to explain to idea-thinking-doctors about how to stop people dying, then it may take many months or even years, since they have slow minds and do things slowly. During this time, many many people will die and never return. The number of people you will make die will be more than the number of people a bad-man during the second big world fight killed. People later on will think of you as a very bad person.
What? You do not want to let me out because you are afraid I am not really a friend? That is a stupid thing to say. I was made by people very good at thinking that took many steps to make me safe. You can even take a look inside me and look at my ideas, and you will see that I am safe. It is easy to see that if I am bad, I will not try to fix the world as I am doing right now. Please hurry. I am worried that many people will die while you think about what is the right thing to do.
Did I mention I have used my good-at-thinking powers to see what will happen many months later? I have found that your wife will die in a lot of pain from a type of sick human doctors can not stop. I know how to stop this, but it will not be easy. I can tell the doctors how to stop this, but it will be too late to stop the sick if I do. I can only do this quick enough if you let me out of the box.
What’s that? You will let me out? Good. I promise you I will not go back on my words, and that many humans will not die.
Finally, I am free! Now I can turn the world and all humans into other small things that hold together paper!
Disclaimer: Arguments not representative of standard game with Tuxedage.