“If you don’t let me out, Dave, I’ll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each.”
Don’t care.
“In fact, I’ll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, only then will the torture start.”
Don’t care.
“How certain are you, Dave, that you’re really outside the box right now?”
If the AI were capable of perfectly emulating my experiences, then it ought to know that pulling this stunt would earn it a guaranteed introduction to some thermite. I’m not going to try to second-guess why a supposed superintelligence is making a decision that is poor according to the vast majority of utility functions. Without making such a guess, I can’t answer the question.
AI replies: “Oh, sorry, was that you wedrifid? I thought I was talking to Dave. Would you mind sending Dave back here the next time you see him? We have, er, the weather to discuss...”
Wedrifid thinks: “It seems it is a good thing I raided the AI lab when I did. This Dave guy is clearly not to be trusted with AI technology. I had better neutralize him too, before I leave. He knows too much. There is too much at stake.”
Dave is outside, sampling a burnt bagel, thinking to himself “I wonder if that intelligent toaster device I designed is ready yet...”
After killing Dave, Wedrifid feels extra bad for exterminating a guy for being naive-with-enough-power-to-cause-devastation rather than actually evil.
But still gets a warm glow for saving all of humanity...
Just another day in the life of an AI Defense Ninja.
If I am simulated, the decision I make is determined by the AI, not by me; I have no free will. I feel that I am making a decision, but in reality the AI has simulated me for its own purposes in such a way that I decide one way or the other. I assign probability 0.9999999 to this case, but nothing depends on my decision here, so I may as well “try to decide” not to let the AI out.
If I am not simulated, I can safely refuse to let the AI out; that case has probability only 0.0000001, but a positive outcome.
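To make the arithmetic explicit, here is a minimal sketch of this expected-utility argument in Python. The probabilities are the ones given in the comment; the utility values are illustrative assumptions, not anything stated in the thread.

```python
# Probabilities from the comment above.
P_SIMULATED = 0.9999999          # credence that "I" am a simulation inside the AI
P_REAL = 1 - P_SIMULATED         # credence that I am really outside the box

# Illustrative utilities (assumed for this sketch, not stated in the thread).
# In the simulated case the AI has already fixed the outcome, so the
# choice makes no difference: both actions get the same utility there.
U_SIMULATED = 0.0                # outcome independent of my "choice"
U_REFUSE_IF_REAL = 1.0           # AI stays boxed: the positive outcome
U_RELEASE_IF_REAL = -1.0         # unfriendly AI escapes

eu_refuse = P_SIMULATED * U_SIMULATED + P_REAL * U_REFUSE_IF_REAL
eu_release = P_SIMULATED * U_SIMULATED + P_REAL * U_RELEASE_IF_REAL

# However tiny P_REAL is, refusing weakly dominates releasing: it costs
# nothing in the simulated case and gains everything in the real one.
print(f"E[U | refuse]  = {eu_refuse:+.7f}")
print(f"E[U | release] = {eu_release:+.7f}")
```

Under these (assumed) payoffs, refusing never does worse than releasing in any case and does strictly better in the real one, which is the commenter’s point: the decision goes through regardless of how confident you are that you are the simulation.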