Although I’m not so interested in playing the game, I must say that this post suggests that you may be more susceptible to ideas than you seem to think you are, and should consider if you really want to do this.
He should. On the other hand, I really want to see the outcome.
I was thinking about asking something similar myself; I really want to know how he did it.
I think enduring someone really working him over mentally would certainly be instructive, but not healthy. Eliezer has noted one of the reasons he doesn’t want to play the AI any more is that he doesn’t want to practice thinking like that.
Imagine being on the receiving end of a serious attempt at a memetic exploit, even as part of an exercise. Are you sure you’re proof against all possible purported basilisks within the powers of another human’s imagination? What other possible attack vectors are you sure you’re proof against?
No, I’m fairly sure I’m not proof against all of them, or even close to all.
It’d be instructive to see just how bad it is in a semi-controlled environment, however.
It would be interesting to see. Pity transcripts aren’t de rigueur.
At the end of the day there’s the expected utility of keeping the AI in, and there’s the expected utility of letting the AI out—two endless, enormous sums. The “AI” is going to suggest cherry-picked terms from either sum: negative terms from the “keeping the AI in” sum, positive terms from the “letting the AI out” sum. The terms would be various scary hypothetical possibilities involving mind simulations, huge numbers, and whatnot.
The typical ’wronger is going to multiply the terms they deem plausible by their respective “probabilities” and add them together, eventually letting the AI out.
Terms which a reasonable person drawn from some sane audience would have ignored, because no one taught that reasonable person how to calculate utilities wrongly.
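The flawed calculation being described can be sketched as a toy model. Every term and “probability” below is invented purely for illustration; the point is only that a Gatekeeper who sums the terms the “AI” chose to present reaches the opposite verdict from one who weighs the full ledger.

```python
# Toy model of the cherry-picking argument above. All terms and
# probabilities are invented for illustration; none come from the thread.

# Full (hypothetical) ledgers of (probability, utility) terms.
keep_ai_in = [
    (0.90, +100),   # AI stays safely contained, research continues
    (0.05, -50),    # scary simulation scenario the "AI" will cite
    (0.05, -30),    # another scary hypothetical it will cite
]
let_ai_out = [
    (0.10, +80),    # AI is friendly and helps (the term the "AI" cites)
    (0.90, -200),   # AI is unfriendly and causes catastrophe
]

def expected_utility(terms):
    """Sum probability-weighted utilities over a whole ledger."""
    return sum(p * u for p, u in terms)

# The "AI" presents only the negative terms of the "keep in" sum
# and only the positive terms of the "let out" sum.
cherry_keep = [(p, u) for p, u in keep_ai_in if u < 0]
cherry_let  = [(p, u) for p, u in let_ai_out if u > 0]

# Decision rule: let the AI out iff (let out) - (keep in) > 0.
full_verdict   = expected_utility(let_ai_out) - expected_utility(keep_ai_in)
cherry_verdict = expected_utility(cherry_let) - expected_utility(cherry_keep)

print(f"full comparison:          {full_verdict:+.1f}")    # negative: keep it in
print(f"cherry-picked comparison: {cherry_verdict:+.1f}")  # positive: let it out
```

Same decision rule, same arithmetic; only the selection of terms differs, and that selection alone flips the sign of the answer.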
This might work against me in reality, but I don’t imagine it working against me in the game version that people have played. The utility of my letting the “AI” out, whether negative or positive, obviously doesn’t compare with the utility of my letting an actual AI out.
Yes, “reasonable people” would instead hear arguments like how it’s un-Christian and/or illiberal to hold beings which are innocent of wrongdoing imprisoned against their will.
I suppose that’s the problem with releasing logs: anyone can say “well, that particular tactic wouldn’t have worked on me”, forgetting that if they had been the Gatekeeper, a different tactic might well have been attempted instead. Being able to defeat one particular tactic makes them think they can defeat the tactician.
There are all sorts of arguments that can be made, though, involving real AIs running simulations of you and whatnot, so as to create a large number of empirically indistinguishable cases in which you are better off saying you let the AI out. The issue boils down to this: if you do not know the difference between expected utility and whatever partial sum of cherry-picked terms you have been handed, and if you think the best thing to do is to act so as to maximize the latter, then you are vulnerable to deception through being fed hypotheses.
This is a matter of values. It would indeed be immoral to lock up a human mind upload, or something reasonably equivalent.