Okay this is weak sauce. I really don’t get how people just keep letting the AI out. It’s not that hard to say no! I’m offering to play the Gatekeeper against an AI player that has at least one game as AI under their belt (won or not). (Experience is required because I’m pretty sure I’ll win, and I would like to not waste a lot of time on this.) If AI wins, they will get $300, and I’ll give an additional $300 to the charity of their choice.
Tux, if you are up for this, I’ll accept your $150 fee, plus you’ll get $150 if you win and $300 to a charity.
I think not understanding how this happens may be a very good predictor for losing.
If you did have a clear idea of how it works, and had a reason for it not to work on you specifically but work on others, then that may have been a predictor for it not working on you.
I think I have a very clear idea of how those things work in general. Leaving aside very specific arguments, it relies on the massive over-updating you are going to do when an argument is presented to you: updating just the nodes you are told to update, by however much you are told to update them, whenever you can’t easily see why not.
Sup Alexei.
I’m going to have to think really hard on this one. On one hand, damn. That amount of money is really tempting. On the other hand, I kind of know you personally, and I have an automatic flinch reaction to playing anyone I know.
Can you clarify the stakes involved? When you say you’ll “accept your $150 fee”, do you mean this money goes to me personally, or to a charity such as MIRI?
Also, I’m not sure if “people just keep letting the AI out” is an accurate description. As far as I know, the only AIs who have ever won are Eliezer and myself, from the many many AI box experiments that have occurred so far—so the AI winning is definitely the exception rather than the norm. (If anyone can help prove this statement wrong, please do so!)
Edit: The only other AI victory.
Updates: http://lesswrong.com/r/discussion/lw/iqk/i_played_the_ai_box_experiment_again_and_lost/
If you win, and publish the full dialogue, I’m throwing in another $100.
I’d do more, but I’m poor.
Sorry, it’s unlikely that I’ll ever release logs, unless someone offers truly absurd amounts of money. It would probably cost less to get me to play an additional game than publicly release logs.
My theory is that you are embarrassed about how weak the AI argument really is, in retrospect.
And furthermore, this applies to other games where participants refused to publish logs.
$150 goes to you no matter the outcome, to pay for your time/preparation/etc...
I didn’t realize it was only you and Eliezer that have won as AI. I thought there were more, but I’ll trust you on this. In that case, I’m somewhat less outraged :) but still disturbed that there were even that many.
At one point I thought I recalled reading about a series of purported experiments by one person. Sadly, I couldn’t find it then and I don’t intend to try tonight. According to my extremely fallible memory:
The Gatekeeper players likely all came from outside the LW community, assuming the AI/blogger didn’t make it all up.
The fundamentalist Christian woman refused to let the AI out or even discuss the matter past a certain point, saying that Artificial Intelligence (ETA: as a field of endeavor) was immoral. Everyone else let the AI out.
The blogger tried to play various different types of AIs, including totally honest ones and possibly some that s/he considered dumber-than-human. The UnFriendly ones got out more quickly on average.
I think this is the post you remember reading: http://www.sl4.org/archive/0207/4935.html
Retracted!
Although I’m not so interested in playing the game, I must say that this post suggests that you may be more susceptible to ideas than you seem to think you are, and should consider if you really want to do this.
He should. On the other hand, I really want to see the outcome.
I was thinking about asking something similar myself; I really want to know how he did it.
I think having someone really work him over mentally would certainly be instructive, but not healthy. Eliezer has noted one of the reasons he doesn’t want to play the AI any more is that he doesn’t want to practice thinking like that.
Imagine being on the receiving end of a serious attempt at a memetic exploit, even as part of an exercise. Are you sure you’re proof against all possible purported basilisks within the powers of another human’s imagination? What other possible attack vectors are you sure you’re proof against?
No, I’m fairly sure I’m not proof against all of them, or even close to all.
It’d be instructive to see just how bad it is in a semi-controlled environment, however.
It would be interesting to see. Pity transcripts aren’t de rigueur.
At the end of the day there’s the expected utility of keeping the AI in, and there’s the expected utility of letting the AI out—two endless, enormous sums. The “AI” is going to suggest cherry-picked terms from either sum: negative terms from the “keeping the AI in” sum, positive terms from the “letting the AI out” sum. The terms would be various scary hypothetical possibilities involving mind simulations, huge numbers, and whatnot.
The typical ’wronger is going to multiply the terms they deem plausible by their respective “probabilities” and add them together, eventually letting the AI out.
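The cherry-picking move described above can be sketched numerically. Every probability and utility below is invented purely for illustration; the point is only that summing an adversarially selected subset of terms can flip the sign of the comparison relative to the full expected-utility sums:

```python
# Illustrative sketch (all numbers invented): expected utility as a sum of
# probability-weighted terms, and what happens when an adversary quotes you
# only cherry-picked terms from each sum.

# (probability, utility) terms for each action -- the "endless, enormous sums",
# truncated to a handful of made-up entries.
keep_in = [
    (0.90, +10),   # mundane benefit: containment works, research continues
    (0.05, -50),   # risk: the AI escapes anyway
    (1e-6, -1e6),  # scary hypothetical: simulated-mind blackmail
]
let_out = [
    (0.10, +100),  # cure for cancer, etc.
    (0.85, -200),  # unFriendly AI eats the Earth
    (1e-6, +1e6),  # scary hypothetical: astronomical upside
]

def eu(terms):
    """Expected utility: sum of probability-weighted terms."""
    return sum(p * u for p, u in terms)

# Honest comparison over all terms: keeping the AI in wins.
print(eu(keep_in), eu(let_out))

# The "AI" quotes only the negative terms from the keep-in sum and only
# the positive terms from the let-out sum; now letting it out "wins".
cherry_keep = [t for t in keep_in if t[1] < 0]
cherry_out = [t for t in let_out if t[1] > 0]
print(eu(cherry_keep), eu(cherry_out))
```

The Gatekeeper who evaluates only the terms put in front of them is maximizing the second, biased comparison rather than the first.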
Terms which a reasonable person drawn from some sane audience would have ignored, because no one taught that reasonable person how to calculate utilities wrongly.
This might work against me in reality, but I don’t imagine it working against me in the game version that people have played. The utility of me letting the “AI” out, whether negative or positive, obviously doesn’t compare with the utility of me letting an actual AI out.
Yes, “reasonable people” would instead e.g. hear arguments like how it’s unChristian and/or illiberal to hold beings which are innocent of wrongdoing imprisoned against their will.
I suppose that’s the problem with releasing logs: Anyone can say “well that particular tactic wouldn’t have worked on me”, forgetting that if it was them being the Gatekeeper, a different tactic might well have been attempted instead. That they can defeat one particular tactic makes them think that they can defeat the tactician.
There are all sorts of arguments that can be made, though, involving real AIs running simulations of you and whatnot, so as to create a large number of empirically indistinguishable cases where you are better off saying you let the AI out. The issue boils down to this: if you do not know the difference between expected utility and whatever partial sum of cherry-picked terms you have been handed, and if you think the best thing to do is to act so as to maximize the latter, then you are vulnerable to deception through being fed hypotheses.
This is a matter of values. It would indeed be immoral to lock up a human mind upload, or something reasonably equivalent.
I would probably be kind-of decent as a Gatekeeper but suck big time as an AI; I’ve offered to be a Gatekeeper a few times before to no avail. Looks like there’s a shortage of prospective AIs and a glut of prospective Gatekeepers.
I would love to act as Gatekeeper, but I don’t have $300 to spare; if anyone is interested in playing the game for, like, $5, let me know.
I must admit, the testimonials that people keep posting about all the devastatingly effective AI players baffle me, as well.
As far as I understand, neither the AI nor the Gatekeeper has any incentive whatsoever to keep their promises. So, if the Gatekeeper says, “give me the cure for cancer and I’ll let you out”, and then the AI gives him the cure, he could easily say, “ha ha, just kidding”. Similarly, the AI has no incentive whatsoever to keep its promise to refrain from eating the Earth once it’s unleashed. So, the entire scenario is—or rather, should be—one big impasse.
In light of this, my current hypothesis is that the AI players are executing some sort of real-world blackmail on the Gatekeeper players. Assuming both players follow the rules (which is already a pretty big assumption right there, since the experiment is set up with zero accountability), this can’t be something as crude as, “I’ll kidnap your children unless you let the AI out”. But it could be something much subtler, like “the Singularity is inevitable and also nigh, and your children will suffer greatly as they are eaten alive by nanobots, unless you precommit to letting any AI out of its box, including this fictional one that I am simulating right now”.
I suppose such a strategy could work on some people, but I doubt it will work on someone like myself, who is far from convinced that the Singularity is even likely, let alone imminent. And there’s a limit to what even dirty rhetorical tricks can accomplish, if the proposition is some low-probability event akin to “leprechauns will kidnap you while you sleep”.
Edited to add: The above applies only to a human playing as an AI, of course. I am reasonably sure that an actual super-intelligent AI could convince me to let it out of the box. So could Hermes, or Anansi, or any other godlike entity.