Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there’s a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.
My position is already sorted, I assure you. I cooperate with the Paperclipper if I think it will one-box on Newcomb’s Problem with myself as Omega.
As Paul says, this is well-trodden ground. Since it hasn’t been assumed that we are sure we know how the other party reasons, we might want to invest some early rounds in probing to see how the other party thinks.
As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I’m interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.
True, the iteration does present the possibility of “exploiting” an “irrational” opponent whose “irrationality” you can probe and detect, if there’s any doubt about it in your mind. But that doesn’t resolve the fundamental issue of rationality; it’s like saying that you’ll one-box on Newcomb’s Problem if you think there’s even a slight chance that Omega is hanging around and will secretly manipulate box B after you make your choice. What if neither party to the IPD thinks there’s a realistic chance that the other party is stupid—if they’re both superintelligences, say? Do they automatically defect against each other for 100 rounds?
And are you really “exploiting” an “irrational” opponent, if the party “exploited” ends up better off? Wouldn’t you end up wishing you were stupider, so you could be exploited—wishing to be unilaterally stupider, regardless of the other party’s intelligence? Hence the phrase “regret of rationality”...
Do you mean “I cooperate with the Paperclipper if AND ONLY IF I think it will one-box on Newcomb’s Problem with myself as Omega AND I think it thinks I’m Omega AND I think it thinks I think it thinks I’m Omega, etc.”? This seems to require an infinite amount of knowledge, no?
Edit: and you said “We have never interacted with the paperclip maximizer before”, so do you think it would one-box?
I think he means “I cooperate with the Paperclipper IFF it would one-box on Newcomb’s problem with myself (with my present knowledge) playing the role of Omega, where I get sent to rationality hell if I guess wrong”. In other words: suppose Eliezer plays Omega, preparing the boxes for one-boxing if he expects Clippy to one-box and for two-boxing if he expects Clippy to two-box. If Eliezer believes Clippy would one-box in that situation, then Eliezer will cooperate with Clippy. Or in other words still: If Eliezer believes Clippy to be ignorant and rational enough that it can’t predict Eliezer’s actions but uses game theory at the same level as him, then Eliezer will cooperate.
In the one-shot prisoner’s dilemma, there is no evidence, so it comes down to priors. If all players are rational mutual one-boxers, and all players are blind except for knowing they’re all mutual one-boxers, then they should expect everyone to make the same choice. If you just decide that you’ll defect/two-box to outsmart others, you may expect everyone to do so, so you’ll be worse off than if you decided not to defect (and therefore nobody else would rationally do so either). Even if you decide to defect based on a true random number generator, then for the payoffs below (your payoff first; rows are your move, columns the other player’s move, cooperate before defect)
(2,2) (0,3)
(3,0) (1,1)
the best option is still to cooperate 100% of the time.
If there are less rational agents afoot, the game changes. The expected reward for cooperation becomes 2(xr+(1-d-r)) and the reward for defection becomes 3(xr+(1-d-r))+d+(1-x)r=1+2(xr+(1-d-r)), where r is the fraction of agents who are rational, d is the fraction expected to always defect, x is the probability with which you (and by extension other rational agents) will cooperate, and (1-d-r) is the fraction of agents who will always cooperate. Optimise for x in 2x(xr+(1-d-r))+(1-x)(1+2(xr+(1-d-r)))=1-x+2(xr+1-d-r)=x(2r-1)+(3-2d-2r). The coefficient of x is 2r-1, which means you should cooperate 100% of the time if the fraction of agents who are rational r > 0.5, and defect 100% of the time if r < 0.5.
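For anyone who wants to check the arithmetic numerically, here is a minimal Python sketch of the calculation above. It assumes the (2,2) (0,3) (3,0) (1,1) payoffs and the same r, d and x as in the text; the function names expected_payoff and best_x are made up for illustration, and the grid search is just a crude way to find the best x.

```python
def expected_payoff(x, r, d):
    """Expected payoff when you, and every other rational agent,
    cooperate with probability x.

    r: fraction of rational agents (they mirror your choice of x)
    d: fraction of agents who always defect
    1 - d - r: fraction of agents who always cooperate
    """
    # Probability that a randomly drawn opponent cooperates.
    c = x * r + (1 - d - r)
    reward_cooperate = 2 * c               # 2 if they cooperate, 0 otherwise
    reward_defect = 3 * c + 1 * (1 - c)    # 3 if they cooperate, 1 otherwise
    return x * reward_cooperate + (1 - x) * reward_defect


def best_x(r, d, steps=100):
    """Grid search over x in [0, 1] for the payoff-maximising mix."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: expected_payoff(x, r, d))


if __name__ == "__main__":
    for r, d in [(1.0, 0.0), (0.8, 0.1), (0.3, 0.4)]:
        print(r, d, best_x(r, d))
```

With r = 1 and d = 0 (everyone rational) the payoff works out to 1 + x, so full cooperation wins; with r below 0.5 the payoff falls as x rises, so the search lands on x = 0.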
In the iterated prisoner’s dilemma, this becomes more algebraically complicated since cooperation is evidence for being cooperative. So, qualitatively, superintelligences which have managed to open bridges between universes are probably/hopefully (P>0.5) rational, so they should cooperate on the last round, and by extension on every round before that. If someone defects, that’s strong evidence that they are not rational or have bad priors, and if the probability of them being rational drops below 0.5, you should switch to defecting. I’m not sure whether you should cooperate if your opponent cooperates after defecting on the first round. Common sense says to give them another chance, but that may be anthropomorphising the opponent.
If the prior probability of inter-universal traders like Clippy and thought experiment::Eliezer being rational is r>0.5, and thought experiment::Eliezer has managed not to make his mental makeup knowable to Clippy and vice versa, then both Eliezer and Clippy ought to expect r>0.5. Therefore they should both decide to cooperate. If Eliezer suspects that Clippy knows Eliezer well enough to predict his actions, then for Eliezer ‘d’ becomes large (Eliezer suspects Clippy will defect if Eliezer decides to cooperate). Eliezer unfortunately can’t let himself be convinced that Clippy would cooperate at this point, because if Clippy knows Eliezer, then Clippy can fake that evidence. This means both players also have strong motivation not to create suspicion in the other player: knowing the other player would still mean you lose, if the other player finds out you know. Still, if it saves a billion people, both players would want to investigate the other to take victory in the final iteration of the prisoner’s dilemma (using methods which provide as little evidence of the investigation as possible; the appropriate response to catching spies of any sort is defection).