Suppose Omega (the same superagent from Newcomb’s Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:
“I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads—can I have $1000?”
Obviously, the only reflectively consistent answer in this case is “Yes—here’s the $1000”, because if you’re an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers “Yes” to this sort of question—just like with Newcomb’s Problem or Parfit’s Hitchhiker.
Compute the probabilities P(0)..P(n) that this deal will be offered to you again n times in the future. Sum over 499500 P(n) (n) for all n and agree to pay if the sum is greater than 1,000.
Hi. Found the site about a week ago. I read the TDT paper and was intrigued enough to start poring through Eliezer’s old posts. I’ve been working my way through the sequences and following backlinks. The material on rationality has helped me reconstruct my brain after a Halt, Melt and Catch Fire event. Good stuff.
I observe that comments on old posts are welcome, and I notice no one has yet come back to this post with the full formal solution for this dilemma since the publication of TDT. So here it is.
Whatever our opponent’s decision algorithm may be, it will either depend to some degree on a prediction of our behavior, or it will not. It can only rationally base its decision on a prediction of our behavior to the extent that it believes a) we will attempt to predict its own behavior; and b) we will only cooperate to the extent that we believe it will cooperate. It will thus be incentivized to cooperate to the extent that it believes we can and will successfully condition our behavior on its own. To the extent that it chooses independently of any prediction of our behavior, its only rational choice is to defect. Any other choices it could make will do worse than the above decisions in all cases, and the following strategy will gain extra utility against any such suboptimal choices, as will become clear.
There are thus two unknown probabilities for us to condition on: The probability that the opponent will choose to cooperate iff it believes we will cooperate, which I’ll call P(c), and the probability that the opponent will be able to successfully predict our action, which I’ll call P(p).
We want to calculate the utility of cooperating, u(C), and the utility of defecting, u(D), for each relevant case. So we shut up and multiply.
If the opponent is uncooperative (~c), they always defect. Thus u(C|~c) = 0 and u(D|~c) = 1.
In cases where a potentially cooperative opponent successfully predicts our action, we have u(C|c,p) = 2 and u(D|c,p) = 1. When such an opponent guesses our action incorrectly, we have u(C|c,~p) = 0 and u(D|c,~p) = 3.
Thus we have:
u(C) = 2 P(c) P(p)
u(D) = P(~c) + P(c) P(p) + 3 P(c) P(~p) = 1 - P(c) + P(c) P(p) + 3 P(c) (1 - P(p)) = 1 + 2 P(c) − 2 P(c) * P(p)
We consider the one-shot dilemma first. An intelligent opponent can be assumed to have behavioral predictive capabilities at least better than chance (P(p) > 0.5), and perhaps approaching perfection (P(p) ~ 1) if it is a superintelligence. In the worst case, u(C) ~ P(c), and u(D) ~ 1 + P(c), and we should certainly defect. In the best case, u(C) ~ 2 * P(c) and u(D) ~ 1, so we should defect if P(c) < 0.5, that is, if we assess that our opponent is even slightly more likely to automatically defect than to consider cooperation. If we have optimistic priors for both probabilities due to applicable previous experiences or any immediate observational cues, we may choose to cooperate; we plug in our numbers, and shut up and multiply.
In the iterated case, we have the opportunity to observe our opponent’s behavior and update priors as we go. We are incentivized to cooperate when we believe it will do so, and to defect when we believe it will defect or when we believe we can do so without it anticipating us. Both players are incentivized to cooperate more often than defecting when they believe the other is good at predicting them. A player with a dominating edge in predictive capabilities can potentially attain a better result than pure mutual cooperation against an opponent with weak capabilities, through occasional strategic defections; the weaker player may find themselves incentivized not to punish the defector if they realize that they cannot do so without being anticipated and losing just as many utilons as the superior player would lose from the punishment. To the extent that the superior predictor can ascertain that their opponent is savvy enough to know when it’s dominated and would choose not to lose further utilons through vindictive play, such a strategy may be profitable.
Thus the spoils go to the algorithm with the best ability to predict an opponent. Skilled poker players or experts at “Rock-Paper-Scissors” could perform quite well in such contests against the average human. That could be fun to watch.