Here’s yet another problem whose proper formulation I’m still not sure of, and it runs as follows. First, consider the Prisoner’s Dilemma. Informally, two timeless decision agents with common knowledge of the other’s timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do. In general, on the True Prisoner’s Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate. And the other agent is reasoning similarly: They want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.
But there’s actually an infinite regress here which is being glossed over—you won’t cooperate just because you predict that they will cooperate, you will only cooperate if you predict they will cooperate if and only if you cooperate. So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate… (...only if they predict that you will cooperate, etcetera).
On the Prisoner’s Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own—in which case (C, C) is preferable to (D, D). But what if you’re facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents’ decisions depend on their own predicted decisions? Is there a general way of resolving the regress?
Yes. You can condition on two prior probabilities: that an agent will successfully predict your actual action, and that an agent will respond in a particular way based on the action they predict you to take. For the solution in the case of the Truly Iterated Prisoner’s Dilemma, see here.
(EDIT, 6/18/2011:
On further consideration, my assertion—that the indicated solution to the Prisoner’s Dilemma constitutes a general method for resolving infinite regress in the full class of problems specified—is a naive oversimplification. The indicated solution to a specific dilemma is suggestive of an area of solution space to search for the general solution or solutions to specific similar problems, but considerable work remains to be done before a general solution to the problem class can be justifiably claimed. I’ll analyze the full problem further and see what I come up with.)
Hi. Found the site about a week ago. I read the TDT paper and was intrigued enough to start poring through Eliezer’s old posts. I’ve been working my way through the sequences and following backlinks. The material on rationality has helped me reconstruct my brain after a Halt, Melt and Catch Fire event. Good stuff.
I observe that comments on old posts are welcome, and I notice no one has yet come back to this post with the full formal solution for this dilemma since the publication of TDT. So here it is.
Whatever our opponent’s decision algorithm may be, it will either depend to some degree on a prediction of our behavior, or it will not. It can only rationally base its decision on a prediction of our behavior to the extent that it believes a) we will attempt to predict its own behavior; and b) we will only cooperate to the extent that we believe it will cooperate. It will thus be incentivized to cooperate to the extent that it believes we can and will successfully condition our behavior on its own. To the extent that it chooses independently of any prediction of our behavior, its only rational choice is to defect. Any other choices it could make will do worse than the above decisions in all cases, and the following strategy will gain extra utility against any such suboptimal choices, as will become clear.
There are thus two unknown probabilities for us to condition on: The probability that the opponent will choose to cooperate iff it believes we will cooperate, which I’ll call P(c), and the probability that the opponent will be able to successfully predict our action, which I’ll call P(p).
We want to calculate the utility of cooperating, u(C), and the utility of defecting, u(D), for each relevant case. So we shut up and multiply.
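The utilities below use payoffs of 3 for defecting against a cooperator, 2 for mutual cooperation, 1 for mutual defection, and 0 for cooperating against a defector. If the opponent is uncooperative (~c), they always defect. Thus u(C|~c) = 0 and u(D|~c) = 1.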
In cases where a potentially cooperative opponent successfully predicts our action, we have u(C|c,p) = 2 and u(D|c,p) = 1. When such an opponent guesses our action incorrectly, we have u(C|c,~p) = 0 and u(D|c,~p) = 3.
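The six conditional utilities above can be checked mechanically. The following is a minimal sketch, assuming the payoff matrix implied by those values and assuming that a conditionally cooperative opponent simply plays whatever move it predicts from us. The names `PAYOFF`, `opponent_move`, and `conditional_utility` are illustrative only.

```python
# Our payoff for (our_move, their_move). These values are inferred from the
# utilities quoted in the text; they are an assumption of this sketch.
PAYOFF = {
    ("C", "C"): 2,  # mutual cooperation
    ("C", "D"): 0,  # we cooperate, they defect
    ("D", "C"): 3,  # we defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}


def flip(move):
    """Return the other move."""
    return "D" if move == "C" else "C"


def opponent_move(our_move, conditional, predicts_correctly):
    """Opponent's move given its type (c or ~c) and prediction accuracy."""
    if not conditional:
        return "D"  # ~c: defects no matter what it predicts
    # c: mirrors the move it predicts, which may or may not be our real move
    return our_move if predicts_correctly else flip(our_move)


def conditional_utility(our_move, conditional, predicts_correctly=True):
    """Our utility for one combination of opponent type and prediction outcome."""
    their_move = opponent_move(our_move, conditional, predicts_correctly)
    return PAYOFF[(our_move, their_move)]


if __name__ == "__main__":
    for move in ("C", "D"):
        print(move, "| ~c   :", conditional_utility(move, conditional=False))
        print(move, "| c, p :", conditional_utility(move, True, True))
        print(move, "| c,~p :", conditional_utility(move, True, False))
```

Modeling the opponent as a pure mirror of its prediction is the simplest reading of "cooperate iff it believes we will cooperate"; a more cautious opponent would need a richer model.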
Thus we have:
u(C) = 2 P(c) P(p)
u(D) = P(~c) + P(c) P(p) + 3 P(c) P(~p)
     = 1 - P(c) + P(c) P(p) + 3 P(c) (1 - P(p))
     = 1 + 2 P(c) - 2 P(c) P(p)
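For anyone who wants to plug in their own numbers, here is a small sketch of the two expected utilities. It sums over the three cases (~c; c with a correct prediction; c with an incorrect prediction) rather than using the simplified form, so the simplification can be checked against it. The function names are hypothetical, and P(c) and P(p) are treated as independent, as the formulas above implicitly do.

```python
def expected_utilities(p_c, p_p):
    """Return (u_C, u_D) for given P(c) and P(p), summing case by case."""
    if not (0.0 <= p_c <= 1.0 and 0.0 <= p_p <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    p_not_c = 1.0 - p_c
    p_not_p = 1.0 - p_p
    # Weights: P(~c), P(c)P(p), P(c)P(~p); coefficients are the conditional
    # utilities u(.|~c), u(.|c,p), u(.|c,~p) from the text.
    u_c = 0 * p_not_c + 2 * p_c * p_p + 0 * p_c * p_not_p
    u_d = 1 * p_not_c + 1 * p_c * p_p + 3 * p_c * p_not_p
    return u_c, u_d


def closed_form_u_d(p_c, p_p):
    """Simplified expression for u(D): 1 + 2 P(c) - 2 P(c) P(p)."""
    return 1 + 2 * p_c - 2 * p_c * p_p


if __name__ == "__main__":
    for p_c, p_p in [(0.5, 0.5), (0.9, 0.9), (1.0, 1.0)]:
        u_c, u_d = expected_utilities(p_c, p_p)
        choice = "cooperate" if u_c > u_d else "defect"
        print(f"P(c)={p_c}, P(p)={p_p}: u(C)={u_c:.3f}, u(D)={u_d:.3f} -> {choice}")
```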
We consider the one-shot dilemma first. An intelligent opponent can be assumed to have behavioral predictive capabilities at least better than chance (P(p) > 0.5), and perhaps approaching perfection (P(p) ≈ 1) if it is a superintelligence. In the worst case (P(p) ≈ 0.5), u(C) ≈ P(c) and u(D) ≈ 1 + P(c), and we should certainly defect. In the best case (P(p) ≈ 1), u(C) ≈ 2 P(c) and u(D) ≈ 1, so we should defect if P(c) < 0.5, that is, if we assess that our opponent is even slightly more likely to automatically defect than to consider cooperation. If we have optimistic priors for both probabilities due to applicable previous experiences or any immediate observational cues, we may choose to cooperate; we plug in our numbers, and shut up and multiply.
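The two limiting cases can be joined by solving for the break-even point directly. The derivation below is just algebra on the formulas for u(C) and u(D) given earlier, under the same payoffs, and adds no new assumptions:

```latex
\begin{align*}
u(C) > u(D)
  &\iff 2\,P(c)\,P(p) > 1 + 2\,P(c) - 2\,P(c)\,P(p) \\
  &\iff 4\,P(c)\,P(p) - 2\,P(c) > 1 \\
  &\iff P(c)\,\bigl(4\,P(p) - 2\bigr) > 1 .
\end{align*}
```

Since P(c) can be at most 1, the last line can only hold when 4 P(p) - 2 > 1, that is, when P(p) > 0.75. So under this model, an opponent who predicts us correctly less than three times in four never makes one-shot cooperation worthwhile, however cooperative we believe them to be. At P(p) = 1 the condition reduces to P(c) > 0.5, matching the best case above.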
In the iterated case, we have the opportunity to observe our opponent’s behavior and update priors as we go. We are incentivized to cooperate when we believe it will do so, and to defect when we believe it will defect or when we believe we can do so without it anticipating us. Both players are incentivized to cooperate more often than defecting when they believe the other is good at predicting them. A player with a dominating edge in predictive capabilities can potentially attain a better result than pure mutual cooperation against an opponent with weak capabilities, through occasional strategic defections; the weaker player may find themselves incentivized not to punish the defector if they realize that they cannot do so without being anticipated and losing just as many utilons as the superior player would lose from the punishment. To the extent that the superior predictor can ascertain that their opponent is savvy enough to know when it’s dominated and would choose not to lose further utilons through vindictive play, such a strategy may be profitable.
Thus the spoils go to the algorithm with the best ability to predict an opponent. Skilled poker players or experts at “Rock-Paper-Scissors” could perform quite well in such contests against the average human. That could be fun to watch.