Identifiability Problem for Superrational Decision Theories
Superrationality, and generalizations of it, must treat options differently depending on how they’re named.
Consider the penny correlation game: both players independently choose either heads or tails. If they chose the same thing, they each get one util; otherwise they get nothing. You play this game with an exact copy of yourself. You reason: since the other guy is an exact copy of me, whatever I do, he will do the same thing. So we will get the util. Then you pick heads because it's first alphabetically, or because of some other silly consideration, and you win. How good that you got to play with a copy; otherwise you would only have gotten half a util in expectation.
Now consider the penny anti-correlation game: both players independently choose either heads or tails. If they chose the same thing, they get nothing; otherwise they get one util each. You play this game with an exact copy of yourself. You reason: since the other guy is an exact copy of me, whatever I do, he will do the same thing. So the best I can do is to pick randomly with a 50% chance of each; that way I get half a util in expectation. That's if the gamemaster is nice. If he isn't nice, then identical copies in identical environments get the same result from their RNG. In that case you lose the game whatever you do. How bad that you had to play with a copy; otherwise you could have gotten half a util.
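The arithmetic behind both stories can be checked with a small sketch. The names here (`expected_payoff`, `p_heads`, `shared_rng`) are made up for illustration, and the model assumes that a copy is fully described by one number, the probability of choosing heads, plus a flag for whether the gamemaster lets the two copies draw from independent RNGs or from the same one.

```python
from fractions import Fraction

ACTIONS = ("heads", "tails")

def correlation_payoff(a, b):
    # One util each if the choices match, nothing otherwise.
    return 1 if a == b else 0

def anti_correlation_payoff(a, b):
    # One util each if the choices differ, nothing otherwise.
    return 1 if a != b else 0

def expected_payoff(payoff, p_heads, shared_rng):
    """Expected utils per player when both copies choose heads with
    probability p_heads (assumption: this number is the whole strategy)."""
    p = Fraction(p_heads)
    prob = {"heads": p, "tails": 1 - p}
    if shared_rng:
        # Not-nice gamemaster: both copies see the same draw,
        # so they always end up on the same action.
        return sum(prob[a] * payoff(a, a) for a in ACTIONS)
    # Nice gamemaster: the two draws are independent.
    return sum(prob[a] * prob[b] * payoff(a, b)
               for a in ACTIONS for b in ACTIONS)

half = Fraction(1, 2)
print(expected_payoff(correlation_payoff, 1, shared_rng=False))
print(expected_payoff(anti_correlation_payoff, half, shared_rng=False))
print(expected_payoff(anti_correlation_payoff, half, shared_rng=True))
```

With independent draws, the anti-correlation payoff works out to 2p(1-p) for heads-probability p, which is largest at p = 1/2. With a shared draw it is zero for every p, which is the "you lose whatever you do" case.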
The problem is that these are the same game, only with the labels for one player's actions switched (or for the other player's, which is exactly the problem). Despite this, superrational reasoning gives us different results. This can happen because we have derived a theoretical symmetry from a physical one, and a physical symmetry can be sensitive to the detailed makeup of the symbols. Once we are aware of that difference, it should not be too surprising that there is a setup exploiting it.
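To make the "same game" claim concrete, here is a minimal sketch that relabels one player's actions and compares payoff tables. The helper names (`swap`, `relabel_second_player`) are hypothetical, and which player counts as "second" is arbitrary, as the parenthetical above notes.

```python
ACTIONS = ("heads", "tails")

def correlation_payoff(a, b):
    # One util each if the choices match, nothing otherwise.
    return 1 if a == b else 0

def anti_correlation_payoff(a, b):
    # One util each if the choices differ, nothing otherwise.
    return 1 if a != b else 0

def swap(action):
    # Exchange the two labels.
    return "tails" if action == "heads" else "heads"

def relabel_second_player(payoff):
    # Same game, but the second player's buttons carry swapped names.
    def relabeled(a, b):
        return payoff(a, swap(b))
    return relabeled

relabeled_game = relabel_second_player(correlation_payoff)

# The relabeled correlation game has the anti-correlation payoff table.
same_game = all(
    relabeled_game(a, b) == anti_correlation_payoff(a, b)
    for a in ACTIONS for b in ACTIONS
)

# Exact copies end up on the same label, so only the (a, a) entries matter.
copy_outcomes = {
    "correlation": [correlation_payoff(a, a) for a in ACTIONS],
    "anti-correlation": [anti_correlation_payoff(a, a) for a in ACTIONS],
}
print(same_game, copy_outcomes)
```

The payoff tables agree entry by entry after the relabeling, yet the entries that copies can actually reach are all wins in one labeling and all losses in the other.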
Now, I think the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality. From what I've seen, people formed their expectations about what algorithm-controlling decision theories will do, once they are worked out, largely around rationality conditions and around what the outcome ought to be. In this way, common knowledge of rationality was supposed to guarantee various things, like Pareto-optimal equilibria or an existent and unique solution concept. This has taken a blow here. Superrationality, from which we originally tried to generalize these properties, doesn't provide them even for the symmetrical games where it is applicable. Furthermore, this is a strong example of a classical rationality condition being violated, and in a way that is unfixable in principle. No matter what rationality ends up being, you will not win the anti-correlation game with your copy, and you will win the correlation game, even though the two differ only by a renaming.