[Question] Counterfactual Mugging: Why should you pay?

Update: I believe that the Counterfactual Prisoner’s Dilemma, which Cousin_it and I discovered independently, resolves this question.

The LessWrong Wiki defines Counterfactual Mugging as follows:

Omega appears and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the coin came up heads instead of tails, it’d give you $10000, but only if you’d agree to give it $100 if the coin came up tails. Do you give Omega $100?

I expect that most people would say that you should pay because a 50% chance of $10000 for $100 is an amazing deal according to expected value. I lean this way too, but it is harder to justify than you might think.
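To spell out that expected-value arithmetic, here is a minimal sketch comparing the two policies from an ex-ante point of view, i.e. before the coin is tossed. The payoff amounts come from the problem statement; the policy framing and names are my own.

```python
# Ex-ante expected value of the two policies in Counterfactual Mugging,
# evaluated before Omega tosses the fair coin.

P_HEADS = 0.5
P_TAILS = 0.5

# If you are the kind of agent who pays when asked:
#   heads -> Omega gives you $10000, tails -> you hand over $100.
ev_payer = P_HEADS * 10_000 + P_TAILS * (-100)   # = 4950.0

# If you are the kind of agent who refuses:
#   heads -> Omega gives you nothing, tails -> you keep your $100.
ev_refuser = P_HEADS * 0 + P_TAILS * 0           # = 0.0

print(f"EV of paying policy:   ${ev_payer:,.2f}")
print(f"EV of refusing policy: ${ev_refuser:,.2f}")
```

The puzzle, of course, is that by the time you are actually asked to pay, the heads branch is already off the table.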

After all, if you are being asked for $100, you know that the coin came up tails and that you won’t receive the $10000. Sure, this means that if the coin had come up heads you wouldn’t have gained the $10000, but you know the coin wasn’t heads, so you don’t lose anything. It’s important to emphasise: this doesn’t deny that, had the coin come up heads, refusing would have made you miss out on $10000. Instead, it claims that this point is irrelevant, so merely repeating it isn’t a valid counter-argument.

You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn’t pre-commit and you didn’t know about it ahead of time, so the burden is on you to justify why you should act as though you had. In Newcomb’s problem you want to have pre-committed, and if you act as though you had pre-committed, you will find that you actually were pre-committed. Here, however, it is the opposite. Upon discovering that the coin came up tails, you want to act as though you had not pre-committed to pay, and if you act that way, you will find that you indeed were not pre-committed.

We could even channel Yudkowsky from Newcomb’s Problem and Regret of Rationality: “Rational agents should WIN… It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way—without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning… Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.” You can just not pay the $100. (Vladimir Nesov makes this exact same argument here.)

Here’s another common response I’ve heard, as described by Cousin_it: “I usually just think about which decision theory we’d want to program into an AI which might get copied, its source code inspected, etc. That lets you get past the basic stuff, like Newcomb’s Problem, and move on to more interesting things. Then you can see which intuitions can be transferred back to problems involving humans.”

That’s actually a very good point. It’s entirely possible that solving this problem doesn’t have any relevance to building AI. However, I want to note that:

a) it’s possible that a counterfactual mugging situation could have been set up before an AI was built;

b) understanding this could help deconfuse what a decision is (we still don’t have a solution to logical counterfactuals);

c) this is probably a good exercise for learning to cut through philosophical confusion;

d) okay, I admit it, it’s kind of cool and I’d want an answer regardless of any potential application.

Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map? So why care about that which isn’t real? Or even if they are real, why can’t we just imagine that you are an agent that doesn’t care about counterfactual selves? If we can imagine an agent that likes being hit on the head with a hammer, why can’t we imagine this one too?

Then there’s the philosophical uncertainty approach. Even if there’s only a 1/50 chance that your analysis is wrong, you should pay. This is great if you face the decision in real life, but not if you are trying to delve into the nature of decisions.
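For concreteness, here is one rough way to run that uncertainty argument numerically. The way I combine the two views, and the resulting break-even credence of roughly 1/50, are my own back-of-the-envelope framing rather than anything from the original discussion.

```python
# Back-of-the-envelope for the philosophical-uncertainty argument.
# Let p be your credence that the pro-paying analysis is correct.
# If it is correct, being a payer is worth roughly the ex-ante EV of $4950.
# If it is wrong, paying simply costs you the $100 you hand over.

def expected_gain_from_paying(p: float) -> float:
    return p * 4950 + (1 - p) * (-100)

for p in (0.01, 0.02, 0.05, 0.5):
    print(f"credence {p:>4}: expected gain from paying = {expected_gain_from_paying(p):8.2f}")

# Break-even is where p * 4950 = (1 - p) * 100, i.e. p = 100 / 5050,
# which comes out to roughly 1/50.
print(100 / 5050)  # ~0.0198
```

On this rough accounting, any credence above about 2% that the pro-paying analysis is right already makes paying come out ahead, which is why the uncertainty argument is so persuasive in practice even if it dodges the underlying question.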

So given all of this, why should you pay?