**Counterfactual mugging** is a thought experiment for testing and differentiating decision theories, stated as follows:

Omega, a perfect predictor, flips a coin. If it comes up tails, Omega asks you for $100. If it comes up heads, Omega pays you $10,000 if it predicts that you would have paid had it come up tails.

Depending on how the problem is phrased, intuition calls for different answers. For example, Eliezer Yudkowsky has argued that framing the problem so that Omega is a regular feature of the environment that routinely asks such questions makes most people answer ‘Yes’. However, Vladimir Nesov points out that Rationalists Should Win could be interpreted as suggesting that we should not pay. After all, even though refusing to pay in the tails case would cause you to do worse in the counterfactual where the coin came up heads, you already know the counterfactual didn’t happen, so it’s not obvious that you should pay. This issue has been discussed in this question.

Formal decision theories also diverge. Under Causal Decision Theory, you can only affect those probabilities to which you are causally linked, so the answer should be ‘No’. Evidential Decision Theory takes any kind of connection into account, not only causal ones, but the answer should again be ‘No’, since once you have seen tails, paying is no longer evidence of any gain. Timeless Decision Theory’s answer seems undefined; however, Yudkowsky has argued that if the problem is presented repeatedly, one should answer ‘Yes’ in order to raise the probability of gaining $10,000 in the next round. This seems to be Causal Decision Theory’s prescription in the repeated case as well. Updateless Decision Theory prescribes giving the $100, on the basis that your decision can influence both the ‘heads branch’ and the ‘tails branch’ of the universe.

Regardless of the particular decision theory, it is generally agreed that if you can pre-commit to paying, you should do so. The dispute is purely over what you should do if you didn’t pre-commit.
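The case for pre-committing comes down to a short expected-value calculation made before the coin is flipped. The sketch below is only illustrative: it assumes a fair coin (the problem statement does not give the odds explicitly), and the names `payoff` and `expected_value` are hypothetical helpers, not part of any published formalisation.

```python
from fractions import Fraction

ASK = 100          # amount Omega asks for on tails
REWARD = 10_000    # amount Omega pays on heads to a predicted payer
P_HEADS = Fraction(1, 2)  # assumption: the coin is fair


def payoff(pays_on_tails: bool, coin: str) -> int:
    """Payoff for an agent with a fixed policy, facing a perfect predictor."""
    if coin == "tails":
        # Omega asks for the money; only a paying agent hands it over.
        return -ASK if pays_on_tails else 0
    # On heads, Omega rewards exactly those agents who would have paid.
    return REWARD if pays_on_tails else 0


def expected_value(pays_on_tails: bool) -> Fraction:
    """Expected payoff of a policy, evaluated before the coin flip."""
    return (P_HEADS * payoff(pays_on_tails, "heads")
            + (1 - P_HEADS) * payoff(pays_on_tails, "tails"))


print("pay:   ", expected_value(True))
print("refuse:", expected_value(False))
```

Under these assumptions the paying policy is worth $4,950 in expectation and the refusing policy is worth $0, which is why the disagreement only starts once the agent has already seen tails.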

Eliezer listed this in his 2009 post Timeless Decision Theory: Problems I Can’t Solve, although that was written before Updateless Decision Theory was developed.

## Variants

The Counterfactual Prisoner’s Dilemma is a symmetric variant of the original, independently suggested by Chris Leong and Cousin_it:

Omega, a perfect predictor, flips a coin. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails and you were told it was tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads and you were told it was heads.

In this scenario, an updateless agent receives $9,900 and an updateful agent receives nothing, regardless of the coin flip, while in the original scenario the updateless agent only comes out ahead if the coin shows heads. This is claimed as a demonstration of the principle that when evaluating decisions we should consider the counterfactual and not just our particular branch of possibility space.
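The payoffs in the Counterfactual Prisoner’s Dilemma can be tabulated directly from the statement above. This is a minimal sketch: `cpd_payoff` is a hypothetical helper, and a policy is represented as a pair of booleans saying whether the agent pays when told heads and when told tails.

```python
ASK = 100        # amount Omega asks for in either branch
REWARD = 10_000  # amount paid if Omega predicts payment in the other branch


def cpd_payoff(pays_if_heads: bool, pays_if_tails: bool, coin: str) -> int:
    """Payoff in the branch that actually occurs, given the agent's full policy."""
    if coin == "heads":
        pays_here, pays_there = pays_if_heads, pays_if_tails
    else:
        pays_here, pays_there = pays_if_tails, pays_if_heads
    cost = ASK if pays_here else 0          # what the agent hands over now
    reward = REWARD if pays_there else 0    # depends only on the other branch
    return reward - cost


for name, policy in [("always pays", (True, True)), ("never pays", (False, False))]:
    for coin in ("heads", "tails"):
        print(f"{name:12s} {coin}: {cpd_payoff(*policy, coin)}")
```

The agent that pays in both branches ends up with $9,900 whichever way the coin lands, and the agent that never pays ends up with $0, so here paying is better in every branch rather than only on average.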

In Logical Counterfactual Mugging, instead of flipping a coin, Omega tells you the 10,000th digit of pi, which we assume you don’t know off the top of your head. If it is odd, we treat it like heads in the original problem, and if it is even, we treat it like tails. Logical inductors have been proposed as a solution to this problem and have been applied to Logical Counterfactual Mugging.

The Counterfactual Mugging Poker Game is a somewhat complicated variant by Scott Garrabrant. Player A receives a single card that is either high or low, which they can then reveal if they so desire. Player B then shares their true probability estimate that player A has a high card. Player B is essentially perfect at predicting player A’s behaviour, but doesn’t get to see player A after the card has been drawn. Player A then loses an amount in dollars that depends on the probability player B states. If you are player A and you show the card when it is low, you lose $0. However, since B can predict your behaviour, this means that if the card had been high, player B would have been able to guess that you had a high card even though you hadn’t revealed it. This would lose you a whole dollar, and on average you’d be better off if you never showed the card. Garrabrant states that he prefers this scenario because Counterfactual Mugging feels like it is trying to trick you, while in this scenario you are the one creating the Counterfactual Mugging-like situation in order to withhold information.
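The averages in the poker game can be checked with a small calculation. The sketch below rests on two assumptions that are not stated in the text above: that high and low cards are equally likely, and that player A’s loss is the square of the probability B reports (a quadratic rule, chosen here because it gives a loss of $0 when B is sure the card is low and $1 when B is sure it is high). The function names are hypothetical.

```python
from fractions import Fraction

P_HIGH = Fraction(1, 2)  # assumption: high and low cards are equally likely


def loss(p: Fraction) -> Fraction:
    """Assumption: player A loses p**2 dollars when B reports probability p."""
    return p * p


def expected_loss(reveal_low: bool, reveal_high: bool) -> Fraction:
    """Average loss for player A when B knows A's reveal policy perfectly."""
    total = Fraction(0)
    for card, prior in (("high", P_HIGH), ("low", 1 - P_HIGH)):
        revealed = reveal_high if card == "high" else reveal_low
        if revealed:
            # B sees the card, so the reported probability is 0 or 1.
            p = Fraction(1) if card == "high" else Fraction(0)
        else:
            # B sees nothing, but conditions on which cards the policy hides.
            hidden_high = P_HIGH * (0 if reveal_high else 1)
            hidden_low = (1 - P_HIGH) * (0 if reveal_low else 1)
            p = hidden_high / (hidden_high + hidden_low)
        total += prior * loss(p)
    return total


print("never reveal:    ", expected_loss(False, False))
print("reveal only low: ", expected_loss(True, False))
```

Under these assumptions, never revealing costs a quarter of a dollar on average, while revealing low cards costs half a dollar on average, because hiding a card then tells B that it is high.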

## Comparison to Other Problems

The post Two Types of Updatelessness distinguishes between all-upside updatelessness and mixed-upside updatelessness. In an all-upside case, using an updateless decision theory provides a better result in the current situation, while in a mixed-upside case the benefits go to other possible selves. Unlike Newcomb’s Problem or Parfit’s Hitchhiker, Counterfactual Mugging is a mixed-upside case.

## Blog posts

Timeless Decision Theory: Problems I Can’t Solve by Eliezer Yudkowsky

The sin of updating when you can change whether you exist by Benya Fallenstein

Counterfactual Mugging: Why should you Pay? (question) by Chris Leong

## External links

Thoughts on Updatelessness by Caspar Oesterheld

I’m not familiar with this position. Citation needed?

I’d guess that line was referencing this:

From your Counterfactual Mugging post.

I was careful to avoid endorsing any conclusion in the post in order to foment discussion. If we add in my own position (that paying up is winning), the point is that the conception of winning that only cares about actuality is wrong. This does rest on the possibility of the unfortunate interpretation of winning as only caring about actuality, and therefore suggesting that we shouldn’t pay, though saying that I was pointing that out seems like a stretch. So maybe it’s kinda correct to say that I was pointing it out, but seriously misleading, to the point where it misled even me...