Counterfactual Mugging

Tag (last edit: 13 Aug 2023 3:08 UTC by asher)

Counterfactual mugging is a thought experiment for testing and differentiating decision theories, stated as follows:

Omega, a perfect predictor, flips a coin. If it comes up tails, Omega asks you for $100. If it comes up heads, Omega pays you $10,000 if it predicts that you would have paid had it come up tails.

Depending on how the problem is phrased, intuition calls for different answers. For example, Eliezer Yudkowsky has argued that if the problem is framed so that Omega is a regular feature of the environment that routinely poses questions of this kind, most people answer ‘Yes’. However, Vladimir Nesov points out that Rationalists Should Win could be interpreted as suggesting that we should not pay. After all, even though refusing to pay in the tails case means you would have done worse in the counterfactual where the coin came up heads, you already know that counterfactual didn’t happen, so it’s not obvious that you should pay. This issue has been discussed in this question.

Formal decision theories also diverge. Under Causal Decision Theory, you can only affect outcomes to which you are causally linked; once the coin has come up tails, paying cannot cause you to receive the $10,000, so the answer is ‘No’. Evidential Decision Theory takes any kind of connection into account, not just causal ones, but it also answers ‘No’: having updated on the coin coming up tails, paying is only evidence of being $100 poorer. The answer given by Timeless Decision Theory seems undefined; however, Yudkowsky has argued that if the problem is presented repeatedly, one should answer ‘Yes’ in order to raise the probability of gaining $10,000 in the next round. This seems to be the prescription of Causal Decision Theory in the repeated case as well. Updateless Decision Theory prescribes giving the $100, on the basis that your decision can influence both the ‘heads branch’ and the ‘tails branch’ of the universe.

Regardless of the particular decision theory, it is generally agreed that if you can pre-commit to paying in advance, you should do so. The dispute is purely over what you should do if you didn’t pre-commit.
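The gap between the view before the flip and the view after seeing tails can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a fair coin, a perfectly accurate Omega, and utility that is linear in dollars; the names `payoff` and `expected_value` are illustrative and do not come from any of the posts.

```python
from fractions import Fraction

ASK = 100        # amount Omega requests on tails
REWARD = 10_000  # amount Omega pays on heads to agents it predicts would pay


def payoff(pays_on_tails: bool, coin: str) -> int:
    """Dollar outcome for an agent whose policy is `pays_on_tails`.

    Assumes Omega predicts the agent's policy perfectly.
    """
    if coin == "tails":
        return -ASK if pays_on_tails else 0
    return REWARD if pays_on_tails else 0


def expected_value(pays_on_tails: bool,
                   p_heads: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected payoff of a policy, evaluated before the coin is flipped."""
    return (p_heads * payoff(pays_on_tails, "heads")
            + (1 - p_heads) * payoff(pays_on_tails, "tails"))


for policy in (True, False):
    print(f"pays={policy}: ex ante EV = {expected_value(policy)}, "
          f"after seeing tails = {payoff(policy, 'tails')}")
```

Under these assumptions the paying policy is worth $4,950 before the flip and the refusing policy is worth $0, which is why pre-committing looks attractive. Once tails is known, paying costs $100 and refusing costs nothing, which is where the disagreement arises.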

Eliezer Yudkowsky listed this problem in his 2009 post Timeless Decision Theory: Problems I Can’t Solve, although that post was written before Updateless Decision Theory was introduced.

Variants

The Counterfactual Prisoner’s Dilemma is a symmetric variant of the original, independently suggested by Chris Leong and Cousin_it:

Omega, a perfect predictor, flips a coin. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails and you were told it was tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads and you were told it was heads.

In this scenario, an updateless agent receives $9,900 and an updateful agent receives nothing, regardless of the coin flip, while in the original scenario the updateless agent only comes out ahead if the coin shows heads. This is claimed as a demonstration of the principle that when evaluating decisions we should consider the counterfactual and not just our particular branch of possibility space.
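The payoffs in this variant can be tabulated directly. The sketch below assumes a perfectly accurate Omega and models an agent as a pair of booleans saying whether it pays when told heads and when told tails; the function name `cpd_payoff` is hypothetical.

```python
ASK = 100        # amount Omega requests on either outcome
REWARD = 10_000  # amount paid if Omega predicts payment on the other outcome


def cpd_payoff(pays_on_heads: bool, pays_on_tails: bool, coin: str) -> int:
    """Net dollars in the Counterfactual Prisoner's Dilemma.

    The agent is asked for $100 whichever way the coin lands and is rewarded
    if Omega predicts it would have paid on the *other* outcome.
    """
    pays_here = pays_on_heads if coin == "heads" else pays_on_tails
    pays_there = pays_on_tails if coin == "heads" else pays_on_heads
    return (-ASK if pays_here else 0) + (REWARD if pays_there else 0)


always_pay = {c: cpd_payoff(True, True, c) for c in ("heads", "tails")}
never_pay = {c: cpd_payoff(False, False, c) for c in ("heads", "tails")}
print(always_pay)  # {'heads': 9900, 'tails': 9900}
print(never_pay)   # {'heads': 0, 'tails': 0}
```

Under these assumptions the always-paying policy does better on both outcomes of the coin, not merely better in expectation.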

In Logical Counterfactual Mugging, instead of flipping a coin, Omega tells you the 10,000th digit of pi, which we assume you don’t know off the top of your head. If it is odd, we treat it like heads in the original problem, and if it is even, we treat it like tails. Logical inductors have been proposed as a solution to this problem. Applying this to Logical Counterfactual Mugging.

The Counterfactual Mugging Poker Game is a somewhat more complicated variant by Scott Garrabrant. Player A receives a single card that is either high or low, which they may reveal if they wish. Player B then shares their true probability estimate $p$ that player A has a high card, and player A loses $p^{2}$ dollars. Player B is essentially perfect at predicting player A’s behaviour, but doesn’t get to see player A after the card has been drawn. Suppose you are player A. If you show the card when it is low, then you lose 0. However, since B can predict your behaviour, this means that if the card had been high, B would be able to infer that you had a high card even if you hadn’t revealed it. This would lose you a whole dollar, and on average you’d be better off if you never showed the card. Garrabrant states that he prefers this scenario because Counterfactual Mugging feels like it is trying to trick you, while in this scenario you are the one creating the Counterfactual Mugging-like situation in order to withhold information.
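Player A's average loss under different reveal policies can be compared with a short calculation. This is a rough sketch under assumptions not stated in the summary above: the card is high with probability 1/2, and player B reports the exact Bayesian posterior given A's policy and whatever B observes. The names `b_estimate` and `expected_loss` are illustrative.

```python
from fractions import Fraction

P_HIGH = Fraction(1, 2)  # assumed prior probability of a high card


def b_estimate(reveal_low: bool, reveal_high: bool, card: str) -> Fraction:
    """Player B's probability that the card is high.

    B knows A's policy (reveal_low, reveal_high) and sees the card only if
    A reveals it.
    """
    reveals = reveal_high if card == "high" else reveal_low
    if reveals:
        return Fraction(1) if card == "high" else Fraction(0)
    # Card hidden: condition on the event "A chose not to reveal".
    hidden_high = P_HIGH * (0 if reveal_high else 1)
    hidden_low = (1 - P_HIGH) * (0 if reveal_low else 1)
    return hidden_high / (hidden_high + hidden_low)


def expected_loss(reveal_low: bool, reveal_high: bool) -> Fraction:
    """Player A's average loss of p**2 dollars over both possible cards."""
    loss_high = b_estimate(reveal_low, reveal_high, "high") ** 2
    loss_low = b_estimate(reveal_low, reveal_high, "low") ** 2
    return P_HIGH * loss_high + (1 - P_HIGH) * loss_low


print(expected_loss(reveal_low=True, reveal_high=False))   # 1/2
print(expected_loss(reveal_low=False, reveal_high=False))  # 1/4
```

Under these assumptions, revealing only low cards costs half a dollar on average (nothing with a low card, a full dollar with a high one), while never revealing costs a quarter of a dollar whichever card is drawn.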

Comparison to Other Problems

Two Types of Updatelessness makes a distinction between all-upside updatelessness and mixed-upside updatelessness. In an all-upside case, using an updateless decision theory provides a better result in the current situation, while in a mixed-upside case the benefits go to other possible selves. Unlike Newcomb’s Problem or Parfit’s Hitchhiker, Counterfactual Mugging is a mixed-upside case.

Blog posts

Counterfactual Mugging by Vladimir Nesov
Timeless Decision Theory: Problems I Can’t Solve by Eliezer Yudkowsky
Towards a New Decision Theory by Wei Dai
The sin of updating when you can change whether you exist by Benya Fallenstein
Counterfactual Mugging: Why should you pay? (question) by Chris Leong

External links

Counterfactual Blackmail (of oneself) by Paul F. Christiano
Thoughts on Updatelessness by Caspar Oesterheld

Posts tagged Counterfactual Mugging

Counterfactual Mugging by Vladimir_Nesov (19 Mar 2009)
Counterfactual Mugging and Logical Uncertainty by Vladimir_Nesov (5 Sep 2009)
Counterfactual Mugging Poker Game by Scott Garrabrant (13 Jun 2018)
The sin of updating when you can change whether you exist by Benya (28 Feb 2014)
Extremely Counterfactual Mugging or: the gist of Transparent Newcomb by Bongo (9 Feb 2011)
Hazing as Counterfactual Mugging? by SilasBarta (11 Oct 2010)
AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy by DanielFilan (10 Mar 2021)
[Question] Counterfactual Mugging: Why should you pay? by Chris_Leong (17 Dec 2019)
Timeless Decision Theory: Problems I Can’t Solve by Eliezer Yudkowsky (20 Jul 2009)
The Counterfactual Prisoner’s Dilemma by Chris_Leong (21 Dec 2019)
Counterfactual mugging: alien abduction edition by Emile (28 Sep 2010)
Logical Line-Of-Sight Makes Games Sequential or Loopy by StrivingForLegibility (19 Jan 2024)
Precommitting to paying Omega. by topynate (20 Mar 2009)
UDT might not pay a Counterfactual Mugger by winwonce (21 Nov 2020)
Naive TDT, Bayes nets, and counterfactual mugging by Stuart_Armstrong (23 Oct 2012)
Machine learning could be fundamentally unexplainable by George3d6 (16 Dec 2020, cerebralab.com)
Updatelessness doesn’t solve most problems by Martín Soto (8 Feb 2024)
Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty by Chris_Leong (16 Sep 2020)
Disentangling four motivations for acting in accordance with UDT by Julian Stastny (5 Nov 2023)

Vladimir_Nesov 13 Sep 2021 18:50 UTC
4 points

However, Vladimir Nesov points out that Rationalists Should Win could be interpreted as suggesting that we should not pay.

I’m not familiar with this position. Citation needed?
- ESRogs 13 Sep 2021 21:22 UTC
  4 points
  I’d guess that line was referencing this:
  And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?
  From your Counterfactual Mugging post.
  - Vladimir_Nesov 13 Sep 2021 23:10 UTC
    2 points
    I was careful to avoid endorsing any conclusion in the post in order to foment discussion. If we add in my own position (that paying up is winning), the point is that the conception of winning that only cares about actuality is wrong. This does rest on the possibility of the unfortunate interpretation of winning as only caring about actuality, and therefore suggesting that we shouldn’t pay, though saying that I was pointing that out seems like a stretch. So maybe it’s kinda correct to say that I was pointing it out, but seriously misleading, to the point where it misled even me...
