So, the flaw in your reasoning is that, after updating on being in the city, e2 doesn’t go “logically impossible, infinite utility”. We just go “alright, off-history measure gets converted to 0 utility”, a perfectly standard update. So e2 updates to (0,0) (i.e., there’s 0 probability I’m in this situation in the first place, and my expected utility for not getting into this situation in the first place is 0, because of probably dying in the desert).

As for the proper way to do this analysis, it’s a bit finicky. There’s something called “acausal form”, which is the fully general way of representing decision-theory problems. Basically, you just give an infrakernel $\Theta:\Pi\to\square(A\times O)^{\omega}$ that tells you your uncertainty over which history will result, for each of your policies.

So, you’d have

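$\Theta(\text{pay}) = (0.99\,\delta_{\text{alive,poor}} + 0.01\,\delta_{\text{dead}},\ 0)$

$\Theta(\text{nopay}) = (0.99\,\delta_{\text{dead}} + 0.01\,\delta_{\text{alive,rich}},\ 0)$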

I.e., if you pay, there’s a 99 percent chance of ending up alive but having paid and a 1 percent chance of dying in the desert; if you don’t pay, there’s a 99 percent chance of dying in the desert and a 1 percent chance of cheating them. No extra utility juice on either one.

You update on the event “I’m alive”. The off-event utility function is like “being dead would suck, 0 utility”. So, your infrakernel updates to (leaving off the scale-and-shift factors, which don’t affect anything)

$\Theta(\text{pay}) = (0.99\,\delta_{\text{alive,poor}},\ 0)$

$\Theta(\text{nopay}) = (0.01\,\delta_{\text{alive,rich}},\ 0)$

This is because the probability mass on “die in desert” got burned and turned into utility juice, 0 of it since dying is the worst outcome. Let’s say your utility function assigns 0.5 utility to being alive and rich, and 0.4 utility to being alive and poor. So the utility of the first policy is 0.99⋅0.4=0.396, and the utility of the second policy is 0.01⋅0.5=0.005, so it returns the same answer of paying up. It’s basically thinking “if I don’t pay, I’m probably not in this situation in the first place, and the utility of ‘I’m not in this situation in the first place’ is also about as low as possible.”
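To make the arithmetic concrete, here’s a minimal Python sketch of that update. It treats each Θ(policy) as a single (measure, extra-utility) pair and skips the scale-and-shift factors, same as above. The names (`prior`, `update_on_alive`, `value`) are made up for illustration and aren’t from any library.

```python
# Each policy maps to (measure over histories, extra "utility juice" term).
# The numbers are the ones from the comment; the representation is an assumption.
prior = {
    "pay": ({"alive,poor": 0.99, "dead": 0.01}, 0.0),
    "nopay": ({"dead": 0.99, "alive,rich": 0.01}, 0.0),
}
UTILITY = {"alive,rich": 0.5, "alive,poor": 0.4}
OFF_EVENT_UTILITY = 0.0  # "being dead would suck, 0 utility"


def update_on_alive(measure, juice):
    """Keep the mass on 'alive' histories; convert the 'dead' mass into utility juice."""
    on_event = {h: m for h, m in measure.items() if h != "dead"}
    off_mass = sum(m for h, m in measure.items() if h == "dead")
    return on_event, juice + off_mass * OFF_EVENT_UTILITY


def value(measure, juice):
    """Expected utility of the remaining measure, plus the juice term."""
    return sum(m * UTILITY[h] for h, m in measure.items()) + juice


values = {policy: value(*update_on_alive(*pair)) for policy, pair in prior.items()}
best = max(values, key=values.get)
print(values, best)
```

Running it gives roughly 0.396 for paying and 0.005 for not paying, so `best` comes out as paying up.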

BUT

There’s a very mathematically natural way to translate any decision-theory problem to “causal form”, and as it turns out, the process which falls directly out of the math is that thing where you go “hard-code in all possible policies, go to Nirvana if I behave differently from the hard-coded policy”. This has an advantage and a disadvantage. The advantage is that now your decision-theory problem is in the form of an infra-POMDP, a much more restrictive form, so you’ve got a much better shot at actually developing a practical algorithm for it. The disadvantage is that not all decision-theory problems survive the translation process unchanged. Speaking informally, the “fairness criterion” to translate a decision-theory problem into causal form without too much loss in fidelity is something like “if I was mispredicted, would I actually have a good shot at entering the situation where I was mispredicted to prove the prediction wrong”.

Counterfactual mugging fits this. If Omega flubs its prediction, you’ve got a 50 percent chance of being able to prove it wrong.

XOR blackmail fits this. If the blackmailer flubs its prediction and thinks you’ll pay up, you’ve got like a 90 percent chance of being able to prove it wrong.

Newcomb’s problem fits this. If Omega flubs its prediction and thinks you’ll 2-box, you’ll definitely be able to prove it wrong.

Transparent Newcomb and Parfit’s Hitchhiker don’t fit this “fairness property” (especially for 100 percent accuracy), and so when you translate them to a causal problem, it ruins things. If the predictor screws up and thinks you’ll 2-box on seeing a filled transparent box/won’t pay up on seeing you got saved, then the transparent box is empty/you die in the desert, and you don’t have a significant shot at proving them wrong.

Let’s see what’s going wrong. Our two a-environments are

(predicted to not pay, probably die in desert,0)

(predicted to pay, probably survive,0)

Update on the event “I didn’t die in the desert”. Then, neglecting scale-and-shift, our two a-environments are

(0.01 in city, pay implies Nirvana, 0)

(0.99 in city, no-pay implies Nirvana, 0)

Letting N be the utility of Nirvana,

If you pay up, then the expected utilities of these are 0.01⋅N and 0.99⋅0.4

If you don’t pay up, then the expected utilities of these are 0.01⋅0.5 and 0.99⋅N

Now, if N is something big like 100, then the worst-case utilities of the policies are 0.396 vs 0.005, as expected, and you pay up. But if N is something like 1, then the worst-case utilities of the policies are 0.01 vs 0.005, which… well, it technically gets the right answer, but those numbers are suspiciously close to each other, and the agent isn’t thinking properly. And so, without too much extra effort tweaking the problem setup, it’s possible to generate decision-theory problems where the agent just straight-up makes the wrong decision after changing things to the causal setting.
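The N=100 vs N=1 comparison can be reproduced with a short Python sketch. It assumes the two updated a-environments are just “mass in the city” paired with “which action leads to Nirvana”, and takes the minimum over environments as the worst case. The function name `worst_case` and the data layout are made up for illustration.

```python
U_RICH, U_POOR = 0.5, 0.4

# (mass on being in the city, action that leads to Nirvana in this environment)
ENVIRONMENTS = [
    (0.01, "pay"),    # predicted to not pay: paying contradicts the prediction
    (0.99, "nopay"),  # predicted to pay: not paying contradicts the prediction
]
ORDINARY_UTILITY = {"pay": U_POOR, "nopay": U_RICH}


def worst_case(nirvana_utility):
    """Worst-case expected utility of each action over the two environments."""
    result = {}
    for action in ("pay", "nopay"):
        result[action] = min(
            mass * (nirvana_utility if nirvana_action == action
                    else ORDINARY_UTILITY[action])
            for mass, nirvana_action in ENVIRONMENTS
        )
    return result


for n in (100, 1):
    print(n, worst_case(n))
```

With N=100, the worst cases are about 0.396 (pay) vs 0.005 (no-pay). With N=1, the pay value drops to 0.01 because the mispredicted environment becomes the binding one.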

Thank you so much for your detailed reply. I’m still thinking this through, but this is awesome. A couple things:

I don’t see the problem at the bottom. I thought we were operating in the setting where Nirvana meant infinite reward? It seems like of course if N is small, we will get weird behavior because the agent will sometimes reason over logically impossible worlds.

Is Parfit’s Hitchhiker with a perfect predictor unsalvageable because it violates this fairness criterion?

The fairness criterion in your comment is the pseudocausality condition, right?

So, if you make Nirvana infinite utility, yes, the fairness criterion becomes “if you’re mispredicted, you have any probability at all of entering the situation where you’re mispredicted” instead of “have a significant probability of entering the situation where you’re mispredicted”, so a lot more decision-theory problems can be captured if you take Nirvana as infinite utility. But, I talk in another post in this sequence (I think it was “the many faces of infra-beliefs”) about why you want to do Nirvana as 1 utility instead of infinite utility.
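As a rough illustration of “any probability at all” vs “a significant probability”, here’s a small Python sketch for the pay policy in the hitchhiker numbers above. It varies the chance `p` of ending up in the city when mispredicted, and holds the other environment fixed at 0.99⋅0.4; that parametrization and the name `worst_case_pay` are assumptions made for the example.

```python
INF = float("inf")


def worst_case_pay(p, nirvana_utility):
    """Worst case for paying: min of the mispredicted and the correctly predicted environment."""
    mispredicted = p * nirvana_utility  # paying contradicts the prediction, so Nirvana
    predicted = 0.99 * 0.4              # paying as predicted: alive and poor
    return min(mispredicted, predicted)


for p in (0.01, 1e-6):
    print(p, worst_case_pay(p, INF), worst_case_pay(p, 1.0))
```

With infinite Nirvana utility, any p > 0 leaves the worst case at about 0.396. With Nirvana at 1 utility, the worst case is p itself once p is small. The p = 0 case (a perfect predictor) is left out on purpose, since 0 times infinity is `nan` in floating point.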

Parfit’s Hitchhiker with a perfect predictor is a perfectly fine acausal decision problem. We can still represent it; it just cannot be represented as an infra-POMDP/causal decision problem.

Yes, the fairness criterion is tightly linked to the pseudocausality condition. Basically, the acausal->pseudocausal translation is the part where the accuracy of the translation might break down, and once you’ve got something in pseudocausal form, translating it to causal form from there by adding in Nirvana won’t change the utilities much.