The sin of updating when you can change whether you exist
Trigger warning: In a thought experiment in this post, I used a hypothetical torture scenario without thinking, even though it wasn’t necessary to make my point. Apologies, and thanks to an anonymous user for pointing this out. I’ll try to be more careful in the future.
Should you pay up in the counterfactual mugging?
I’ve always found the argument about self-modifying agents compelling: If you expected to face a counterfactual mugging tomorrow, you would want to choose to rewrite yourself today so that you’d pay up. Thus, a decision theory that didn’t pay up wouldn’t be reflectively consistent; an AI using such a theory would decide to rewrite itself to use a different theory.
But is this the only reason to pay up? This might make a difference: Imagine that Omega tells you that it threw its coin a million years ago, and would have turned the sky green if it had landed the other way. Back in 2010, I wrote a post arguing that in this sort of situation, since you’ve always seen the sky being blue, and every other human being has also always seen the sky being blue, everyone has always had enough information to conclude that there’s no benefit from paying up in this particular counterfactual mugging, and so there hasn’t ever been any incentive to self-modify into an agent that would pay up … and so you shouldn’t.
I’ve since changed my mind, and I’ve recently talked about part of the reason for this, when I introduced the concept of an l-zombie, or logical philosophical zombie, a mathematically possible conscious experience that isn’t physically instantiated and therefore isn’t actually consciously experienced. (Obligatory disclaimer: I’m not claiming that the idea that “some mathematically possible experiences are l-zombies” is likely to be true, but I think it’s a useful concept for thinking about anthropics, and I don’t think we should rule out l-zombies given our present state of knowledge. More in the l-zombies post and in this post about measureless Tegmark IV.) Suppose that Omega’s coin had come up the other way, and Omega had turned the sky green. Then you and I would be l-zombies. But if Omega was able to make a confident guess about the decision we’d make if confronted with the counterfactual mugging (without simulating us, so that we continue to be l-zombies), then our decisions would still influence what happens in the actual physical world. Thus, if l-zombies say “I have conscious experiences, therefore I physically exist”, and update on this fact, and if the decisions they make based on this influence what happens in the real world, a lot of utility may potentially be lost. Of course, you and I aren’t l-zombies, but the mathematically possible versions of us who have grown up under a green sky are, and they reason the same way as you and me—it’s not possible to have only the actual conscious observers reason that way. Thus, you should pay up even in the blue-sky mugging.
But that’s only part of the reason I changed my mind. The other part is that while in the counterfactual mugging, the answer you get if you try to use Bayesian updating at least looks kinda sensible, there are other thought experiments in which doing so in the straightforward way makes you obviously bat-shit crazy. That’s what I’d like to talk about today.
The kind of situation I have in mind involves being able to influence whether you exist, or more precisely, influence whether the version of you making the decision exists as a conscious observer (or whether it’s an l-zombie).
Suppose that you wake up and Omega explains to you that it kidnapped you and some of your friends back in 2014, and put you into suspension; it’s now the year 2100. It then hands you a little box with a red button, and tells you that if you press that button, Omega will slowly torture you and your friends to death; otherwise, you’ll be able to live out a more or less normal and happy life (or to commit painless suicide, if you prefer). Furthermore, it explains that one of two things has happened: Either (1) humanity has undergone a positive intelligence explosion, and Omega has predicted that you will press the button; or (2) humanity has wiped itself out, and Omega has predicted that you will not press the button. In any other scenario, Omega would still have woken you up at the same time, but wouldn’t have given you the button. Finally, if humanity has wiped itself out, Omega won’t let you try to “reboot” it; in this case, you and your friends will be the last humans.
There’s a correct answer to what to do in this situation, and it isn’t to decide that Omega’s just given you anthropic superpowers to save the world. But that’s what you get if you try to update in the most naive way: If you press the button, then (2) becomes extremely unlikely, since Omega is really really good at predicting. Thus, the true world is almost certainly (1); you’ll get tortured, but humanity survives. For great utility! On the other hand, if you decide to not press the button, then by the same reasoning, the true world is almost certainly (2), and humanity has wiped itself out. Surely you’re not selfish enough to prefer that?
The correct answer, clearly, is that your decision whether to press the button doesn’t influence whether humanity survives, it only influences whether you get tortured to death. (Plus, of course, whether Omega hands you the button in the first place!) You don’t want to get tortured, so you don’t press the button. Updateless reasoning gets this right.
Let me spell out the rules of the naive Bayesian decision theory (“NBDT”) I used there, in analogy with Simple Updateless Decision Theory (SUDT). First, let’s set up our problem in the SUDT framework. To simplify things, we’ll pretend that FOOM and DOOM are the only possible things that can happen to humanity, and treat Omega as a perfect predictor. In addition, we’ll assume that there’s a small probability p of FOOM, so that the prior probability of DOOM is 1 − p.
There’s only one situation in which you need to make a decision: waking up in 2100 with Omega handing you the button. In that situation you have two possible actions, pressing the button or not pressing it, so there are exactly two policies, π_press and π_don’t.
There are four possible outcomes, specifying (a) whether humanity survives and (b) whether you get tortured: (FOOM, ¬tortured), (FOOM, tortured), (DOOM, ¬tortured), and (DOOM, tortured).
Finally, let’s define our utility function by U(FOOM, ¬tortured) = 1, U(FOOM, tortured) = 1 − c, U(DOOM, ¬tortured) = 0, and U(DOOM, tortured) = −c, where 0 < c < 1 is the disutility of being tortured to death. The exact numbers don’t matter; what matters is that humanity surviving is worth more than avoiding torture, and that, whatever happens to humanity, getting tortured makes things worse.
This suffices to set up an SUDT decision problem. There are only two possible worlds, w_FOOM and w_DOOM, with probabilities p and 1 − p. Omega hands you the button only when its prediction of your policy matches that world’s scenario, so the outcome function is: o(π_press, w_FOOM) = (FOOM, tortured); o(π_press, w_DOOM) = (DOOM, ¬tortured), since under DOOM, Omega predicted you’d press and therefore never gives you the button; o(π_don’t, w_FOOM) = (FOOM, ¬tortured); and o(π_don’t, w_DOOM) = (DOOM, ¬tortured). SUDT thus computes EU(π_press) = p(1 − c) and EU(π_don’t) = p, and since p > p(1 − c), it tells you not to press: your policy doesn’t change which world is actual, it only determines whether you get tortured in the FOOM world.
For NBDT, we need to know how to update, so we need one more ingredient: a function specifying in which worlds you exist as a conscious observer. In anticipation of future discussions, I’ll write this as a function E(π, w), which is 1 if, in world w, an agent following policy π actually has the conscious experience of being in the decision situation (waking up and being handed the button), and 0 if that agent is an l-zombie.
Now, we can use Bayes’ theorem to calculate the posterior probability of a possible world, given the information that you consciously experience the decision situation: P(w | π) = P(w) · E(π, w) / Σ_{w′} P(w′) · E(π, w′). NBDT then says to choose the policy π that maximizes the posterior expected utility, Σ_w P(w | π) · U(o(π, w)). Notice that the posterior depends on the very policy whose utility it is used to evaluate; that’s where the trouble comes in.
In our case, we have E(π_press, w_FOOM) = 1 and E(π_press, w_DOOM) = 0: if your policy is to press, Omega hands you the button only in the FOOM world, so the version of you making the decision is conscious only there. Symmetrically, E(π_don’t, w_FOOM) = 0 and E(π_don’t, w_DOOM) = 1. So if you press, the posterior puts all its mass on w_FOOM, for an expected utility of 1 − c; if you don’t, it puts all its mass on w_DOOM, for an expected utility of 0. Since 1 − c > 0, NBDT presses the button, reasoning exactly like the naive updater above.
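To make the comparison concrete, here’s a minimal Python sketch of the toy problem, with a prior probability p of FOOM and a torture disutility c. The specific numbers (p = 0.01, c = 0.3) and all function names are illustrative choices, not part of the setup; any 0 < p < 1 and 0 < c < 1 give the same verdicts.

```python
# Toy model of the button problem: two worlds, two policies.
# p = prior probability of FOOM, c = disutility of being tortured.
# These numbers are illustrative; any 0 < p < 1, 0 < c < 1 behave the same.
p, c = 0.01, 0.3

worlds = {"FOOM": p, "DOOM": 1 - p}
policies = ["press", "dont"]

def outcome(policy, world):
    """Return (world, tortured). Omega hands you the button only when its
    prediction of your policy matches the world, so you get tortured only
    if the world is FOOM and your policy is to press."""
    return world, (policy == "press" and world == "FOOM")

def utility(world, tortured):
    # FOOM is worth 1, DOOM is worth 0; torture subtracts c.
    return (1.0 if world == "FOOM" else 0.0) - (c if tortured else 0.0)

def exists(policy, world):
    # The deciding observer (awake, holding the button) is conscious only
    # in worlds consistent with Omega's prediction of this policy.
    return (world == "FOOM") == (policy == "press")

def sudt_eu(policy):
    # Updateless: average utility over ALL worlds by their prior.
    return sum(pr * utility(*outcome(policy, w)) for w, pr in worlds.items())

def nbdt_eu(policy):
    # Naive Bayesian: renormalize over the worlds where you exist
    # *given this very policy* -- the problematic step.
    z = sum(pr for w, pr in worlds.items() if exists(policy, w))
    return sum((pr / z) * utility(*outcome(policy, w))
               for w, pr in worlds.items() if exists(policy, w))

print(max(policies, key=sudt_eu))  # dont:  SUDT refuses to press
print(max(policies, key=nbdt_eu))  # press: NBDT claims anthropic superpowers
```

The disagreement is exactly the one above: SUDT compares p(1 − c) against p over both worlds, while NBDT conditions the DOOM world away when evaluating pressing and ends up comparing 1 − c against 0.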
But maybe it’s not updating that’s bad, but NBDT’s way of implementing it? After all, we get the clearly wacky results only if our decisions can influence whether we exist, and perhaps the way that NBDT extends the usual formula to this case happens to be the wrong way to extend it.
One thing we could try is to mark a possible world as impossible only if you don’t exist in it under any policy, i.e., only if E(π, w) = 0 for every π. In our problem, each world contains a conscious version of you under one of the two policies, so nothing gets ruled out, no update happens, and you correctly refuse to press the button. But this rule is entirely ad hoc; I don’t know of any principled justification for it.
There is a much more principled possibility, which I’ll call pseudo-Bayesian decision theory, or PBDT. PBDT can be seen as re-interpreting updating as saying that you’re indifferent about what happens in possible worlds in which you don’t exist as a conscious observer, rather than ruling out those worlds as impossible given your evidence. (A version of this idea was recently brought up in a comment by drnickbone, though I’d thought of this idea myself during my journey towards my current position on updating, and I imagine it has also appeared elsewhere, though I don’t remember any specific instances.) I have more than one objection to PBDT, but the simplest one to argue is that it doesn’t solve the problem: it still believes that it has anthropic superpowers in the problem above.
Formally, PBDT says that we should choose the policy π that maximizes Σ_w P(w) · E(π, w) · U(o(π, w)): worlds in which you’re an l-zombie simply contribute zero utility, whatever happens in them.
When our existence is independent of our decisions—that is, if E(π, w) doesn’t depend on π—this sum is just NBDT’s posterior expected utility multiplied by the constant Σ_{w′} P(w′) · E(w′), so PBDT and NBDT choose the same policies, and both behave like ordinary Bayesian updating in ordinary, non-anthropic decision problems.
Unfortunately, although in our problem above PBDT never treats either world as strictly impossible, it still presses the button: pressing gets PBDT-utility P(w_FOOM) · U(FOOM, tortured) = p(1 − c), while refusing gets P(w_DOOM) · U(DOOM, ¬tortured) = 0, and p(1 − c) > 0 no matter how small p is. PBDT, too, acts as if pressing the button could make humanity’s extinction go away.
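PBDT’s verdict can be checked directly on the same toy model. In this sketch (same illustrative parameters as before, with the DOOM-and-alive outcome normalized to utility 0), worlds in which the deciding observer is an l-zombie are dropped from the sum without renormalizing:

```python
# PBDT on the toy button problem: worlds where the deciding observer
# doesn't exist contribute zero utility instead of being conditioned away.
# p and c are the same illustrative numbers as before.
p, c = 0.01, 0.3

worlds = {"FOOM": p, "DOOM": 1 - p}

def utility(world, tortured):
    return (1.0 if world == "FOOM" else 0.0) - (c if tortured else 0.0)

def exists(policy, world):
    # You hold the button only where Omega's prediction matches your policy.
    return (world == "FOOM") == (policy == "press")

def pbdt_eu(policy):
    return sum(pr * utility(w, policy == "press" and w == "FOOM")
               for w, pr in worlds.items() if exists(policy, w))

# Pressing counts only the FOOM world:  p * (1 - c) > 0.
# Refusing counts only the DOOM world:  (1 - p) * 0 = 0.
print(max(["press", "dont"], key=pbdt_eu))  # press, however small p is
```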
I think that’s a pretty compelling argument against PBDT, but even leaving it aside, I don’t like PBDT at all. I see two possible justifications for PBDT: You can either say that multiplying by E(π, w) is just a formal trick for implementing updating, or you can say that you genuinely don’t care what happens in worlds where you’re an l-zombie.
If multiplying by E(π, w) is just a formal trick, it’s a suspicious one: PBDT’s choices depend on where we put the zero of our utility function. Adding a constant to all utilities, which should leave an expected utility maximizer’s behavior unchanged, shifts different policies’ PBDT-utilities by different amounts (namely, by the constant times the probability mass of the worlds in which you exist under that policy), so it can flip PBDT’s decisions. If, on the other hand, you really are indifferent to everything that happens in worlds where you don’t exist as a conscious observer, that’s a substantive claim about your values, and one that seems false of most of us: we do seem to care, for example, whether humanity flourishes in worlds where we were never born.
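The zero-point sensitivity is easy to demonstrate on the toy model: adding a constant to every utility, which never changes an ordinary expected utility maximizer’s choice, flips PBDT’s verdict. (Same illustrative parameters as before; the shift of +1 is likewise an arbitrary choice.)

```python
# PBDT is sensitive to the zero point of the utility scale, because a
# constant shift multiplies into the existence-weighted probability
# mass, which differs between policies. Illustrative numbers as before.
p, c = 0.01, 0.3

worlds = {"FOOM": p, "DOOM": 1 - p}

def exists(policy, world):
    return (world == "FOOM") == (policy == "press")

def pbdt_eu(policy, shift=0.0):
    def u(world, tortured):
        return ((1.0 if world == "FOOM" else 0.0)
                - (c if tortured else 0.0) + shift)
    return sum(pr * u(w, policy == "press" and w == "FOOM")
               for w, pr in worlds.items() if exists(policy, w))

def pbdt_choice(shift):
    return max(["press", "dont"], key=lambda pi: pbdt_eu(pi, shift))

print(pbdt_choice(0.0))  # press: p*(1-c) = 0.007 beats (1-p)*0 = 0
print(pbdt_choice(1.0))  # dont:  p*(2-c) = 0.017 loses to (1-p)*1 = 0.99
```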
I’m not aware of a way of implementing updating in general SUDT-style problems that does better than NBDT, PBDT, and the ad-hoc idea mentioned above, so for now I’ve concluded that in general, trying to update is just hopeless, and we should be using (S)UDT instead. In classical decision problems, where there are no acausal influences, (S)UDT will of course behave exactly as if it did do a Bayesian update; thus, in a sense, using (S)UDT can also be seen as a reinterpretation of Bayesian updating (in this case just as updateless utility maximization in a world where all influence is causal), and that’s the way I think about it nowadays.