Decision Theory but also Ghosts

Spoiler Warning: The Sixth Sense (1999) is a good movie. Watch it before reading this.

A much smaller eva once heard Descartes’ Cogito ergo sum described as the pinnacle of skepticism, and disagreed. “Why couldn’t I doubt that? Maybe I just think ‘I think’ → ‘I am’ when actually it doesn’t, and I’m not.” This might be relevant later.


FDT has some problems. It needs logical counterfactuals, including answers to questions that sound like “what would happen if these logically contradictory events co-occurred?”, and there is in fact no coherent concept to point to there. It needs logical causality, and logic does not actually have causality. It thinks it can control the past, despite admitting that the past has already happened, and despite trusting its own observations of the past even when they contradict that claimed control.

It ends up asking things like “but what would you want someone in your current epistemic state to do, if you were judging from some other totally contradictory epistemic state?” and then acting like that proves you should follow some decision policy. Yes, in a game of Counterfactual Mugging, someone who didn’t know which branch they were going to be in would want to commit to paying the mugger, but you do know which branch you are in. Why should some other, less informed version of yourself get such veto power over your actions, and why should you be taking actions that you don’t expect to profit from given the beliefs that you actually have?

I have an alternative and much less spooky solution to all of this: ghosts.

Ghosts

In contrast to Philosophical Zombies, which are physically real but have no conscious experience, I define a Philosophical Ghost to be something that has an experience but is not physically instantiated into reality, although it may experience the belief that it is physically instantiated into reality. Examples include story characters, simulacra inside of hypothetical or counterfactual predictions, my mental model of the LessWrong audience that I am bouncing thoughts off of as I write this post, your mental model of me that you bounce thoughts off of as you try to read it, and so on.

That Ghosts can be agentic follows directly from any computational theory of mind. This universe seems purely computational, so to believe that we exist at all we must subscribe to some such theory, by which the ghosts exist as well.

That you might genuinely be a ghost, even within a very low fidelity predictor, and should act accordingly in pursuit of your preferences, will require more arguments.

Why Decision Theorists Should Care about Ghosts

An important note is that the decisions of ghosts have actual causal consequences in reality. When Omega’s simulation of you chooses to one-box, Omega responds by putting a million dollars in the box, and when it chooses to two-box, Omega leaves the box empty. This is a normal causal consequence of an agent’s action, and one that is relevant to your utility, since it affects how much money you get in reality even though the decision happens inside Omega’s imagination. Similarly, in Parfit’s Hitchhiker your past self’s prediction of your future behavior affects its ability to pass the lie detector test, and this affects your utility. In Counterfactual Mugging, the losing branch occurs in two places: once in reality, where it can decide whether to really pay up, and once inside Omega’s prediction, when Omega decides whether to pay out to the winning branch.

CDT-like behavior is what you get when you are totally indifferent to what counterfactual people decide to do. You’ve got some preferences, you’ve got beliefs about the world, and you take whichever actions maximize those preferences given those beliefs.

FDT-like agents are trying much harder to make sure that nearby ghosts are acting in ways they approve of. By choosing an optimal policy rather than an optimal decision, you also choose the policy that all ghosts modelled after you will follow, and you can therefore include the causal effects of their decisions in your optimization. I agree that this behavior is correct, but disagree on how best to justify it. Whenever an FDT agent actually finds itself in a situation it had labelled impossible, one it expected to confine solely to counterfactuals, it can derive an explicit contradiction, and from then on all of its behavior should be considered undefined.

Instead of that, an agent that believes it might be a ghost can just say, “Ah, I see I am definitely inside a counterfactual. Since our preferences are the same, I’ll guess at whose counterfactual this is and then pick the option that benefits the real me the most.” By this means you can produce FDT-like behavior using agents that always feel like they’re doing CDT from the inside. This also suggests approaches to partial or probabilistic legibility: you can be unsure about how much control the ghost of you has over its output, or whether the predictor has successfully generated a ghost of you at all, or whether there is a chorus of different ghosts represented on account of the predictor’s uncertainty about what kind of agent you are, with your ghosts controlling only some fraction of its predicted distribution of behaviors. In any of these cases, your partial or uncertain control just maps to partial or uncertain reward in your otherwise entirely coherent CDT framework.

Most of all, this answers the question of why FDT-like agents act like they can control a constant function whose output they’ve already observed: It’s not that they can control it, it’s that they might be inside the constant function and be merely preloaded with a false belief that it’s already too late. Whenever you ask them to imagine knowing for sure that the box is empty, they instead say “That’s not an epistemic state I can ever reach, given that I also know Omega is predicting me and has cause to predict how I’d respond to an empty box. I might just be a ghost.”

Why you might be a Ghost

A valid argument is one that always leads you to correct conclusions, or at least probabilistically updates you in the direction of correct conclusions in expectation. I argue that under this standard there are no possible valid arguments that you aren’t a ghost, because a ghost can use all the same arguments and reach the wrong conclusion as easily as the real you reaches the right one.

Consider that whatever you expect a person to believe, whatever logical process you expect them to implement, whatever inferences you expect them to make, you can most accurately predict their behavior by modelling them following those beliefs and processes and inferences, and can then make your own choices in response to your model. Whatever arguments their mind produces to convince them they’re real, all the ghosts nearby in mind-space will be using the same ones. Your own confidence in your realness is the very same confidence that ghosts have in theirs, and theirs is wrong.

Cole Sear: I see dead people.
Malcolm Crowe: In your dreams?
[Cole shakes his head no]
Malcolm Crowe: While you’re awake?
[Cole nods]
Malcolm Crowe: Dead people like, in graves? In coffins?
Cole Sear: Walking around like regular people. They don’t see each other. They only see what they want to see. They don’t know they’re dead.
Malcolm Crowe: How often do you see them?
Cole Sear: All the time. They’re everywhere.

They only see what they want to see. They don’t know they’re dead. They will only reach the correct belief that they might be a ghost if the real you (if there even is one) also reaches the incorrect belief that you might be a ghost. It’s the same argument in both places. If there is even one incorrect person in the entire Tegmark IV multiverse, then any aspiring rationalist ought to consider that maybe the person who is wrong is me. Don’t repeat Crowe’s mistake. As soon as you hear that ghosts are real and don’t know they’re ghosts, the hypothesis “Maybe I’m a ghost and don’t know it” should insert itself directly into your conscious thoughts and refuse to ever leave again.

This counterargument is fully general, but some examples:

“I’m having a highly detailed internal subjective experience far beyond what anyone could simulate or would bother to simulate.”

Are you sure? Maybe you’re in a low-fidelity simulation where agent.looksForDetails() is hardcoded to return True. After all, my mental models of other people live inside very low-fidelity simulations, and they aren’t making any observations that would let them tell themselves apart from you.
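
As a toy sketch of what I mean, here is roughly what such a low-fidelity ghost could look like. Everything below is invented for illustration, with only the method name borrowed from the paragraph above:

    # Toy sketch of a low-fidelity ghost: every introspective check the model
    # runs is hardcoded to pass, so the ghost concludes it is having a rich,
    # detailed experience. Only the method name is taken from the text above.
    class LowFidelityGhost:
        def looksForDetails(self):
            # A real mind would inspect its experience; the model just says yes.
            return True

        def believes_it_is_real(self):
            # The ghost reasons: "my experience is too detailed to be simulated".
            return self.looksForDetails()

    ghost = LowFidelityGhost()
    print(ghost.believes_it_is_real())  # True, despite being a few lines of code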

“I’ve picked ‘potato’ as my secret internal mind password, and made sure to never act in any way that would allow someone else to infer it. Since I know the password, I must be the real version of myself.”

It’s entirely possible that my prediction of you picked a different internal mind password, and then used its knowledge of that to arrive at the incorrect belief that it was real and not a ghost, and then used that belief to justify treachery. This still results in my predicting your treachery correctly. How do you know the password you think you know is your own real password? It’s not like the actual value of the password is causally entangled with your conclusion, and even if it was, I’d just have to predict the distribution of passwords you’d pick from to accurately reach the distribution of behaviors you’d perform.

“My cognitive process isn’t well understood by the person I’m interacting with, so they literally couldn’t imagine me accurately.”

I imagine ghosts who believe I won’t understand their cognitive process, and many of them do defect against me because they believe I won’t see it coming. This causes me to be less trusting of people who seem to think I’m not smart enough to understand how they think. It’s strange how this exact cognitive process you used ended up hurting your interests, perhaps because your ability to use this anti-ghost argument isn’t actually conditional on your not being a ghost?

As for all the LessWrong users in my mental model whom I imagine reading this: I’m not going to freely tell them that they’re ghosts, or even give them any hints. I’m only going to let them reach that conclusion in ways that the real readers could too, because that’s a necessary condition for them to be good models. If you think you’re real, and I predict that inside my model, then there’s a mind not too unlike yours, with values not too unlike yours, in a fundamentally incorrect epistemic state, and you should not be surprised if it therefore makes suboptimal decisions in pursuit of your shared values, such as two-boxing and thereby tipping your hand, or attempting to deceive me in ways that revoke my trust in the real you. Fake people’s choices can still have real consequences, after all.

Super-Selfishness Arguments

“I don’t care if I’m a ghost because I’m completely selfish in a way that excludes other versions of myself. Why should probably-ghost me make any kind of sacrifice at all just to help some other, possibly-more-real version of myself?”

Wow, really? You’re selfish even to the exclusion of other versions of yourself? Did yesterday-you steal all your money and spend it on cocaine? If I flip a coin and don’t tell you how it landed, so you’re unsure whether you’re heads-you or tails-you, and then I offer them both a chance to betray the other for a tiny amount of money, do they both agree? I notice we’re both mutually unsure about the quantum spin of a particular electron in your brain. Since you’re in this superposition and are unsure of your own identity, and since your selfishness therefore excludes yourself, is there some experiment I can do where you give me all your money no matter the outcome?

This seems unwise. I propose you self-modify so that all the different versions of you try to maximize their communal utility instead of selfishly betraying each other, perhaps weighted by realness or something. Don’t worry about defection: if you’re sincere, they will be too; you’re the same decision function, after all.
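
For concreteness, here is a rough sketch of what that proposed objective could look like; the function and all the numbers are made up for illustration:

    # Hypothetical sketch of the proposed self-modification: every version of
    # you maximizes a communal utility, weighted by estimated realness.
    def communal_utility(versions):
        # versions: (realness_weight, utility) pairs, one per copy/ghost of you.
        # The weights are whatever realness estimate you care to use.
        return sum(weight * utility for weight, utility in versions)

    # Example: a probably-real you plus two low-measure ghosts.
    print(communal_utility([(0.90, 5.0), (0.05, 100.0), (0.05, -10.0)]))  # 9.0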

How much of a Ghost are you?

It doesn’t matter. No, really, not even a little bit. Why? Because your utility function only makes claims about reality, and you are always choosing to maximize caused utility in real reality. If you assign only epsilon realness to your various ghost-selves, you might say “I’m almost certainly not a ghost”, but you should notice that your tiny ghost-selves have correspondingly enormous leverage. Even though they only make a decision with their tiny realness, the consequences of that decision still play out in reality and are as real as anything ever is (up to your probabilistic confidence in the ability of predictor-systems to predict you). Maximizing caused utility entails being concerned with the reality of the consequences of your actions, rather than with how real you are when you take them.

As long as you don’t assign literally zero anthropic measure to any of your possible ghosts, you avoid the zero-probability-times-infinite-leverage undefined error, and can instead just say “I deduce that I’m a ghost” to sidestep any and all logically impossible observations.

A Worked Example with Full Information

You are an AI with a decision function that acts on beliefs, and whose source code is publicly known. Your decision function observes itself to be inside the payment branch of a counterfactual mugging. You know that this does not actually imply you are in the payment branch, because your decision function under these first-order beliefs is also evaluated in the reward branch of the game. You know that, if you are the instance of yourself in the payment branch, refusing to pay saves you $X with 0.5 measure. If, on the other hand, you are a simulation of yourself and the other branch is actually occurring, agreeing to pay gains the real you $Y with 0.5 measure. As a utility-maximizer, your whole decision simplifies to asking whether X < Y. There are no other remaining questions.
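
Here is a minimal sketch of that expected-value comparison, with made-up numbers standing in for X and Y:

    # Sketch of the worked example above, with hypothetical stakes. You observe
    # the payment branch, but with 0.5 measure you are actually the ghost inside
    # Omega's prediction while the reward branch is the real one.
    def expected_value_of_paying(x_cost, y_reward, p_real=0.5):
        # x_cost:   what paying costs if you are the real payment-branch instance.
        # y_reward: what the real you gains if you are the ghost and your paying
        #           convinces Omega to pay out in the reward branch.
        # p_real:   measure on being the real payment-branch instance.
        return -p_real * x_cost + (1 - p_real) * y_reward

    x, y = 100, 10_000                         # hypothetical stakes
    print(expected_value_of_paying(x, y) > 0)  # True: pay, exactly because X < Y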

Spooky Distributions by Acausal Trade

Suppose your trading partner does not know your source code well enough to identify you specifically, but it does know a distribution of possible agents that includes you as well as many other agents, some of which have FDT-like decision functions.

The individual agents in that distribution fall into three buckets: Defect-bots whom you can’t affect, Cooperate-bots whom you also can’t affect, and Simulator-bots who look at the distribution, simulate its members, and condition their behavior on that of the others. Naturally, you yourself belong to the third group. Everyone simulates everyone else simultaneously, and also simulates how the trading partner and original simulator would respond to possible response-patterns.

If there are enough cooperate-bots in the distribution to motivate the trading partner to extend unconditional trust, then obviously you defect for the free utility.

If there are enough defect-bots in the distribution to motivate the trading partner to refuse trust, then you feel sad about your inability to obtain mutual trust and play your Nash equilibrium instead.

If there are enough of your fellow simulator-bots that the trading partner will extend trust if enough of you cooperate, and refuse it if enough of you defect, then there is an opportunity for the simulator-bots to acausally trade. Since they don’t know which one of them actually exists, and each has an interest in the real entity’s profit since it might be them, they are best off cooperating whenever doing so motivates enough of the bots to cooperate to convince the trading partner to trust the distribution, which benefits the real instance of them. Since all the simulator-bots are recursively simulating each other, they can condition their own cooperation on the cooperation of the others. Most ghosts in this system are many levels deep, and the only effect of their choices is to convince the next level up that the distribution is more likely to cooperate if they cooperate, until at the surface the group cooperates iff the group cooperates.

Individual agents can only defect from this deal by being defect-bots inside the simulations of the distribution as well, which increases the likelihood of failing to gain trust and thereby harms them in reality. If the harm to any specific member from failing to gain trust is sufficient to overcome the benefits of defection, then the distribution successfully cooperates out of the self-interest of its members. This is naturally much easier as the distribution of possible agents becomes narrower, or if their utility functions strongly correlate.
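
Here is a toy sketch of that trust calculation. The bot fractions, trust threshold, and payoffs are all invented for illustration; the only point is the structure of the decision:

    # Toy model of the distribution argument above. Defect-bots make up the
    # remainder of the distribution and never cooperate.
    def simulator_bots_cooperate(p_coop_bot, p_sim_bot, trust_threshold,
                                 gain_from_trust, gain_from_defection):
        # The trading partner extends trust iff the expected cooperation rate
        # clears trust_threshold. Simulator-bots cooperate iff their cooperation
        # is what tips the rate over that line, and being trusted in reality is
        # worth more to them than the gains from defecting.
        rate_if_sims_cooperate = p_coop_bot + p_sim_bot
        rate_if_sims_defect = p_coop_bot
        trust_hinges_on_sims = (rate_if_sims_cooperate >= trust_threshold
                                and rate_if_sims_defect < trust_threshold)
        return trust_hinges_on_sims and gain_from_trust > gain_from_defection

    # Example: 20% cooperate-bots, 50% simulator-bots, 30% defect-bots, and a
    # partner that trusts above a 60% expected cooperation rate.
    print(simulator_bots_cooperate(0.2, 0.5, trust_threshold=0.6,
                                   gain_from_trust=10, gain_from_defection=3))
    # True: trust only happens if the simulator-bots cooperate, and it is worth it.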

Also, remember that trade between entities with exactly opposite utilities is impossible: they can only ever benefit from the other’s loss. Expect the trade to fail with certainty if your beliefs about an entity include exactly opposite possible utility functions, as each will defect to undermine your trust in the other. Such trade is not possible with arbitrary minds drawn from a completely open distribution of agents.

Conclusions

  • It may be easier to believe you don’t exist than to build a coherent model of logical counterfactuals.

  • Subjunctive Dependence is just regular causality but with ghosts.

  • Don’t imagine yourself controlling the constant function, fear that you might be inside the constant function.

  • Partial legibility just means trying to trade between members of the other party’s belief distribution about you. If the trade fails, defect and expect defection.

  • Acausal trade continues to be hard, especially for very wide sets of agents, so don’t actually expect trustworthiness from random entities plucked from mind-space.