Fundamentals of kicking anthropic butt



Introduction

An anthropic problem is one where the very fact of your existence tells you something. “I woke up this morning, therefore the earth did not get eaten by Galactus while I slumbered.” Applying your existence to certainties like that is simple—if an event would have stopped you from existing, your existence tells you that it hasn’t happened. If something would only kill you 99% of the time, though, you have to use probability instead of deductive logic. Usually, it’s pretty clear what to do. You simply apply Bayes’ rule: the probability of the world getting eaten by Galactus last night is equal to the prior probability of Galactus-consumption, times the probability of me waking up given that the world got eaten by Galactus, divided by the probability that I wake up at all. More exotic situations also show up under the umbrella of “anthropics,” such as getting duplicated or forgetting which person you are. Even if you’ve been duplicated, you can still assign probabilities. If there are a hundred copies of you in a hundred-room hotel and you don’t know which one you are, don’t bet too much that you’re in room number 68.
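
To make that concrete, here is a minimal sketch of the calculation in Python. The prior and the 1% survival chance are made-up numbers purely for illustration, not anything claimed in this post.

```python
# Toy Bayes' rule calculation: "did Galactus eat the world last night?"
# All numbers below are made up for illustration.

prior_eaten = 1e-9          # prior probability of Galactus-consumption
p_wake_if_eaten = 0.01      # suppose the event only kills you 99% of the time
p_wake_if_not_eaten = 1.0   # otherwise you wake up as usual

# P(I wake up) = P(wake | eaten) P(eaten) + P(wake | not eaten) P(not eaten)
p_wake = p_wake_if_eaten * prior_eaten + p_wake_if_not_eaten * (1 - prior_eaten)

# Bayes' rule: P(eaten | I woke up)
posterior_eaten = p_wake_if_eaten * prior_eaten / p_wake
print(posterior_eaten)  # ~1e-11, about 100 times less likely than the prior
```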

But this last sort of problem is harder, since it’s not just a straightforward application of Bayes’ rule. You have to determine the probability just from the information in the problem. Thinking in terms of information and symmetries is a useful problem-solving tool for getting probabilities in anthropic problems, which are simple enough to use it and confusing enough to need it. So first we’ll cover what I mean by thinking in terms of information, and then we’ll use this to solve a confusing-type anthropic problem.

Parable of the coin

Eliezer has already written about what probability is in Probability is in the Mind. I will revisit it anyhow, using a similar example from Probability Theory: The Logic of Science.

It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there’s a 0.5 probability of heads and a 0.5 probability of tails. You draw the coin forth, flip it, and slap it down. What is the probability that when you take your hand away, you see heads?

Well, you performed a fair coin flip, so the chance of heads is 0.5. What’s the problem? Well, imagine the coin’s perspective. When you say “heads, 0.5,” that doesn’t mean the coin has half of heads up and half of tails up: the coin is already how it’s going to be, sitting pressed under your hand. And it’s already how it is with probability 1, not 0.5. If the coin is already tails, how can you be correct when you say that it’s heads with probability 0.5? If something is already determined, how can it still have the property of randomness?

The key idea is that the randomness isn’t in the coin, it’s in your map of the coin. The coin can be tails all it dang likes, but if you don’t know that, you shouldn’t be expected to take it into account. The probability isn’t a physical property of the coin, nor is it a property of flipping the coin—after all, your probability was still 0.5 when the truth was sitting right there under your hand. The probability is determined by the information you have about flipping the coin.

Assigning probabilities to things tells you about the map, not the territory. It’s like a machine that eats information and spits out probabilities, with those probabilities uniquely determined by the information that went in. Thinking about problems in terms of information, then, is about treating probabilities as the best possible answers for people with incomplete information. Probability isn’t in the coin, so don’t even bother thinking about the coin too much—think about the person and what they know.

When trying to get probabilities from information, you’re going to end up using symmetry a lot. Because information uniquely specifies probability, if you have identical information about two things, then you should assign them equal probability. For example, if someone switched the labels “heads” and “tails” in a fair coin flip, you couldn’t tell that it had been done—you never had any different information about heads as opposed to tails. This symmetry means you should give heads and tails equal probability. Because heads and tails are mutually exclusive (they don’t overlap) and exhaustive (there can’t be anything else), the probabilities have to add to 1 (which is all the probability there is), so you give each of them probability 0.5.

Brief note on useless information

Real-world problems, even when they have symmetry, often start you off with a lot more information than “it could be heads or tails.” If we’re flipping a real-world coin there’s the temperature to consider, and the humidity, and the time of day, and the flipper’s gender, and that sort of thing. If you’re an ordinary human, you are allowed to call this stuff extraneous junk. Sometimes, this extra information could theoretically be correlated with the outcome—maybe the humidity really matters somehow, or the time of day. But if you don’t know how it’s correlated, you have at least a de facto symmetry. Throwing away useless information is a key step in doing anything useful.

Sleeping Beauty

So thinking with information means assigning probabilities based on what people know, rather than treating probabilities as properties of objects. To actually apply this, we’ll use as our example the sleeping beauty problem:

Suppose Sleeping Beauty volunteers to undergo the following experiment, which is described to her before it begins. On Sunday she is given a drug that sends her to sleep, and a coin is tossed. If the coin lands heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday, given a second dose of the sleeping drug that makes her forget the events of Monday only, and awakened and interviewed again on Tuesday. The experiment then ends on Tuesday, without flipping the coin again.
Beauty wakes up in the experiment and is asked, “With what subjective probability do you believe that the coin landed tails?”

If the coin lands heads, Sleeping Beauty is only asked for her guess once, while if the coin lands tails she is asked for her guess twice, but her memory is erased in between so she has the same memories each time.

When trying to answer for Sleeping Beauty, many people reason as follows: It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there’s a 0.5 probability of heads and a 0.5 probability of tails. So since the probability of tails is 0.5, Beauty should say “0.5,” Q.E.D. Readers may notice that this argument is all about the coin, not about what Beauty knows. This violation of good practice may help explain why it is dead wrong.

Thinking with information: some warmups

To collect the ingredients of the solution, I’m going to first go through some similar-looking problems.

In the Sleeping Beauty problem, she has to choose between three options—let’s call them {H, Monday}, {T, Monday}, and {T, Tuesday}. So let’s start with a very simple problem involving three options: the three-sided die. Just like for the fair coin, you know that the sides of the die are mutually exclusive and exhaustive, and you don’t know anything else that would be correlated with one side showing up more than another. Sure, the sides have different labels, but the labels are extraneous junk as far as probability is concerned. Mutually exclusive and exhaustive means the probabilities have to add up to one, and the symmetry of your information about the sides means you should give them the same probabilities, so they each get probability 1/3.
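
As a tiny sketch of that “information goes in, probabilities come out” machine: when the options are mutually exclusive, exhaustive, and your information about them is symmetric, the only consistent output is an even split. The helper below is my own illustration, not something from the problem.

```python
def symmetric_probabilities(outcomes):
    """Assign equal probability to mutually exclusive, exhaustive outcomes
    about which you have symmetric information."""
    return {outcome: 1 / len(outcomes) for outcome in outcomes}

print(symmetric_probabilities(["heads", "tails"]))              # 0.5 each
print(symmetric_probabilities(["side 1", "side 2", "side 3"]))  # 1/3 each
```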

Next, what should Sleeping Beauty believe before the experiment begins? Beforehand, her information looks like this: she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails.

[Diagram of the Sleeping Beauty problem before it starts]
This way of stating her information is good enough most of the time, but what’s going on is clearer if we’re a little more formal. There are three exhaustive (but not mutually exclusive) options: {H, Monday}, {T, Monday}, and {T, Tuesday}. She knows that anything with heads is mutually exclusive with anything with tails, and that {T, Tuesday} happens if and only if {T, Monday} happened.

One good way to think of this last piece of information is as a special “AND” structure containing {T, Monday} and {T, Tuesday}, like in the picture to the right. What it means is that since the things that are “AND” happen together, the other probabilities won’t change if we merge them into a single option, which I shall call {T, Both}. Now we have two options, {H, Monday} and {T, Both}, which are both exhaustive and mutually exclusive. This looks an awful lot like the fair coin, with probabilities of 0.5.

But can we leave it at that? Why shouldn’t two days be worth twice as much probability as one day, for instance? Well, it turns out we can leave it at that, because we have now run out of information from the original problem. We used that there were three options, we used that they were exhaustive, we used that two of them always happened together, and we used that the remaining two were mutually exclusive. That’s all, and so that’s where we should leave it—any more and we’d be making up information not in the problem, which is bad.

So to decompress, before the experiment begins Beauty assigns probability 0.5 to the coin landing heads and being woken up on Monday, probability 0.5 to the coin landing tails and being woken up on Monday, and probability 0.5 to the coin landing tails and being woken up on Tuesday. This adds up to 1.5, but that’s okay since these things aren’t all mutually exclusive.
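
If you want to sanity-check those numbers, here is a rough Monte Carlo sketch (mine, not the original post’s): run the experiment many times and count, for each trial, which of the three events occur.

```python
import random

trials = 100_000
counts = {("H", "Monday"): 0, ("T", "Monday"): 0, ("T", "Tuesday"): 0}

for _ in range(trials):
    coin = random.choice(["H", "T"])
    # Which awakenings happen in this run of the experiment?
    if coin == "H":
        events = [("H", "Monday")]
    else:
        events = [("T", "Monday"), ("T", "Tuesday")]
    for event in events:
        counts[event] += 1

# Each event occurs in roughly half of all trials; the three fractions sum
# to about 1.5 because {T, Monday} and {T, Tuesday} always happen together.
for event, n in counts.items():
    print(event, round(n / trials, 3))
```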

[Diagram of the two coins problem]
Okay, now for one last warmup. Suppose you have two coins. You flip the first one, and if it lands heads, you place the second coin on the table heads up. If the first coin lands tails, though, you flip the second coin.

This new problem looks sort of familiar. You have three options, {H, H}, {T, H} and {T, T}, and these options are mutually exclusive and exhaustive. So does that mean it’s the same set of information as the three-sided die? Not quite. Similar to the “AND” from before, my drawing for this problem has an “OR” between {T, H} and {T, T}, representing additional information.

I’d like to add a note here about my jargon. “AND” makes total sense. One thing happens and another thing happens. “OR,” however, doesn’t make so much sense, because things that are mutually exclusive are already “or” by default—one thing happens or another thing happens. What it really means is that {H, H} has a symmetry with the sum of {T, H} and {T, T} (that is, {T, H} “OR” {T, T}). The “OR” can also be thought of as information about {H, H} instead—it contains what could have been both the {H, H} and {H, T} events, so there’s a four-way symmetry in the problem; it’s just been relabeled.

When we had the “AND” structure, we merged the two options together to get {T, Both}. For “OR,” we can do a slightly different operation and replace {T, H} “OR” {T, T} by their sum, {T, either}. Now the options become {H, H} and {T, either}, which are mutually exclusive and exhaustive, getting us back to the fair coin. Then, because {T, H} and {T, T} have a symmetry between them, you split the probability from {T, either} evenly to get probabilities of 0.5, 0.25, and 0.25.
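
Here is a quick simulation of the two-coin setup, again just an illustrative sketch, which comes out at roughly the 0.5, 0.25, 0.25 split derived above.

```python
import random
from collections import Counter

trials = 100_000
counts = Counter()

for _ in range(trials):
    first = random.choice(["H", "T"])
    # Heads on the first coin: the second coin is placed heads up.
    # Tails on the first coin: the second coin is actually flipped.
    second = "H" if first == "H" else random.choice(["H", "T"])
    counts[(first, second)] += 1

for outcome, n in sorted(counts.items()):
    print(outcome, round(n / trials, 3))  # ('H','H') ~0.5, ('T','H') ~0.25, ('T','T') ~0.25
```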

Okay, for real now

Okay, so what do things look like once the experiment has started? In English: she knows that she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails, that she went to sleep, and that she has now been woken up.

This might not seem that different from before, but the “anthropic information” that Beauty is currently one of the people in the experiment changes the formal picture a lot. Before, the three options were not mutually exclusive, because she was thinking about the future. But now {H, Monday}, {T, Monday}, and {T, Tuesday} are both exhaustive and mutually exclusive, because only one can be the case in the present. From the coin flip, she still knows that anything with heads is mutually exclusive with anything with tails. But once two things are mutually exclusive you can’t make them any more mutually exclusive.

But the “AND” information! What happens to that? Well, that was based on things always happening together, and we just got information that those things are mutually exclusive, so there’s no more “AND.” It’s possible to slip up here and reason that since there used to be some structure there, and now they’re mutually exclusive, it’s one or the other, therefore there must be “OR” information. At least the confusion in my terminology reflects an easy confusion to have, but this “OR” relationship isn’t the same as mutual exclusivity. It’s a specific piece of information that wasn’t in the problem before the experiment, and wasn’t part of the anthropic information (that was just mutual exclusivity). So Monday and Tuesday are “or” (mutually exclusive), but not “OR” (can be added up to use another symmetry).

And so this anthropic requirement of mutual exclusivity turns out to make redundant or render null a big chunk of the previous information, which is strange. You end up left with three mutually exclusive, exhaustive options, with no particular asymmetry. This is the three-sided die information, and so each of {H, Monday}, {T, Monday}, and {T, Tuesday} should get probability 1/3. So when asked for P(tails), Beauty should answer 2/3.
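
One way to see what that 2/3 is tracking is to simulate many runs and look at the awakenings themselves, since an awakening is exactly the situation Beauty is in when she’s asked. This sketch is mine, and it assumes that the relevant count is the fraction of awakenings at which the coin shows tails, which is the reading the argument above supports.

```python
import random

trials = 100_000
tails_awakenings = 0
total_awakenings = 0

for _ in range(trials):
    coin = random.choice(["H", "T"])
    awakenings = 1 if coin == "H" else 2  # heads: Monday only; tails: Monday and Tuesday
    total_awakenings += awakenings
    if coin == "T":
        tails_awakenings += awakenings

# Fraction of awakenings at which the coin landed tails: ~2/3
print(round(tails_awakenings / total_awakenings, 3))
```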

“SSA” and “SIA”

When assigning prior probabilities in anthropic problems, there are two main “easy” ways to assign probabilities, and these methods go by the acronyms “SSA” and “SIA.” “SSA” is stated like this [1]:

All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future) in their reference class.

For example, if you wanted the prior probability that you lived in Sweden, you might ask “what proportion of human beings have lived in Sweden?”

On the other hand, “SIA” looks like this [2]:

All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.

Now the question becomes “what proportion of possible observers live in Sweden?” and suddenly it seems awfully improbable that anyone could live in Sweden.

The astute reader will notice that these two “assumptions” correspond to two different sets of starting information. If you want a quick exercise, figure out what those two sets of information are now. I’ll wait for you in the next paragraph.

Hi again. The information assumed for SSA is pretty straightforward. You are supposed to reason as if you know that you’re an actually existent observer, in some “reference class.” So an example set of information would be “I exist/existed/will exist and am a human.” Compared to that, SIA seems to barely assume any information at all—all you get to start with is “I am a possible observer.” Because “existent observers in a reference class” are a subset of possible observers, you can transform SIA into SSA by adding on more information, e.g. “I exist and am a human.” And then if you want to represent a more complicated problem, you have to add extra information on top of that, like “I live in 2012” or “I have two X chromosomes.”

Trouble only sneaks in if you start to see these acronyms as mysterious probability generators rather than sets of starting information to build on. So don’t do that.

Closing remarks

When faced with straightforward problems, you usually don’t need to use this knowledge of where probability comes from. It’s just rigorous and interesting, like knowing how to do integration as a Riemann sum. But whenever you run into foundational or even particularly confusing problems, it’s good to remember that probability is about making the best use you can of incomplete information. If not, you run the risk of a few silly failure modes, or even (gasp) frequentism.

I recently read an academic paper [3] that used the idea that in a multiverse, there will be some universe where a thrown coin comes up heads every time, and so the people in that universe will have very strange ideas about how coins work. Therefore, this actual academic paper argued, since reasoning with probability can lead people to be wrong, it cannot be applied to anything like a multiverse.

My response is: what have you got that works better? In this post we worked through assigning probabilities by using all of our information. If you deviate from that, you’re either throwing information away or making it up. Incomplete information lets you down sometimes, that’s why it’s called incomplete. But that doesn’t license you to throw away information or make it up, out of some sort of dissatisfaction with reality. The truth is out there. But the probabilities are in here.