SSA rejects anthropic shadow, too


(or: “Please Do Anthropics with Actual Math”)

The anthropic shadow argument states something like:

Anthropic principle! If the LHC had worked, it would have produced a black hole or strangelet or vacuum failure, and we wouldn’t be here!

or:

You can’t use “we survived the cold war without nuclear war” as evidence of anything. Because of the anthropic principle, we could have blown up the human race in the 1960s in 99% of all possible worlds and you’d still be born in one where we didn’t.

This argument has already been criticized (here, here). In criticizing it myself, I first leaned on reasoning about large universes (e.g. ones containing 100 worlds with low nuclear risk and 100 with high nuclear risk) in a way that implies conclusions similar to those of SIA (the Self-Indication Assumption), while thinking that SSA (the Self-Sampling Assumption) would endorse anthropic shadow in a small, single-world universe. I then realized I was reasoning about SSA incorrectly: both SSA and SIA reject anthropic shadow, even in a single-world universe.

Recapping the Doomsday Argument

To explain SSA and SIA, I’ll first recap the Doomsday Argument. Suppose, a priori, that it’s equally likely that there will be 1 billion humans total or 1 trillion; for simplicity, we’ll consider only these two alternatives. Number the humans in birth order (1, 2, …), and assume for simplicity that each human knows their own index (which is the same as knowing how many humans have come before them). Suppose you observe that you are one of the first 1 billion humans. How should you reason about the probability that there will be 1 billion or 1 trillion humans total?

SSA reasons as follows. To predict your observations, you should first sample a random non-empty universe (in proportion to its prior probability), then sample a random observer in that universe. Your observations will be that observer’s observations, and, ontologically, you “are” that observer living in that universe.

Conditional on being in a billion-human universe, your probability of having an index between 1 and 1 billion is 1 in 1 billion, and your probability of having any other index is 0. Conditional on being in a trillion-human universe, your probability of having an index between 1 and 1 trillion is 1 in 1 trillion, and your probability of having any other index is 0.

You observe some particular index that does not exceed 1 billion; say, 45,639,104. You are 1000 times more likely to observe this index conditional on living in a billion-human universe than in a trillion-human universe. Hence, you conclude that you are in a billion-human universe, with 1000:1 odds.

This is called the “doomsday argument” because it implies that it’s unlikely that you have a very early index (relative to the total number of humans), so humans are likely to go extinct before many more humans have been born than have already been born.
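
To make the SSA update concrete, here is a minimal Python sketch of the calculation above (the numbers are the ones from the text; the variable names are mine):

```python
# SSA doomsday update: equal prior on a billion-human vs. trillion-human universe;
# likelihood of observing a particular index is 1/population (0 if that index doesn't exist).
SMALL, LARGE = 10**9, 10**12
prior = {SMALL: 0.5, LARGE: 0.5}

observed_index = 45_639_104  # any index up to 1 billion gives the same result

likelihood = {n: (1 / n if observed_index <= n else 0.0) for n in prior}
unnormalized = {n: prior[n] * likelihood[n] for n in prior}
total = sum(unnormalized.values())
posterior = {n: w / total for n, w in unnormalized.items()}

print(posterior[SMALL] / posterior[LARGE])  # ~1000: 1000:1 odds for the billion-human universe
```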

SIA implies a different conclusion. To predict your observations under SIA, you should first sample a random universe proportional to its population, then sample a random observer in that universe. The probabilities of observing each index are the same conditional on the universe, but the prior probabilities of being in a given universe have changed.

We start with 1000:1 odds in favor of the trillion-human universe, due to its higher population. Upon observing our sub-1-billion index, we get a 1000:1 update in favor of the billion-human universe, as with SSA. These exactly cancel out, leaving the probability of each universe at 50%.
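
The same calculation under SIA, again as a rough sketch with my own variable names; the only change is that the prior over universes is reweighted by population before the update:

```python
# SIA doomsday update: the prior over universes is proportional to prior probability times population.
SMALL, LARGE = 10**9, 10**12
base_prior = {SMALL: 0.5, LARGE: 0.5}
sia_prior = {n: base_prior[n] * n for n in base_prior}  # unnormalized SIA weights

observed_index = 45_639_104
likelihood = {n: (1 / n if observed_index <= n else 0.0) for n in base_prior}

unnormalized = {n: sia_prior[n] * likelihood[n] for n in base_prior}
total = sum(unnormalized.values())
posterior = {n: w / total for n, w in unnormalized.items()}

print(posterior[SMALL], posterior[LARGE])  # ~0.5 and ~0.5: the population weighting and the index update cancel
```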

As Bostrom points out, both SSA and SIA have major counterintuitive implications. Better anthropic theories are desired. And yet, having some explicit anthropic theory at all helps to reason in a principled way that is consistent across situations. My contention is that anthropic shadow reasoning tends not to be principled in this way, and will go away when using SSA or SIA.

Analyzing the Cold War Scenario

Let’s analyze the cold war scenario as follows. Assume, for simplicity, that there is only one world with intelligent life in the universe; adding more worlds tends to shift towards SIA-like conclusions, but is otherwise similar. This world may be one of four types:

  1. High latent nuclear risk, cold war happens

  2. Low latent nuclear risk, cold war happens

  3. High latent nuclear risk, no cold war happens

  4. Low latent nuclear risk, no cold war happens

“High latent nuclear risk” means that, counterfactual on a cold war happening, there’s a high (99%) risk of extinction. “Low latent nuclear risk” means that, counterfactual on a cold war happening, there’s a low (10%) risk of extinction. Latent risk could vary due to factors such as natural human tendencies regarding conflict, and the social systems of cold war powers. For simplicity, assume that each type of world is equally likely a priori.

As with the doomsday argument, we need a population model. If there is no extinction, assume there are 1 billion humans who live before the cold war, and 1 billion humans who live after it. I will, insensitively, ignore the perspectives of those who live through the cold war, for the sake of simplicity. If there is extinction, assume there are 1 billion humans who live before the cold war, and none after.

The anthropic shadow argument asserts that, upon observing being post-cold-war, we should make no update in the probability of high latent nuclear risk. Let’s check this claim with both SSA and SIA.

SSA first samples a universe (which, in this case, contains only one world), then samples a random observer in that universe. It samples a universe of each type with ¼ probability. There are, however, two subtypes of type-1 and type-2 universes: ones with nuclear extinction and ones without. So it samples a nuclear-extinction type-1 universe with ¼ * 99% probability, a non-nuclear-extinction type-1 universe with ¼ * 1% probability, a nuclear-extinction type-2 universe with ¼ * 10% probability, and a non-nuclear-extinction type-2 universe with ¼ * 90% probability.

Conditional on sampling a universe with no nuclear extinction, the sampled observer will be prior to the cold war with 50% probability, and after the cold war with 50% probability. Conditional on sampling a universe with nuclear extinction, the sampled observer will be prior to the cold war with 100% probability.

Let’s first compute the prior probability of high latent nuclear risk. SSA holds that you learn nothing “upon waking up” (other than that there is at least one observer, which is assured in this example), so this probability matches the prior over universes: type-1 and type-3 universes have high latent nuclear risk, and their probabilities add up to 50%.

Now let’s compute the posterior. We observe being post cold war. This implies eliminating all universes with nuclear extinction, and all type-3 and type-4 universes. The remaining universes each have a 50% chance of a sampled observer being post cold war, so we don’t change their relative probabilities. What we’re left with are non-nuclear-extinction type-1 universes (with prior probability ¼ * 1%) and non-nuclear-extinction type-2 universes (with prior probability ¼ * 90%). Re-normalizing our probabilities, we end up with 90:1 odds in favor of being in a type-2 universe, corresponding to a 1.1% posterior probability of high latent nuclear risk. This is clearly an update, showing that SSA rejects the anthropic shadow argument.
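
Here is a minimal Python sketch of this SSA calculation, assuming the model above (¼ prior on each type, 99%/10% extinction risk given a cold war, 1 billion observers on each side of the cold war); the function and variable names are mine:

```python
# SSA on the cold-war model: sample a universe by its prior, then a random observer in it.
P_EXTINCTION = {"high": 0.99, "low": 0.10}  # extinction risk conditional on a cold war
PRE, POST = 1.0, 1.0                        # billions of observers before / after the cold war

def universes():
    """Yield (latent risk, prior probability, total population, observers who are post-cold-war)."""
    for risk in ("high", "low"):
        p_ext = P_EXTINCTION[risk]
        yield risk, 0.25 * p_ext, PRE, 0.0                # cold war, extinction: no post-war observers
        yield risk, 0.25 * (1 - p_ext), PRE + POST, POST  # cold war, survival
        yield risk, 0.25, PRE + POST, 0.0                 # no cold war: nobody is "post cold war"

# Prior on high latent risk: SSA learns nothing "upon waking up".
prior_high = sum(p for risk, p, _, _ in universes() if risk == "high")

# Update on observing "I am post cold war": likelihood = fraction of observers who are post-war.
weights = [(risk, p * post / pop) for risk, p, pop, post in universes()]
total = sum(w for _, w in weights)
posterior_high = sum(w for risk, w in weights if risk == "high") / total

print(prior_high)      # 0.5
print(posterior_high)  # ~0.011, i.e. 90:1 odds in favor of low latent risk
```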

Let’s try this again with SIA. We now sample universes proportionally to their population times their original probability. Since universes with nuclear extinction have half the population, they are downweighted by 50% in sampling. So, the SIA prior weights for each universe are proportional to: ¼ * 99% * ½ for type-1 with nuclear extinction, ¼ * 1% for type-1 without nuclear extinction, ¼ * 10% * ½ for type-2 with nuclear extinction, ¼ * 90% for type-2 without nuclear extinction, ¼ for type-3, and ¼ for type-4. To get actual probabilities, we normalize these weights; after normalization, type-1 and type-3 (the high-latent-risk types) together come to 43.6%. It’s unsurprising that this is less than 50%, since SIA underweights worlds with low population, which are disproportionately worlds with high latent risk.

What’s the posterior probability of high latent risk under SIA? The likelihood ratios are the same as with SSA: we eliminate all universes with nuclear extinction or with no cold war (type-3 or type-4), and don’t change the relative probabilities otherwise. The posterior weights are now proportional to ¼ * 1% for type-1 without nuclear extinction and ¼ * 90% for type-2 without nuclear extinction. As with SSA, we end up with 90:1 odds in favor of low latent nuclear risk.
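
And the corresponding SIA sketch (same caveats: the model parameters come from the text, the names are mine); the only difference from the SSA version is the population reweighting of the prior:

```python
# SIA on the cold-war model: sample a universe proportionally to prior probability times population.
P_EXTINCTION = {"high": 0.99, "low": 0.10}
PRE, POST = 1.0, 1.0  # billions of observers before / after the cold war

def universes():
    """Yield (latent risk, prior probability, total population, observers who are post-cold-war)."""
    for risk in ("high", "low"):
        p_ext = P_EXTINCTION[risk]
        yield risk, 0.25 * p_ext, PRE, 0.0                # cold war, extinction
        yield risk, 0.25 * (1 - p_ext), PRE + POST, POST  # cold war, survival
        yield risk, 0.25, PRE + POST, 0.0                 # no cold war

# SIA prior: weight each universe by prior probability times population, then normalize.
prior_w = [(risk, p * pop, pop, post) for risk, p, pop, post in universes()]
prior_high = sum(w for risk, w, _, _ in prior_w if risk == "high") / sum(w for _, w, _, _ in prior_w)

# Posterior: multiply by the fraction of observers who are post cold war, then renormalize.
post_w = [(risk, w * post / pop) for risk, w, pop, post in prior_w]
posterior_high = sum(w for risk, w in post_w if risk == "high") / sum(w for _, w in post_w)

print(prior_high)      # ~0.436
print(posterior_high)  # ~0.011: the same 90:1 odds as under SSA
```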

So, SSA and SIA reach the same conclusion about posterior latent risk. Their updates only differ because their priors differ: SIA learns less about latent risk from observing being after the cold war, because its prior already leaned toward low latent risk (high-latent-risk worlds tend to have lower populations).

Moreover, the conclusion that they reach the same posterior doesn’t depend on the exact numbers used. The constants used in the odds calculation (probability of type-1 (¼), probability of type-2 (¼), probability of survival conditional on type-1 (1%), probability of survival conditional on type-2 (90%)) could be changed, and the SSA and SIA formulae would produce the same result, since the formulae use these constants in exactly the same combination.
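As a quick check of this claim, here is a sketch (my own parameter names) of the two posteriors for arbitrary constants; the extinction and no-cold-war branches are omitted because they contribute zero weight to the posterior:

```python
# Both posteriors reduce to p1*s1 : p2*s2 odds, so SSA and SIA agree for any choice of constants.
def posteriors(p1, p2, s1, s2, pre=1.0, post=1.0):
    """P(high latent risk | post cold war) under SSA and SIA.

    p1, p2: prior probabilities of type-1 and type-2 (cold war) universes
    s1, s2: survival probabilities conditional on type-1 and type-2
    """
    # SSA weight: prior * fraction of observers who are post cold war
    ssa_high = p1 * s1 * post / (pre + post)
    ssa_low = p2 * s2 * post / (pre + post)
    # SIA weight: prior * population * fraction of observers who are post cold war
    sia_high = p1 * s1 * (pre + post) * post / (pre + post)
    sia_low = p2 * s2 * (pre + post) * post / (pre + post)
    return ssa_high / (ssa_high + ssa_low), sia_high / (sia_high + sia_low)

print(posteriors(0.25, 0.25, 0.01, 0.90))  # (0.0109..., 0.0109...)
print(posteriors(0.40, 0.10, 0.05, 0.50))  # an identical pair again, with different constants
```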

To generalize this: SSA and SIA only disagree when universes with non-zero posterior probability have different populations. In this example, all universes with non-zero posterior probability have the same population, 2 billion, since we only get a different population (1 billion) conditional on nuclear extinction, and we observed being post cold war. SSA and SIA disagreed before taking that observation into account because, before it, universes with different populations were still possible.

Someone might disagree with this conclusion despite SSA and SIA agreeing on it, and think the anthropic shadow argument holds water. If so, I would suggest that they spell out their anthropic theory in enough detail that it can be applied to arbitrary hypothetical scenarios, like SSA and SIA. This would help to check that this is actually a consistent theory, and to check implications for other situations, so as to assess the overall plausibility of the theory. SSA and SIA’s large counterintuitive conclusions imply that it is hard to formulate a consistent anthropic theory with basically intuitive conclusions across different hypothetical situations, so checking new theories against different situations is critical for developing a better theory.

Michael Vassar points out that the anthropic shadow argument would imply that we should expect to find evidence of past nuclear mass deaths in the fossil record. This is because nuclear war is unlikely to cause total extinction, and people would rebuild afterwards. We could model this as additional population after a nuclear exchange, though less than if there were no exchange. If this additional population is high enough, then, conditional on being in a world with high nuclear risk, a randomly selected observer is probably past the first nuclear exchange. So, if anthropic shadow considerations somehow left us with a high posterior probability of being in a high-latent-risk world with cold wars, we’d still expect not to be before the first nuclear exchange. This is a different way of rejecting the conclusions of ordinary anthropic shadow arguments, orthogonal to the main argument in this post.
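
As a toy illustration of this point (the rebuild population below is a number I made up, not one from the post): if post-exchange rebuilding produces enough observers, then even conditional on a high-latent-risk world where an exchange occurs, a random observer probably lives after the first exchange.

```python
# Toy numbers: observers before the first nuclear exchange vs. observers after rebuilding.
PRE_EXCHANGE = 1.0   # billions, as in the model above
REBUILD = 5.0        # billions after rebuilding -- a hypothetical value for illustration

p_after_first_exchange = REBUILD / (PRE_EXCHANGE + REBUILD)
print(p_after_first_exchange)  # ~0.83: most observers in such a world live after the first exchange
```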

Probability pumping and time travel

The LHC post notes the theoretical possibility of probability pumping: if the LHC keeps failing “randomly”, we might conclude that it destroys the world if turned on, and use it as a probability pump, turning it on in some kind of catastrophic scenario. This is similar to quantum suicide.

I won’t explicitly analyze the LHC scenario; it’s largely similar to the cold war scenario. Instead I’ll consider the implications of probability pumping more generally, such as the model of time turners in Yudkowsky’s post “Causal Universes”.

In this proposal, the universe is a cellular automaton, and each use of a time turner creates a loop consisting of an earlier event where some object “comes from the future”, and a later event where some object is “sent back in time”. Many possible cellular-automaton histories are searched over to find ones containing only consistent time loops: ones where the same object is sent back in time as came from the future.

The probability distribution over universe histories can’t be modeled as a causal Bayesian network; instead, it can be modeled as a factor graph. To form this factor graph, first create factors for each variable that is determined causally (i.e., not coming back from the future), in the usual way of converting a Bayesian network into a factor graph. Then, for each time loop, add a factor that is 1 if the object going back in time matches the object coming from the future, and 0 otherwise.

For simplicity, I’ll assume that the objects are strings of bits (e.g. writing on a paper), and that when using the time turner, you request some number of bits from the future. The most trivial case is when requesting zero bits; in this case, there are no additional variables in the factor graph (other than, perhaps, a constant variable, which makes no difference to the calculation), and the factor is always 1, since the empty string sent back in time matches the empty string gotten from the future.

What if we request one bit from the future and send it back exactly? We add a binary variable to the factor graph, and note that the consistency factor is always 1. We’ve doubled the number of possible worlds without changing their weight (the product of all factors; probability is proportional to weight). If we request n bits and send them back exactly, we multiply the weight by 2^n in the branch where this time turner request is made.

Suppose we flip a coin to determine whether to use a time turner, and this is the only use of a time turner anywhere. If the coin comes up heads, we request 10 bits from the future and send them back exactly. In the factor graph, all possibilities (tails; heads and got 0000000000; heads and got 0000000001; …) have equal weight; therefore, the vast majority have heads. Accordingly, we would expect to observe the coin come up heads.

Conversely, imagine we request a non-zero number of bits from the time turner, flip all the bits, and send this bitwise negation back to the past. The string sent back is necessarily different from the string received, so the consistency factor is 0. This means the weight of every such universe history is 0, and we can’t get a probability distribution over them by re-normalizing.

If we flip a coin to determine whether to never use the time turner or to use it once to request bits that are flipped before being sent back, then the coin will always come up such that the time turner is not used, since the other possible universe histories have weight 0.
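
A small sketch of these two coin-flip scenarios, enumerating universe histories and their weights (the enumeration scheme and names are mine; “weight” is the product of all factors, as above):

```python
# Enumerate universe histories: flip a fair coin; on heads, request N bits from the time
# turner and send back send_back(bits); on tails, never use the time turner.
# Weight of a history = product of its factors (the coin factor times the consistency factor).
from itertools import product

N = 10

def p_heads(send_back):
    weights = {"tails": 0.5}  # coin factor only; no time turner use
    for bits in product((0, 1), repeat=N):
        consistent = 1.0 if send_back(bits) == bits else 0.0
        weights[("heads", bits)] = 0.5 * consistent
    total = sum(weights.values())
    return sum(w for k, w in weights.items() if k != "tails") / total

def identity(bits):
    return bits

def negation(bits):
    return tuple(1 - b for b in bits)

print(p_heads(identity))  # ~0.999 (1024/1025): stochastic events favor using the turner this way
print(p_heads(negation))  # 0.0: every history where the turner is used has weight 0
```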

Overall, this implies that stochastic events will tend to turn out such that time turners are used often to send the same bit string back, and never to send a different one back. This could lead to, for example, a universe where time turners are only invented by agents (such as AGIs) with enough security to ensure this usage pattern. Without such security, it’s likely that some time turner, somewhere, would be used to send back a different bit string. So we should expect the invention of time turners under insecure conditions to tend to result in universe destruction (setting the weight to 0), since the weights from different time turner uses multiply, and anything multiplied by 0 is 0.

And so, given that we aren’t in a world with enough security to ensure no time turner universe destruction events, it is entirely unsurprising that we see no time turners around, even under the assumption that they’re physically possible!

Conclusion

Reasoning about non-causal factor graphs has a teleological feel to it: stochastic events arrange themselves such that time turners will tend to be used in some ways and not others in the future. Anthropics involves similar non-causal probabilistic reasoning: if there could be 0 or 1 observers, SSA and SIA agree that we will only observe being in a universe with 1 observer (though they disagree about the weighting between 1 observer and more observers), which means early universe events are more or less likely depending on the future. SSA additionally implies probability pumping in some subjective scenarios, as in the Adam and Eve thought experiment. Chris Langan’s CTMU generalizes anthropic reasoning to more general teleological principles for modeling the universe, implying the existence of God. That theory is a bit too galaxy-brained for me to accept at this time, although it’s clearly onto something in relating anthropics to teleology.

Philosophically, I would suggest that anthropic reasoning results from combining a subjective view from the perspective of a mind with an objective physical view-from-nowhere. The Kantian a priori (which includes the analytic and the synthetic a priori) is already subjective; Kantian spacetime is a field in which experiential phenomena appear. In reasoning about the probabilities of various universes, we instead imagine a “view from nowhere”, e.g. one in which the universe is some random stochastic Turing machine; I’ll call this the “universe a priori”. Put this way, these a prioris are clearly different things. SSA argues that we don’t learn anything upon waking up, and so our subjective prior distribution over universes should match the universe a priori; SIA, meanwhile, argues that we do learn something upon waking up, namely, that our universe is more likely to have a higher population. SSA’s argument is less credible once the Kantian a priori is distinguished from the universe a priori. And they have to be different, because even SSA agrees that we can’t observe an empty universe; upon waking up, we learn that there is at least one observer.

Teleological reasoning can also show up when considering the simulation hypothesis. If the average technological civilization creates many simulations of its past (or the pasts of alternative civilizations) in expectation, then most observers who see themselves in a technological but not post-singularity world are in ancestor simulations. This is immediately true under SIA, and it is true under SSA in a universe sufficiently large to ensure that at least one civilization creates many ancestor simulations. While there are multiple ways of attempting to reject the simulation argument, one is especially notable: even if most apparently pre-singularity observers are in ancestor simulations, those observers matter less to how the future plays out (and to the distribution of observers’ experiences) than actual pre-singularity observers, who have some role in determining how the singularity plays out. Therefore, pragmatically, it makes sense for us to talk as if we probably live pre-singularity; we have more use for money if we live pre-singularity than if we live in an ancestor simulation, so we would rationally tend to bet in favor of being pre-singularity. This reasoning, however, implies that our probabilities depend on how much different agents can influence the future, which is a teleological consideration similar to the one arising with non-causal factor graphs. I’m not sure how to resolve all this yet, but it seems important to work out a more unified theory.