The SIA population update can be surprisingly small

With many thanks to Damon Binder, and the spirited conversations that led to this post, and to Anders Sandberg.

People often think that the self-indication assumption (SIA) implies a huge number of alien species, millions of times more than otherwise. Thought experiments like the presumptuous philosopher seem to suggest this.

But here I’ll show that, in many cases, updating on SIA doesn’t change the expected number of alien species much. It all depends on the prior, and there are many reasonable priors for which the SIA update does nothing more than double the probability of life in the universe[1].

This can be the case even if the prior says that life is very unlikely! We can have a situation where we are astounded, flabbergasted, and disbelieving about our own existence—“how could we exist, how can this beeeeee?!?!?!?”—and still not update much—“well, life is still pretty unlikely elsewhere, I suppose”.

In the one situation where we have an empirical distribution, the “Dissolving the Fermi Paradox” paper, the effect of the SIA anthropics update is to multiply the expected number of civilizations per planet by seven. Not seven orders of magnitude: just seven.

The formula

Let $\rho$ be the probability of advanced space-faring life evolving on a given planet; for the moment, ignore issues of life expanding to other planets from their one point of origin. Let $f$ be the prior distribution of $\rho$, with mean $\mu$ and variance $\sigma^2$. This means that, if we visit another planet, our probability of finding life is $\mu$.

On this planet, we exist[2]. Then if we update on our existence we get a new distribution $f'$; this distribution will have mean $\mu'$:

$$\mu' = \mu\left(1 + \frac{\sigma^2}{\mu^2}\right)$$

To see a proof of this result, look at this footnote[3].

Define $M = 1 + \sigma^2/\mu^2$ to be this multiplicative factor between $\mu$ and $\mu'$; we’ll show that there are many reasonable situations where $M$ is surprisingly low: think to , rather than in the millions or billions.
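As a quick numerical check of this formula, here is a minimal Python sketch. It assumes a hypothetical three-point prior over the per-planet probability of life (the values and weights are purely illustrative, not taken from this post), applies Bayes’ rule directly, and confirms that the posterior mean equals the prior mean times one plus variance over mean squared.

```python
from fractions import Fraction

def sia_update(values, probs):
    """Return (prior_mean, posterior_mean, factor) for a discrete prior.

    values: candidate per-planet probabilities of life
    probs:  prior weights on those values (assumed to sum to 1)
    """
    prior_mean = sum(p * v for v, p in zip(values, probs))
    second_moment = sum(p * v * v for v, p in zip(values, probs))
    variance = second_moment - prior_mean ** 2
    # Bayes: posterior weight is proportional to (likelihood of our existence) * prior.
    posterior = [p * v / prior_mean for v, p in zip(values, probs)]
    posterior_mean = sum(q * v for v, q in zip(values, posterior))
    factor = 1 + variance / prior_mean ** 2
    return prior_mean, posterior_mean, factor

# Hypothetical prior, for illustration only.
values = [Fraction(1, 1000), Fraction(1, 100), Fraction(1, 10)]
probs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

mu, mu_post, M = sia_update(values, probs)
assert mu_post == mu * M  # exact, thanks to Fraction arithmetic
print(float(mu), float(mu_post), float(M))
```

Exact fractions are used so that the identity can be checked with `==` rather than a floating-point tolerance.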

Beta distributions I

Let’s start with the most uninformative prior of all: a uniform prior over $[0,1]$ for the per-planet probability of life $\rho$. The expectation of $\rho$ is $1/2$, so, without any other information, we expect a planet to have life with probability $1/2$. The variance is $1/12$.

Thus if we update on our existence on Earth, we get the posterior $f'(\rho) = 2\rho$; the mean of this is $2/3$ (either by direct calculation or using the multiplicative factor $M = 1 + (1/12)/(1/4) = 4/3$).

Even though this change in expectation is multiplicatively small, it does seem that the uniform prior and $f'$ are very different, with $f'$ heavily skewed to the right. But now consider what happens if we look at Mars and notice that it hasn’t got life. The probability of no life, given $\rho$, is $1-\rho$. Updating on this and renormalising gives a posterior $f''$:

$$f''(\rho) = 6\rho(1-\rho)$$

The expectation of $f''$, symmetric around $1/2$, is of course $1/2$. Thus one extra observation (that Mars is dead) has undone, in expectation, all the anthropic impact of our own existence.

This is an example of a beta distribution for $\alpha = 2$ and $\beta = 2$ (yes, beta distributions have a parameter called $\beta$ and another one that’s $\alpha$; just deal with it). Indeed, the uniform prior is also a beta distribution (with $\alpha = \beta = 1$), as is the anthropic updated version (which has $\alpha = 2$, $\beta = 1$).

The update rule for beta distributions is that a positive observation (ie life) increases $\alpha$ by $1$, and a negative observation (a dead planet) increases $\beta$ by $1$. The mean of an updated beta distribution is a generalised version of Laplace’s law of succession: if our prior is a beta distribution with parameters $\alpha$ and $\beta$, and we’ve had $m$ positive observations and $n$ negative ones, then the mean of the posterior is:

$$\frac{\alpha + m}{\alpha + \beta + m + n}$$

Suppose now that we have observed $n$ dead planets, but no life, and that we haven’t done an anthropic update yet; then we have a probability of life of $\frac{\alpha}{\alpha + \beta + n}$. Upon adding the anthropic update, this shifts to $\frac{\alpha + 1}{\alpha + \beta + n + 1}$, meaning that the multiplicative factor is at most $\frac{\alpha + 1}{\alpha}$. If we started with the uniform prior with its $\alpha = 1$, this multiplies the probability of life by at most $2$. In a later section, we’ll look at $\alpha < 1$.
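The beta-distribution bookkeeping is easy to check by hand or in code. The following is a minimal sketch, assuming the standard beta-binomial update rule (each observation of life adds one to the first parameter, each dead planet adds one to the second); the function names `beta_mean` and `anthropic_factor` are made up for this illustration.

```python
from fractions import Fraction

def beta_mean(alpha, beta, life=0, dead=0):
    """Posterior mean of a Beta(alpha, beta) prior after the given observations."""
    return Fraction(alpha + life) / (alpha + beta + life + dead)

def anthropic_factor(alpha, beta, dead):
    """Ratio of the mean after vs before counting our own existence as one 'life' observation."""
    before = beta_mean(alpha, beta, life=0, dead=dead)
    after = beta_mean(alpha, beta, life=1, dead=dead)
    return after / before

# Uniform prior (both parameters equal to 1): prior, after our existence, after a dead Mars.
print(beta_mean(1, 1))
print(beta_mean(1, 1, life=1))
print(beta_mean(1, 1, life=1, dead=1))

# With the uniform prior, the factor stays below 2 however many dead planets we add.
for dead in (0, 1, 10, 1000):
    assert anthropic_factor(1, 1, dead) < 2
```

Passing a `Fraction` for the first parameter (for example `Fraction(1, 2)`) lets the same functions explore priors where that parameter is below one.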

High prior probability is not required for weak anthropic update

The uniform prior has $\alpha = \beta = 1$ and starts at expectation $1/2$. But we can set $\alpha = 1$ and a much higher $\beta$, which skews the distribution to the left; for example, for , , and :

Even though these priors are skewed to the left, and have lower prior probabilities of life (, , and ), the anthropic update has a factor that is less than $2$.

Also note that if we scale the prior by a small $\epsilon$, so replace $f(\rho)$ on the range $[0,1]$ with $f(\rho/\epsilon)/\epsilon$ on the range $[0,\epsilon]$, then the mean $\mu$ is multiplied by $\epsilon$ and the variance $\sigma^2$ is multiplied by $\epsilon^2$. Thus the multiplicative factor $M = 1 + \sigma^2/\mu^2$ is unchanged. Here, for example, is the uniform distribution, scaled down by , , and :

All of these will have the same $M$ (which is $4/3$, just as for the uniform distribution). And, of course, doing the same scaling with the various beta distributions we’ve seen up until now will also keep $M$ constant.

Thus there are a lot of distributions with very low $\mu$ (ie very low prior probability of life) but an $M$ that’s less than $2$ (ie the anthropic update is less than a doubling of the probability of life).
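The scale invariance can be seen directly from the moments of a scaled uniform distribution. This is a small sketch, assuming a uniform prior squeezed onto an interval from zero to a small upper bound; the particular scale factors are illustrative choices, not values from the post.

```python
from fractions import Fraction

def factor(mean, variance):
    """Multiplicative factor: one plus variance over mean squared."""
    return 1 + variance / mean ** 2

def scaled_uniform_moments(eps):
    """Mean and variance of a uniform distribution on [0, eps]."""
    return eps / 2, eps ** 2 / 12

# Illustrative scale factors only.
for eps in (Fraction(1), Fraction(1, 1000), Fraction(1, 10**9)):
    mean, variance = scaled_uniform_moments(eps)
    print(float(mean), factor(mean, variance))
```

The prior probability of life shrinks with the scale, while the factor printed in the second column stays the same.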

Beta distributions II and log-normals

The best-case scenario for $M$ is if $f$ assigns probability $1$ to $\rho = \mu$. In that case, $\sigma^2 = 0$ and $M = 1$: the anthropic update changes nothing.

Conversely, the worst-case scenario for $M$ is if $f$ only allows $\rho = 0$ and $\rho = 1$. In that case, $f$ assigns probability $\mu$ to $\rho = 1$ and $1 - \mu$ to $\rho = 0$, for a mean of $\mu$ and a variance of $\mu(1-\mu)$, and a multiplicative factor of $1/\mu$. In this case, after anthropic update, $f'$ assigns certainty to $\rho = 1$ (since any life at all, given this $f$, means life on all planets).
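The two-point case is simple enough to verify in a few lines. This sketch assumes a prior that puts all its weight on "life never arises" and "life always arises", with the weight on the latter equal to the prior mean; the function name is hypothetical.

```python
from fractions import Fraction

def two_point_factor(mu):
    """Factor for a prior with P(rho = 1) = mu and P(rho = 0) = 1 - mu."""
    mean = mu                 # E[rho] = mu * 1 + (1 - mu) * 0
    variance = mu * (1 - mu)  # E[rho^2] - E[rho]^2 = mu - mu^2
    return 1 + variance / mean ** 2

# Illustrative prior means: the factor is the reciprocal of the prior mean.
for mu in (Fraction(1, 2), Fraction(1, 1000), Fraction(1, 10**12)):
    assert two_point_factor(mu) == 1 / mu
```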

But there are also more reasonable priors with large $M$. We’ve already seen some, implicitly, above: the beta distributions with $\alpha < 1$. In that case, $M$ is bounded by $\frac{\alpha + 1}{\alpha}$. If and , for instance, this corresponds to the (unbounded) distribution ; the multiplicative factor is below , which is slightly above . But as $\alpha$ declines, the multiplicative factor can go up surprisingly fast; at it is , at it is :

In general, for $\alpha < 1$, the multiplicative factor is bounded by $\frac{\alpha + 1}{\alpha}$. This gets arbitrarily large as $\alpha \to 0$. Though $\alpha = 0$ itself corresponds to the improper prior $1/\rho$, whose integral diverges. On a log scale, this corresponds to the log-uniform distribution, which is roughly what you get if you assume “we need $N$ steps, each of probability $p$, to get life; let’s put a uniform prior over the possible $N$s”.

It’s not clear why one might want to choose $\alpha < 1$ for a prior, but there is a class of prior that is much more natural: the log-normal distributions. These are random variables $X$ such that $\log(X)$ is normally distributed.

If we choose $\log(X)$ to have a mean that is highly negative (and a variance that isn’t too large), then we can mostly ignore the fact that $X$ takes values above $1$, and treat it as a prior distribution for $\rho$. The mean and variance of the log-normal distributions can be explicitly defined, thus giving the multiplicative factor as:

$$M = e^{s^2}$$

Here, $s^2$ is the variance of the normal distribution $\log(X)$. This might be large, as it denotes (roughly) “we need $N$ steps, each of probability $p$, to get life; let’s put a uniform-ish prior over a range of possible $N$s”. Unlike $\alpha = 0$, this is a proper prior, and a plausible one; therefore there are plausible priors with very large $M$. The log-normal is quite likely to appear, as it is the approximate limit of multiplying together a host of different independent parameters.
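To make the log-normal case concrete, here is a short sketch using the standard closed-form mean and variance of a log-normal distribution. The parameter names `m` and `s` (mean and standard deviation of the underlying normal) and the example values are assumptions made for illustration. It checks that the multiplicative factor depends only on the variance of the underlying normal, and equals its exponential.

```python
import math

def lognormal_factor(m, s):
    """Factor 1 + variance / mean^2 for X with log(X) ~ Normal(m, s^2)."""
    mean = math.exp(m + s ** 2 / 2)
    variance = (math.exp(s ** 2) - 1) * math.exp(2 * m + s ** 2)
    return 1 + variance / mean ** 2

# Illustrative parameters: a very negative mean, and a few spreads.
for s in (0.5, 1.0, 2.0, 3.0):
    assert math.isclose(lognormal_factor(-20.0, s), math.exp(s ** 2))
    print(s, lognormal_factor(-20.0, s))
```

Because the factor grows like the exponential of the squared spread, even moderately wide log-normal priors produce large anthropic updates.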

Multiplication law

Do you know what’s more likely to be useful than “the approximate limit of multiplying together a host of different independent parameters”? Actually multiplying together independent parameters.

The famous Drake equation is:

$$N = R_* \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L$$

Here $N$ is the number of detectable civilizations in our galaxy, $R_*$ is the number of stars in our galaxy, $f_p$ the fraction of those with planets, $n_e$ the number of planets that can support life per star that has planets, $f_l$ the fraction of those that develop life, $f_i$ the fraction of those that develop intelligent life, $f_c$ the fraction of those that release detectable signs of their existence, and $L$ is the length of time those civilizations endure as detectable.

Then the proportion of advanced civilizations per planet is $q \cdot f_l \cdot f_i$, where $q$ is the proportion of life-supporting planets among all planets. To compute the $M$ of this distribution, we have the highly useful result (the proof is in this footnote[4]):

  • Let $X_1, \ldots, X_n$ be independent random variables with multiplicative factors $M_{X_i}$, and let $M$ be the multiplicative factor of $X = X_1 \cdot X_2 \cdots X_n$. Then $M = \prod_{i=1}^n M_{X_i}$: the total $M$ is the product of the individual $M_{X_i}$.

The paper “Dissolving the Fermi Paradox” gives estimated distributions for all the terms in the Drake equation. The $q$, which doesn’t appear in that paper, is a constant, so has $M_q = 1$. The $f_i$ has a log-uniform distribution from to ; the $M$ can be computed from the mean and variance of such distributions, so .

The $f_l$ term is more complicated; it is distributed like where $X$ is a standard normal distribution. Fortunately, we can estimate its mean and variance without having to figure out its distribution, by numerical integration of and on the normal distribution. This gives , and . The overall multiplicative effect of the anthropic update is:

$$M = M_q \cdot M_{f_l} \cdot M_{f_i} \approx 7$$
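The multiplication law for independent variables can be checked exactly on small discrete distributions. The sketch below is illustrative only: the three distributions are invented for the test (they are not the Drake-equation terms), and the helper names `factor` and `product_dist` are hypothetical. It uses the identity that one plus variance over mean squared equals the second moment divided by the squared mean.

```python
from fractions import Fraction
from itertools import product

def factor(dist):
    """dist: list of (value, probability) pairs. Returns E[X^2] / E[X]^2."""
    mean = sum(p * v for v, p in dist)
    second = sum(p * v * v for v, p in dist)
    return second / mean ** 2

def product_dist(*dists):
    """Distribution of the product of independent variables with the given distributions."""
    out = []
    for combo in product(*dists):
        value, prob = Fraction(1), Fraction(1)
        for v, p in combo:
            value *= v
            prob *= p  # independence: joint probability is the product
        out.append((value, prob))
    return out

# Invented example distributions.
a = [(Fraction(1, 10), Fraction(1, 2)), (Fraction(1, 2), Fraction(1, 2))]
b = [(Fraction(1, 100), Fraction(9, 10)), (Fraction(1), Fraction(1, 10))]
c = [(Fraction(1, 3), Fraction(1))]  # a constant, so its factor is 1

assert factor(c) == 1
assert factor(product_dist(a, b, c)) == factor(a) * factor(b) * factor(c)
```

The constant term contributes a factor of one, which is why a fixed proportion can be dropped from or added to the product without changing the overall multiplier.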

What if we considered the proportion of advanced civilizations per star, rather than per planet? Then we can drop the constant $q$ term and add in $f_p$ and $n_e$ (the fraction of stars with planets, and the number of life-supporting planets per such star). Those are both estimated to be distributed as log-uniform on ; for a total of

Why is the $M$ higher for civilizations per star than civilizations per planet? That’s because when we update on our existence, we increase the proportion of civilizations per planet, but we also update the proportion of planets per star: both of these can make life more likely. The per-star $M$ incorporates both effects, so is strictly higher than the per-planet $M$.

We can do the same by considering the number of civilizations per galaxy; then we have to incorporate $R_*$ (the number of stars) as well. This is log-uniform on , giving:
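For log-uniform terms, the multiplicative factor has a closed form, since the mean and second moment of a log-uniform distribution are elementary integrals. The sketch below is a hedged illustration: the function name is made up, and the bounds in the loop are arbitrary examples rather than the ranges used in the paper.

```python
import math

def loguniform_factor(a, b):
    """Factor E[X^2] / E[X]^2 for X log-uniform on [a, b], with 0 < a < b.

    The density is 1 / (x * ln(b / a)), so
    E[X] = (b - a) / ln(b / a) and E[X^2] = (b^2 - a^2) / (2 * ln(b / a)).
    """
    log_ratio = math.log(b / a)
    mean = (b - a) / log_ratio
    second = (b * b - a * a) / (2 * log_ratio)
    return second / mean ** 2

# Arbitrary example ranges, each ending at 1 and spanning more orders of magnitude.
for lower in (1e-1, 1e-2, 1e-4, 1e-8):
    print(lower, loguniform_factor(lower, 1.0))
```

The factor depends only on the ratio of the two bounds, and grows roughly linearly with the number of orders of magnitude spanned, which is far slower than the growth of the range itself.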

What about if we include the Fermi observation (the fact that we don’t see anything in our galaxy)? The “Dissolving the Fermi Paradox” paper shows there are multiple different ways of including this update, depending on how we parse out “not seeing anything” and how easy it is for civilizations to expand.

I did a crude estimate here by taking the Fermi observation to mean “the proportion of civilizations per galaxy must be less than one”. Then I did a Monte-Carlo simulation, ignoring all results above $0$ on the log scale:

From this, I got an estimated mean of , variance of , and a total multiplier of:

With the Fermi observation and the anthropic update combined, we expect civilizations per galaxy.

Limitations of the multiplier

Low multiplier, strong effects

It’s important to note that the anthropic update can be very strong, without changing the expected population much. So a low $M$ doesn’t necessarily mean a low impact.

Consider for instance the presumptuous philosopher, slightly modified to use planetary population densities. Thus theory $T_1$ predicts $\rho = 10^{-12}$ (one in a trillion) and $T_2$ predicts $\rho = 1$; we put initial probabilities $1/2$ on both theories.

As Nick Bostrom noted, the SIA update pushes $T_2$ to being a trillion times more probable than $T_1$; a posteriori, $T_2$ is roughly a certainty (the actual probability is $1/(1 + 10^{-12})$).

However, the expected population goes from roughly $1/2$ (the average of $10^{-12}$ and $1$) to roughly $1$ (since a posteriori $T_2$ is almost certain). This gives an $M$ of roughly $2$. So, despite the strong update towards $T_2$, the actual population update is small; and, conversely, despite the actual population update being small, we have a strong update towards $T_2$.
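The numbers in this example can be reproduced exactly. The sketch below assumes two point theories, labelled `T1` and `T2` here for convenience, predicting densities of one in a trillion and one respectively, with equal prior weight on each.

```python
from fractions import Fraction

# Assumed setup: two point theories with equal prior probability.
rho = {"T1": Fraction(1, 10**12), "T2": Fraction(1)}
prior = {"T1": Fraction(1, 2), "T2": Fraction(1, 2)}

prior_mean = sum(prior[t] * rho[t] for t in rho)
# SIA update: weight each theory by the probability it gives to our existence.
posterior = {t: prior[t] * rho[t] / prior_mean for t in rho}
posterior_mean = sum(posterior[t] * rho[t] for t in rho)
factor = posterior_mean / prior_mean

print(posterior["T2"] / posterior["T1"])  # posterior odds in favour of T2
print(float(posterior["T2"]))             # posterior probability of T2
print(float(factor))                      # ratio of expected densities
```

The posterior odds shift by twelve orders of magnitude, while the ratio of expected densities stays close to two.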

Combining multiple theories

In the previous section, note that both $T_1$ and $T_2$ were point estimates: they posit a constant $\rho$. So they have a variance of zero, and hence an $M$ of $1$. But $T_2$ has a much stronger anthropic update. Thus we can’t use their $M$ to compare the anthropic effects on different theories.

We also can’t relate the individual $M$s to that of a combined theory. As we’ve seen, $T_1$ and $T_2$ have $M$s of $1$, but the combined theory has an $M$ of roughly $2$. But we can play around with the relative initial weight of $T_1$ and $T_2$ to get other $M$s.

If we started with $10^{12}:1$ odds on $T_1$ vs $T_2$, then this has a mean of roughly $2 \times 10^{-12}$ (each theory contributes about $10^{-12}$); the anthropic update sends it to $1:1$ odds, with a mean of roughly $1/2$. So this combined theory has an $M$ of roughly $2.5 \times 10^{11}$, a quarter of a trillion.

But, conversely, if we started with $1:10^{12}$ odds on $T_1$ vs $T_2$, then we have an initial mean of roughly one; its anthropic update is odds of $1:10^{24}$, also with a mean of roughly one. So this combined theory has an $M$ of roughly $1$.
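The dependence of the combined multiplier on the prior odds is easy to explore numerically. This sketch assumes the same two point theories as before (densities of one in a trillion and one); the function name `combined_factor` and the particular odds tried are illustrative choices.

```python
from fractions import Fraction

RHO_1 = Fraction(1, 10**12)  # density predicted by the first theory (assumed)
RHO_2 = Fraction(1)          # density predicted by the second theory (assumed)

def combined_factor(odds_1, odds_2):
    """Multiplier of the combined theory with prior odds odds_1 : odds_2."""
    total = odds_1 + odds_2
    w1, w2 = Fraction(odds_1, total), Fraction(odds_2, total)
    mean = w1 * RHO_1 + w2 * RHO_2
    second = w1 * RHO_1 ** 2 + w2 * RHO_2 ** 2
    # Posterior mean / prior mean = E[rho^2] / E[rho]^2.
    return second / mean ** 2

for odds in [(1, 1), (10**12, 1), (1, 10**12)]:
    print(odds, float(combined_factor(*odds)))
```

Two theories that each have a multiplier of one can therefore combine into anything from about one to hundreds of billions, depending only on their relative prior weights.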

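There is a weak relation between $M$ and the $M_i$ of the various $T_i$. Let $T_1, \ldots, T_n$ be theories, where $T_i$ has a multiplier of $M_i$; we can reorder the $T_i$ so that $M_i \le M_j$ for $i \le j$. Let $T$ be a combined theory that assigns probability $p_i$ to $T_i$, and let $M$ be its multiplier.

  1. For all $\{p_i\}$, $M \ge M_1$.

  2. For all $\epsilon > 0$, there exist $\{p_i\}$ with all $p_i > 0$, so that $M < M_1 + \epsilon$.

So, the minimum value of the $M_i$ is a lower bound on $M$, and we can get arbitrarily close to that bound. See the proof in this footnote[5].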
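Both claims can be spot-checked numerically. The sketch below is only an illustration, not a proof: each component theory is summarised by its mean and second moment, the two example components are invented, and `mixture_factor` is a hypothetical helper name.

```python
from fractions import Fraction

def factor(mean, second_moment):
    """Multiplier of a theory, written as E[rho^2] / E[rho]^2."""
    return second_moment / mean ** 2

def mixture_factor(components, weights):
    """Multiplier of the combined theory; components are (mean, second_moment) pairs."""
    mean = sum(w * m for (m, _), w in zip(components, weights))
    second = sum(w * s for (_, s), w in zip(components, weights))
    return factor(mean, second)

components = [
    (Fraction(1, 2), Fraction(1, 3)),    # uniform on [0, 1]: multiplier 4/3
    (Fraction(1, 10), Fraction(1, 50)),  # e.g. rho = 0 or 0.2 with equal odds: multiplier 2
]
lowest = min(factor(m, s) for m, s in components)

# w is the weight on the first component; the mixture never drops below the lowest multiplier.
for w in (Fraction(1, 2), Fraction(1, 10), Fraction(999, 1000)):
    M = mixture_factor(components, [w, 1 - w])
    assert M >= lowest
    print(float(w), float(M))
```

Putting almost all the weight on the component with the smallest multiplier brings the combined multiplier as close to that minimum as desired, while other weightings can push it above every individual multiplier.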


  1. ↩︎

    As we’ll see, the population update is small even in the presumptuous philosopher experiment itself.

  2. ↩︎

    Citation partially needed: I’m ignoring Boltzmann brains and simulations and similar ideas.

  3. ↩︎

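    Given a fixed $\rho$, the probability of observing life on our own planet is exactly $\rho$. So Bayes’s theorem implies that $f'(\rho) \propto \rho f(\rho)$. With the full normalisation, this is

    $$f'(\rho) = \frac{\rho f(\rho)}{\int_0^1 \rho f(\rho)\, d\rho} = \frac{\rho f(\rho)}{\mu}$$

    If we want to get the mean of this distribution, we further multiply by $\rho$ and integrate:

    $$\mu' = \frac{\int_0^1 \rho^2 f(\rho)\, d\rho}{\mu}$$

    Let’s multiply this by $\mu/\mu$ and regroup the terms:

    $$\mu' = \mu \cdot \frac{\int_0^1 \rho^2 f(\rho)\, d\rho}{\mu^2} = \mu \left(1 + \frac{\int_0^1 \rho^2 f(\rho)\, d\rho - \mu^2}{\mu^2}\right)$$

    Thus $\mu' = \mu\left(1 + \sigma^2/\mu^2\right)$, using the fact that the variance is the expectation of $\rho^2$ minus the square of the expectation of $\rho$.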

  4. ↩︎

    I adapted the proof in this post.

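    So, let $X_1, \ldots, X_n$ be independent random variables with means $\mu_i$ and variances $\sigma_i^2$. Let $X = \prod_{i=1}^n X_i$, which has mean $\mu$ and variance $\sigma^2$. Due to the independence of the $X_i$, the expectations of their products are the product of their expectations. Note that $X_i^2$ and $X_j^2$ are also independent if $i \neq j$. Then we have:

    $$M = 1 + \frac{\sigma^2}{\mu^2} = \frac{E[X^2]}{E[X]^2} = \frac{\prod_i E[X_i^2]}{\prod_i E[X_i]^2} = \prod_i \frac{\sigma_i^2 + \mu_i^2}{\mu_i^2} = \prod_i M_{X_i}$$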

  5. ↩︎

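    Let $f_1, \ldots, f_n$ be probability distributions on $\rho$, with mean $\mu_i$, variance $\sigma_i^2$, expectation of the square $s_i = \sigma_i^2 + \mu_i^2$, and $M_i = s_i/\mu_i^2$. Without loss of generality, reorder the $f_i$ so that $M_i \le M_j$ for $i \le j$.

    Let $f$ be the probability distribution $\sum_{i=1}^n p_i f_i$, with associated multiplier $M$. Without loss of generality, assume $p_i > 0$ for all $i$. Then we’ll show that $M \ge M_1$.

    We’ll first show this in the special case where $n = 2$ and $M_1 = M_2$, then generalise to the general case, as is appropriate for a generalisation. If $M_1 = M_2$, then, since all terms are non-negative, there exists an $a \ge 0$ such that $\mu_2 = a\mu_1$ while $s_2 = a^2 s_1$. Then for any given $p = p_1$, the $M$ of $f$ is:

    $$M = \frac{p s_1 + (1-p) a^2 s_1}{\left(p \mu_1 + (1-p) a \mu_1\right)^2} = M_1 \cdot \frac{p + (1-p) a^2}{\left(p + (1-p) a\right)^2}$$

    The function $x \mapsto x^2$ is convex, so, interpolating between the values $x = 1$ and $x = a$, we know that for all $0 \le p \le 1$, the term $\left(p + (1-p)a\right)^2$ must be no greater than $p + (1-p)a^2$. Therefore $\frac{(p + (1-p)a)^2}{p + (1-p)a^2}$ is at most $1$, and $M \ge M_1$. This shows the result for $n = 2$ if $M_1 = M_2$.

    Now assume that $M_2 > M_1$, so that $s_2/\mu_2^2 > s_1/\mu_1^2$. Then replace $s_2$ with $s_2' = M_1 \mu_2^2$, which is lower than $s_2$, so that $s_2'/\mu_2^2 = M_1$. If we define $M'$ as the expression for $M$ with $s_2'$ substituting for $s_2$, we know that $M \ge M'$, since $s_2 \ge s_2'$. Then the previous result shows that $M' \ge M_1$, thus $M \ge M_1$ too.

    To show the result for larger $n$, we’ll induct on $n$. For $n = 1$ the result is a tautology, $M_1 \ge M_1$, and we’ve shown the result for $n = 2$. Assume the result is true for $n - 1$, and then notice that $f = \sum_{i=1}^n p_i f_i$ can be re-written as $p_1 f_1 + (1 - p_1) f'$ with $f' = \sum_{i=2}^n p_i' f_i$, where $p_i' = p_i/(1-p_1)$ for $i \ge 2$. Then, by the induction hypothesis, if $M'$ is the $M$ of $f'$, then $M' \ge M_2$. Then applying the result for $n = 2$ between $f_1$ and $f'$ gives $M \ge \min(M_1, M')$. However, since $M' \ge M_2$ and $M_2 \ge M_1$, we know that $M \ge M_1$, proving the general result.

    To show $M$ can get arbitrarily close to $M_1$, simply note that $M$ is continuous in the $p_i$, define $p_1 = 1 - \epsilon$, $p_i = \epsilon/(n-1)$ for $i > 1$, and let $\epsilon$ tend to $0$.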