Limitations of Laplace’s rule of succession

Summary

Laplace’s rule typically overestimates the chance of unprecedented events occurring, because it assigns 50% to the event occurring in the first timestep. We can do better by using a rule that assigns a lower probability to the event occurring in the first timestep, choosing that probability using empirical reference classes or common sense.

Introduction

A couple of years ago I spent some time thinking about Laplace’s rule of succession. My thoughts on it ended up being buried in a report that covered many other topics, so I’m writing this as a clear reference to its limitations that I can point people to.

What is Laplace’s rule of succession?

Laplace’s rule tells you how much probability to assign to an event happening. It’s often used when you don’t have much evidence. The rule can be illustrated with the example of the sunrise problem:

Suppose you knew nothing about the universe except whether, on each day, the sun has risen. Suppose there have been N days so far, and the sun has risen on all of them. What should your subjective probability be that the sun _won’t_ rise tomorrow?

Laplace’s rule states that the probability is 1 / (N+2).^[1] If the sun has risen for 8 days so far, your probability that it won’t rise tomorrow should be ¹⁄₁₀.

You can also use the rule to answer a related question: “how long will the observed trend, where the sun rises every day, last?” It turns out that Laplace’s rule implies there’s a 50% chance the trend continues for another N+1 days or more.^[2] If the sun has risen for 50 days so far, there’s a 50% chance that it keeps rising for another 51 days or more. For this reason, I sometimes summarise Laplace’s rule as roughly claiming “If something’s been happening for X years, we should expect it to continue happening for about another X years”, where “expect” refers to the median rather than the mean.
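As a minimal sketch of the two calculations above, the following Python snippet multiplies the day-by-day probabilities that the trend survives. The function names are illustrative, not from any library, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def laplace_break_prob(n):
    """Probability the trend breaks in the next timestep, after n unbroken timesteps."""
    return Fraction(1, n + 2)

def laplace_continue_prob(n, k):
    """Probability the trend survives k more timesteps, after n unbroken timesteps.

    Multiplies the per-step survival probabilities 1 - 1/(n + i + 2) for
    i = 0 .. k-1. The product telescopes to (n + 1) / (n + k + 1).
    """
    prob = Fraction(1)
    for i in range(k):
        prob *= 1 - laplace_break_prob(n + i)
    return prob

print(laplace_break_prob(8))           # 1/10
print(laplace_continue_prob(50, 51))   # 1/2
```

Setting k = N + 1 in the closed form (N + 1) / (N + k + 1) gives exactly ½, which is where the "another N+1 days" figure comes from.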

(There’s nothing special about using “days” as the unit of time. You can apply Laplace’s rule using “seconds” or “years” instead. As I discuss below, this arbitrariness can lead to problems! I’ll use ‘year’ as a default going forward.)

You can use Laplace’s rule for questions other than the sunrise problem. It’s often applied to questions where we don’t seem to have much empirical evidence, or to generate an initial guess about a question before taking more evidence into account. The general pattern is that if some trend has been going for N years, Laplace’s rule says there’s a 1 / (N + 2) probability the trend is broken next year and a 50% chance the trend continues for another N+1 years or more. Or if some event hasn’t happened for N years, Laplace’s rule says there’s a 1 / (N + 2) probability of the event happening next year and a 50% chance the event doesn’t happen for another N+1 years or more.

Some example questions Laplace’s rule could be used to answer:

Q: What’s the chance we develop AGI next year?
- A: We’ve been trying without success for 66 years, so there’s a 1 / (N + 2) = ¹⁄₆₈ chance we succeed next year.
Q: How long until we develop AGI?
- A: We’ve been trying without success for 66 years, so there’s a 50% chance we succeed sometime within the next 67 years.
Q: What’s the chance of nuclear war happening next year?
- A: We’ve observed 73 years where two countries have nuclear weapons but no war occurred, so the chance of nuclear war next year is ¹⁄₇₅.
Q: How long will Christianity last?
- A: It’s lasted 2000 years, so there’s a 50% chance it lasts another 2001 years.
Q: What’s the chance that the Gherkin collapses next year? (The Gherkin is a building in London that was finished in 2003.)
- A: The Gherkin hasn’t collapsed for 19 years so there’s a ¹⁄₂₁ chance it collapses next year.
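The answers above all come from the same two formulas, so they can be reproduced in a few lines. This is only a sketch: the helper name `laplace_summary` is made up for illustration, and the values of N are the ones quoted in the examples.

```python
from fractions import Fraction

def laplace_summary(n_years):
    """Return (chance the event happens next year, median further years without it)."""
    next_year = Fraction(1, n_years + 2)
    median_wait = n_years + 1  # 50% chance the event is still absent after this long
    return next_year, median_wait

# N for each question, taken from the examples above.
examples = {
    "AGI": 66,
    "nuclear war": 73,
    "end of Christianity": 2000,
    "Gherkin collapse": 19,
}

for name, n in examples.items():
    p, median = laplace_summary(n)
    print(f"{name}: {p} next year; 50% chance of {median} or more years without it")
```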

Notice that applying the rule requires us to identify a start time. We only “count” years that happen after the start time when calculating N. In the last example, I didn’t “count” years before 2003 because the Gherkin hadn’t been built. The start time should be the beginning of a period where the event has a significantly heightened probability of occurring. The Gherkin can’t collapse before it’s been built; we can’t have a nuclear war before two countries have nukes. This “heightened probability” justifies not counting the years before the start time when calculating N.

What are the problems with Laplace’s rule?^[3]

The key problem is that Laplace’s rule typically overestimates the chance of unprecedented events occurring.

Take the example of AGI. Let’s imagine applying Laplace’s rule when N = 0, i.e. when we were about to start trying to develop AGI for the first time.

Q: What’s the chance we develop AGI next year?
- A: We’ve been trying without success for 0 years, so there’s a 1/(N+2) = 1/(0 + 2) = 50% chance we succeed next year.

Clearly this answer is too high. Even before trying to develop AGI, we had strong reason to think we’re unlikely to succeed in the first year.

In the same way, Laplace’s rule would have predicted that there was a 50% chance of nuclear war in 1950 (the first year after two countries had nuclear weapons), and predicted in 2003 that there was a 50% chance the Gherkin would collapse in its first year. These probabilities are knowably too high, because we have other reasons to think these events won’t occur.

In general, the problem is that Laplace’s rule excludes evidence we already have at the start time about how likely something is to occur. And, in my experience of seeing the rule applied, this nearly always causes an upwards bias in the estimated probability that some unprecedented event (like nuclear war, or AGI) will occur. This upwards bias is most obvious when N = 0, but it doesn’t go away as you make more observations and N increases.

A second problem, which I think is much less severe, is that applying Laplace’s rule requires choosing a start time (when N = 0). In some cases, this choice feels arbitrary.

A third problem is that you have to make an arbitrary choice about whether you count time in years, days, seconds, or something else. This affects the predictions of the rule, especially near the start time. I think the right solution to the first problem resolves this third problem (see next section).
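To see how much the choice of unit matters near the start time, here is a rough sketch that asks the same question ("what is the chance the event happens within the first calendar year?") while counting time in years, months, or days. The function name is illustrative, and 365 days per year is a simplifying assumption.

```python
from fractions import Fraction

def laplace_event_within(k, n=0):
    """Chance the event occurs at least once in the next k timesteps, after n without it.

    Under Laplace's rule the chance of no event over k steps is (n + 1) / (n + k + 1).
    """
    return 1 - Fraction(n + 1, n + k + 1)

for unit, steps_per_year in [("years", 1), ("months", 12), ("days", 365)]:
    p = laplace_event_within(steps_per_year)
    print(f"counting in {unit}: {p} (about {float(p):.3f})")
```

Starting from N = 0, the same one-year window gets probability 1/2, 12/13, or 365/366 depending only on the unit, which is the arbitrariness described above.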

How can we do better?

Laplace’s rule implies that, at the start time (with N = 0), the chance of the event occurring in the next year is 50%. I propose we alter the rule so that this chance is x%, and choose x by looking at relevant reference classes (or, failing that, by using common sense to pick something reasonable). In the case of nuclear war, we might look at the historical frequency with which all-out wars begin between the two most powerful countries. Though choosing x can be messy and somewhat unprincipled, I think this is better than using a value that we know to be too high (x = 50).

More precisely, I would change the rule as follows. Laplace’s rule says that after observing N years where an event hasn’t occurred, your probability it occurs in the next year should be 1 / (N + 2). I suggest replacing this formula with 1 / (N + M), and picking M such that we assign a reasonable probability to the event occurring when N = 0.^[4] So x% = 1 / M, and Laplace’s rule is the special case M = 2.
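A minimal sketch of the modified rule, taking it in the form 1 / (N + M), which is the form consistent with x% = 1 / M at N = 0. The function names and the 2% first-year probability are illustrative assumptions, not recommended values.

```python
from fractions import Fraction

def m_for_first_step_prob(x):
    """M that makes the N = 0 probability equal to x, i.e. M = 1 / x."""
    return 1 / Fraction(x)

def modified_break_prob(n, m):
    """Probability the event occurs in the next timestep: 1 / (N + M).

    M = 2 recovers Laplace's rule, 1 / (N + 2).
    """
    return 1 / (Fraction(n) + Fraction(m))

m = m_for_first_step_prob(Fraction(1, 50))  # x = 2%, chosen only for illustration
print(m)                                    # 50
print(modified_break_prob(0, m))            # 1/50
print(modified_break_prob(73, m))           # 1/123
print(modified_break_prob(73, 2))           # 1/75 (Laplace's rule)
```

With these illustrative numbers, the nuclear war estimate after 73 years falls from 1/75 to 1/123, so the lower starting probability still matters well after the start time.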

This resolves the third problem (days vs months vs years), I believe, because any empirically grounded method (and I think any reasonable method) for picking M will be sensitive to whether we’re counting time in years, days, or seconds.^[5] Take the example of nuclear war. Suppose we’re counting time in years and estimate M = 50. This implies a ¹⁄₅₀ chance of nuclear war happening in the first year after 1949. If we’d instead counted time in months then any empirically grounded estimate would have instead implied a ~¹⁄₆₀₀ (¹⁄₅₀ × ¹⁄₁₂) chance of nuclear war in the first month, i.e. M ≈ 600.
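The following sketch checks this numerically using the rule in the form 1 / (N + M). It assumes M scales with the number of timesteps per year: M = 50 in years and 600 in months as in the example, plus 18,250 in days (an extension of the example that assumes 365 days per year). The function name is illustrative.

```python
from fractions import Fraction

def event_within(k, m, n=0):
    """Chance the event occurs at least once in the next k timesteps.

    Uses the modified rule: the per-step probability after n + i timesteps
    without the event is 1 / (n + m + i).
    """
    survive = Fraction(1)
    for i in range(k):
        survive *= 1 - Fraction(1, n + m + i)
    return 1 - survive

# (unit, timesteps per year, M expressed in that unit)
for unit, steps, m in [("years", 1, 50), ("months", 12, 600), ("days", 365, 18250)]:
    p = event_within(steps, m)
    print(f"counting in {unit}: {p} (about {float(p):.4f})")
```

The chance of the event in the first calendar year comes out as 1/50, 12/611, and 365/18614 respectively, all close to 2%, so the choice of unit barely changes the answer once M is chosen in matching units.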

Is it ever OK to use Laplace’s rule? If M << N, my suggested change won’t make much difference to the results. These are cases when the main evidence we have that some event won’t occur is simply the fact it hasn’t already occurred. I find this plausible for the case of Christianity: my main reason for thinking that it will last for a long time is that it has already lasted such a long time. In cases like this, it may be OK to just use Laplace’s rule.^[6]
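To illustrate when the change matters, this sketch compares Laplace’s 1 / (N + 2) with the modified 1 / (N + M) for a large and a small N. The value M = 20 is purely hypothetical and is applied to both cases only to show the effect of N; it is not an estimate for either question.

```python
from fractions import Fraction

def relative_drop(n, m):
    """Fractional fall in next-year probability when 1/(N+M) replaces 1/(N+2).

    Algebraically this equals (m - 2) / (n + m).
    """
    laplace = Fraction(1, n + 2)
    modified = Fraction(1, n + m)
    return 1 - modified / laplace

M = 20  # hypothetical value, for illustration only
for label, n in [("Christianity", 2000), ("Gherkin", 19)]:
    drop = relative_drop(n, M)
    print(f"{label}: N = {n}, probability falls by {float(drop):.1%}")
```

With N = 2000 the estimate moves by under 1%, while with N = 19 it falls by nearly half.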

The second problem – choosing a start time – is still awkward. If there is no privileged time after which the probability of the event is significantly heightened, I recommend choosing several plausible start times and then taking an average to get your overall answer.^[7]
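A rough sketch of averaging over start times, using the rule in the form 1 / (N + M), with M = 2 (Laplace’s rule) as the default. The candidate values of N below are hypothetical apart from 66, which is the figure used in the AGI example. The optional `reweight` flag is one possible reading of "updating on how surprised each rule is": it weights each start time by the probability its rule gave to seeing no event so far.

```python
from fractions import Fraction

def break_prob(n, m=2):
    """Probability the event occurs next timestep, after n timesteps without it."""
    return Fraction(1, n + m)

def no_event_so_far_prob(n, m=2):
    """Probability the rule assigned to seeing no event in the first n timesteps.

    The product of (1 - 1/(m + i)) for i = 0 .. n-1 telescopes to (m - 1) / (n + m - 1).
    """
    return Fraction(m - 1, n + m - 1)

def averaged_break_prob(ns, m=2, reweight=False):
    """Average the next-step probability over several candidate start times.

    ns holds the number of timesteps elapsed since each candidate start time.
    """
    if reweight:
        weights = [no_event_so_far_prob(n, m) for n in ns]
    else:
        weights = [Fraction(1)] * len(ns)
    total = sum(weights)
    return sum(w * break_prob(n, m) for w, n in zip(weights, ns)) / total

candidate_ns = [66, 32, 12]  # 66 is from the AGI example; 32 and 12 are hypothetical
print(averaged_break_prob(candidate_ns))                 # 55/1428
print(averaged_break_prob(candidate_ns, reweight=True))
```

Reweighting shifts weight towards later start times, because their rules are less surprised by the absence of the event so far.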

Notes

[1] This formula can be derived from two assumptions: i) each day there is a constant but unknown probability p that the sun rises; ii) our initial subjective probability distribution over the value of p is a uniform distribution over the range [0, 1].
[2] Of course, this isn’t the only other question Laplace’s rule can answer. For any possible future pattern of the sun rising vs not rising (e.g. ‘the sun rises for the next 5 days then doesn’t rise for 11 days’), the rule assigns it a probability.
[3] The comments here apply equally to alternatives to Laplace’s rule, including other ‘uninformative priors’ (like the Jeffreys prior) and the Pareto distribution (which can be thought of as the continuous analogue of an uninformative prior).
[4] Mathematically, this first change to Laplace’s rule of succession corresponds to replacing Laplace’s uniform distribution over p with a beta distribution. The rule I propose corresponds to the following choice about the beta distribution’s two shape parameters α and β: α = 1, M = (α + β) / α. I am not the first to suggest replacing Laplace’s uniform distribution with the more general, yet analytically tractable, beta distribution. For example, see Huttegger (2017); Raman (2000); Bernardo and Smith (1994), p271-272, example 5.4, 2nd edition.
[5] See section 3.2.2.
[6] Though there is still a question about whether you should use an alternative uninformative prior instead.
[7] If you want to be fancy, you can update the probability you place on each start time by how surprised the associated rule is by the evidence so far. See section 8.7.