# The Prediction Hierarchy

**Related:** Advancing Certainty, Reversed Stupidity Is Not Intelligence

*The substance of this post is derived from a conversation in the comment thread which I have decided to promote. Teal;deer: if you have to rely on a calculation you may have gotten wrong for your prediction, your expectation for the case when your calculation is wrong should use a simpler calculation, such as reference class forecasting.*

*Edit 2010-01-19: Toby Ord mentions in the comments Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes (PDF) by Toby Ord, Rafaela Hillerbrand, and Anders Sandberg of the Future of Humanity Institute, University of Oxford. It uses a similar mathematical argument, but is much more substantive than this.*

A lottery has a jackpot of a million dollars. A ticket costs one dollar. Odds of a given ticket winning are approximately one in forty million. If your utility is linear in dollars, should you bet?

The obvious (and correct) answer is “no”. The clever (and incorrect) answer is “yes”, as follows:

According to your calculations, “this ticket will not win the lottery” is true with probability 99.9999975%. But can you really be *sure* that you can calculate anything to that good odds? Surely you couldn’t expect to make forty million predictions of which you were that confident and only be wrong once. Rationally, you ought to ascribe a lower confidence to the statement: 99.99%, for example. But this means a 0.01% chance of *winning* the lottery, corresponding to an expected value of a hundred dollars. Therefore, you should buy the ticket.

The logic is not obviously wrong, but where is the error?

First, let us write out the calculation algebraically. Let **E(L)** be the expected value of playing the lottery. Let **p(L)** be your calculated probability that the lottery will pay off. Let **p(C)** be your probability that your calculations are correct. Finally, let **j** represent the value of the jackpot and let **t** represent the price of the ticket. The obvious way to write the clever theory is:

E(L) = max(p(L), 1-p(C)) * j - t

This doesn’t sound quite right, though—surely you should ascribe a higher confidence when you calculate a higher probability. That said, when p(L) is much less than p(C), it shouldn’t make a *large* difference. The straightforward way to account for this is to take p(C) as the probability that p(L) is correct, and write the following:

E(L) = [ p(C)*p(L) + 1-p(C) ] * j - t

which can be rearranged as:

E(L) = p(C) * [p(L)*j - t] + (1-p(C)) * [j - t]

I believe this exposes the problem with the clever argument quite explicitly. Why, if your calculations are incorrect (probability 1-p(C)), should you assume that you are *certain* to win the lottery? If your calculations are incorrect, they should tell you *almost nothing* about whether you will win the lottery or not. So what do you do?

What appears to me the elegant solution is to use a *less complex* calculation (or a series of less complex calculations) to act as your backup hypothesis. In a tricky engineering problem (say, calculating the effectiveness of a heat sink), your primary prediction might come out of a finite element fluid dynamics calculator with p(C) = 0.99 and narrow error bars, but you would also refer to the result of a simple algebraic model with p(C) = 0.9999 and much wider error bars. And then you would backstop the lot with your background knowledge about heat sinks in general, written with wide enough error bars to call p(C) = 1 - epsilon.
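One way to read this hierarchy is as a chain of fallbacks, each consulted only when everything above it has failed. The Python sketch below is a rough illustration under assumptions that are mine, not the post's: each level is collapsed to a point estimate (the error bars are ignored), each p(C) is treated as conditional on the failure of the levels above it, and the heat-sink numbers are invented.

```python
def hierarchy_estimate(levels, fallback):
    """Combine a hierarchy of predictions into one expected value.

    levels   -- list of (p_correct, estimate) pairs, most complex model first
    fallback -- estimate from background knowledge, used if every level fails
    """
    expected = 0.0
    p_reach = 1.0  # probability that every earlier level was wrong
    for p_correct, estimate in levels:
        expected += p_reach * p_correct * estimate
        p_reach *= 1 - p_correct
    return expected + p_reach * fallback


# Hypothetical thermal resistances in K/W (illustrative values only).
levels = [
    (0.99, 1.20),    # finite element calculation
    (0.9999, 1.50),  # simple algebraic model
]
background = 2.0     # generic knowledge about heat sinks

combined = hierarchy_estimate(levels, background)
print(f"combined estimate: {combined:.7f} K/W")
```

The result stays close to the primary calculation, because the backup levels only carry the weight of the cases where the primary calculation is wrong.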

In this case, though, the calculation was simple, so our backup prediction is just the background knowledge. Say that, knowing nothing about a lottery but “it’s a lottery”, we would have an expected payoff **e**. Then we write:

E(L) = p(C) * [p(L)*j - t] + (1-p(C)) * e

I don’t know about you, but for me, *e* is approximately equal to -*t*. And justice is restored.
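To make the comparison concrete, here is a minimal Python sketch that plugs the post's numbers into the formulas. The function names are mine; p(C) = 0.9999 is taken from the 99.99% in the clever argument, and e = -t is the prior stated above.

```python
def clever_ev(p_l, p_c, j, t):
    """The 'clever' argument: a wrong calculation is treated as a certain win."""
    return (p_c * p_l + (1 - p_c)) * j - t


def hierarchy_ev(p_l, p_c, j, t, e):
    """Fall back to the background expectation e when the calculation is wrong."""
    return p_c * (p_l * j - t) + (1 - p_c) * e


j = 1_000_000          # jackpot
t = 1                  # ticket price
p_l = 1 / 40_000_000   # calculated probability of winning
p_c = 0.9999           # confidence that the calculation is correct
e = -t                 # background expectation for "it's a lottery"

naive = p_l * j - t
clever = clever_ev(p_l, p_c, j, t)
corrected = hierarchy_ev(p_l, p_c, j, t, e)

print(f"naive:     {naive:+.7f}")
print(f"clever:    {clever:+.7f}")
print(f"corrected: {corrected:+.7f}")
```

With these inputs the clever formula reports a gain of about 99 dollars per ticket, while the corrected formula stays within a fraction of a cent of the naive answer of about -0.975 dollars.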

We are advised that, when solving hard problems, we should solve multiple problems at once. This is relatively trivial, but I can point out a couple other relatively trivial examples where it shows up well:

**Suppose the lottery appears to be marginally profitable: should you bet on it?** Not unless you are confident in your numbers.

**Suppose we consider the LHC. Should we (have) switch(ed) it on?** Once you’ve checked that it is safe, yes. As a high-energy physics experiment, the backup comparison would be to things like nuclear energy, which have only small chances of devastation on the planetary scale. If your calculations were to indicate that the LHC is completely safe, even if your p(C) were as low as three or four nines (99.9%, 99.99%), your actual estimate of the safety of turning it on should be no lower than six or seven nines, and probably higher. (In point of fact, given the number of physicists analyzing the question, p(C) is much higher. Three cheers for intersubjective verification.)
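The arithmetic behind the "six or seven nines" can be sketched as follows. The backup risk of one in a thousand is a hypothetical value chosen only because it reproduces those figures; the post does not state a number, and this sketch is not a risk assessment.

```python
import math


def overall_risk(p_c, risk_if_correct, backup_risk):
    """Risk estimate when the calculation is trusted only with probability p_c."""
    return p_c * risk_if_correct + (1 - p_c) * backup_risk


def nines(risk):
    """Number of nines of safety implied by a risk, e.g. 1e-6 -> 6."""
    return -math.log10(risk)


BACKUP_RISK = 1e-3  # hypothetical reference-class risk, not from the post

risk_three_nines = overall_risk(0.999, 0.0, BACKUP_RISK)
risk_four_nines = overall_risk(0.9999, 0.0, BACKUP_RISK)

print(f"p(C) = 0.999  -> {nines(risk_three_nines):.2f} nines of safety")
print(f"p(C) = 0.9999 -> {nines(risk_four_nines):.2f} nines of safety")
```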

**Suppose we consider our Christmas shopping?** When you’re estimating your time to finish your shopping, your calculations are not very reliable. Therefore your answer is strongly dominated by the simpler, much more reliable reference class prediction.

**But what are the odds that this ticket won’t win the lottery?** …how many nines do I type, again?


You may wish to check out the paper we wrote at the FHI on the problem of taking into account mistakes in one’s own argument. The mathematical result is the same as the one here, but the proof is more compelling. Also, we demonstrate that when applied to the LHC, the result is very different to the above analysis.

http://www.fhi.ox.ac.uk/__data/assets/pdf_file/0006/4020/probing-the-improbable.pdf

I haven’t read the paper through, but the similarity in algebra cannot be denied. I have added a reference to the post.

Thanks, interesting read. Could you expand more on the points of similarity and difference between your argument and RobinZ’s? They currently seem very disparate approaches to me.

Why, when you consider the case where you calculated the odds of winning the lottery incorrectly, do you increase rather than decrease the odds?

In any case, with a lottery, you *do* know the odds of winning; they’re stated on the ticket.

*Edit: I see I misread the remarks. See downthread.*

At the moment, I’m calculating my expected value, not the odds, but there are a number of reasons to think that jackpot / stated-odds is optimistic:

The lottery may be a fraud.

The lottery may go bust.

I may lose the ticket.

I may have to split the pot.

In general, the rigorous approach would be to rewrite everything as probability distributions.

Besides: if you want to assume the average lottery ticket is more valuable than I would (*e* = −0.5 * *t*, say), that’s your right. I make no justification for my priors.

You quoted this from somewhere:

This says that rationally, you should assign a much higher expected value to the ticket. But all 4 factors you just listed are ones which would make the expected value of the ticket lower.

Oh, I see what you mean. That wasn’t a quote, actually—it was essentially an articulation of ciphergoth’s clever (but incorrect) argument. The purpose of this post was to explain my method for rebutting it.

It’s just a restatement of the Pascal’s Mugging problem, but with the lottery in place of the mugging.

I’m still ambivalent about Pascal’s Mugging, however—my instinct is to refuse to pay, but I don’t feel I can sufficiently justify that response.

The lottery, as an ordinary situation, is far more tractable.

Can’t you apply a similar argument? Instead of considering P(mugger’s statement is true), you consider P(you have the faintest idea what’s going on).

My instinctive probability measurement for such a statement is not so small as 1/3^^^^3. My best retort at the moment is purely pragmatic: never accept such a mugging, because otherwise you will be mugged.

Indeed, the probability that we don’t know what’s going on is non-negligible. What I’m suggesting is that we don’t have to assign a non-negligible probability to the specific hypothesis “this mugger is speaking the literal truth”—instead we avoid overconfidence by trying to consider all of the hypotheses that might hide behind the general assertion “our grasp on this situation is much less than we think” and try to use broader reference classes to see what the outcomes of various strategies might be in those instances, using the strategy you outline for the lottery.

Not to engage in needless turnabout, but how does that translate into math?

Instead of thinking of the proposition H = the mugger is honest, and trying to calculate E(U|AH)P(H) + E(U|A~H)P(~H) where A is an action such as handing over your wallet and U is utility, you consider the hypotheses you’re really applying, T, that your general theory about muggers is sufficient to understand the situation, and ~T, that you just *don’t* have a handle on the situation. Then instead of directly using the mugger’s stated utility for the value you try to appeal to a simpler and more general theory to find a value for E(U|A~T). The more general “an offer from a stranger” reference class should suffice; you buy only a tiny minority of the things that you’re offered. Beyond that you have the “don’t know” reference class, but that has to have expected zero utility.

This argument doesn’t apply to any possibility that you are in a position to properly think about. You *are* in a position to assess the probability that Miriam Achaba wishes to entrust you with 25M USD, so you’re best advised not to reach for the “other” column on that one.

Agreed: the trick is that being wrong “only once” is deceptive. I may be wrong more than once on a one-in-forty-million chance. But I may also be wrong zero times in 100 million tries, on a problem as frequent and well-understood as the lottery, and I’m hesitant to say that any reading problems I may have would bias the test toward more lucrative mistakes.

Let’s rephrase, then. Suppose for a moment that you are 100% confident a lottery ticket costs $1, you can buy it, it pays $10^6 on a win, etc etc and that you are reading the ticket right now and believe it says the probability the ticket will win is 1/(4x10^6). Should you believe the ticket is +EV?

The wrong calculation: Yes, because you estimate you’ll misread the ticket (or it’s lying, etc etc) 1 in a million times, which makes the EV 10^6 x (10^-6 + (1-10^-6) x 1/(4x10^6)) = 1 + ~0.25.

The right calculation: No, because you’ll misread the ticket 1 in a million times, which makes the EV 10^6 x (10^-6 x P + (1-10^-6) x 1/(4x10^6)) = P + ~0.25 where P is whatever probability of winning with 1 ticket you assign to an arbitrary lottery that costs $1 and pays $10^6 where you incorrectly read the probability off the back of the ticket as being 10^-6 (or it’s lying, etc etc). If your priors say P ~= 1 then they need adjusting; if they say P ~= 10^-7 to 10^-6 then they probably don’t need adjusting. And then the EV is ~= 0.25 again.

AFAICT this is the same as in the post, but I’m not certain I understand precisely where your question is.

Edit: ha, I put 10^-5 to 10^-6 (which is of course silly) instead of 10^-7 to 10^-6, but RobinZ put ~0 anyway
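The two calculations in this comment can be checked with a short Python sketch. The variable names are mine, and P = 10^-6 is one value from the range the comment suggests. Both results are expected payouts before the $1 ticket price is subtracted.

```python
jackpot = 10**6
p_stated = 1 / (4 * 10**6)  # probability read off the ticket
p_misread = 10**-6          # chance the ticket was misread (or is lying, etc.)


def expected_payout(p_win_if_misread):
    """Expected payout, given the win probability assumed for the misread case."""
    return jackpot * (p_misread * p_win_if_misread
                      + (1 - p_misread) * p_stated)


wrong = expected_payout(1.0)     # a misread is treated as a certain win
right = expected_payout(10**-6)  # a misread falls back to a prior P

print(f"wrong calculation: {wrong:.8f}")
print(f"right calculation: {right:.8f}")
```

Only the first version clears the $1 ticket price, and it does so entirely through the assumption that a misread guarantees a win.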

I must confer on you the highest form of praise among aspiring rationalists:

“Damn it, why didn’t *I* think of that?”

Thank you!

A lot of credit has to go to ciphergoth and Wei_Dai, actually—I was just trying to run a stack trace on my instantaneous rejection of the original lottery argument; they’re the ones who made me quantify it.

Seconded. In fact I’m almost tempted to declare that this supersedes my post.

Great work! That said, I’m suspicious of this for being too convenient: I’m given cause to worry by the way I like the answers it gives *too much*. It almost seems to make the standard caution against overconfidence disappear from our calculations altogether, *especially* in the cases where it’s hardest to think about. And it gets us into “reference class tennis” again.

That’s a good point: standard cautions against overconfidence should reduce p(C), much like time pressure, exhaustion, and complexity of argument.

So, upon learning that my calculations were wrong, am I correct in saying that my new probability estimate—before doing any further calculations—should become whatever my prior probability was before I did the calculation?

Let me be more precise: *before* you see anything wrong with your calculations, you have no real reason to expect locating an error in them to give you evidence of *anything specific*. Therefore, when doing your initial post-calculations, the prior probability is appropriate. *After* you find an error in your calculations, you can usually *fix* the error in your calculations.

Not quite. It depends on your beliefs about how the calculation could go wrong and how much this would change the result. If you are very confident in all parts except a minor correcting term, and are simply told that there is an error in the calculation, then you can still have some kind of rough confidence in the result (you can see how to spell this out in maths). If you know the exact part of the calculation that was mistaken, then the situation is slightly different, but still not identical to reverting to your prior.

I believe that the analysis of this problem can be made more mathematically rigorous than is done in this post. Not only will a formal analysis help us avoid problems in our reasoning, but it will clearly illustrate what assumptions have been made (so we can question their legitimacy).

Let’s assume (as is done implicitly in the post) that you know with 100% certainty that the only two possible payouts are $1 million and $0. Then:

expected earnings = p($1 million payout) * $1 million + p($0 payout) * $0 - (ticket price)

= p($1 million payout) * $1 million - (ticket price)

= p($1 million payout|correctly computed odds) * p(correctly computed odds) * $1 million + p($1 million payout|incorrectly computed odds) * p(incorrectly computed odds) * $1 million - (ticket price)

= (1/40,000,000) * p(correctly computed odds) * $1 million + p($1 million payout|incorrectly computed odds) * (1 - p(correctly computed odds)) * $1 million - (ticket price)

We note now that we can write:

p($1 million payout|incorrectly computed odds) * (1 - p(correctly computed odds)) * $1 million

= p($1 million payout|incorrectly computed odds) * $1 million * (1 - p(correctly computed odds))

= (p($1 million payout|incorrectly computed odds) * $1 million + p($0 payout|incorrectly computed odds) * $0) * (1 - p(correctly computed odds))

= (expected payout given incorrectly computed odds) * (1 - p(correctly computed odds))

Hence, our resulting equation is:

expected earnings = (1/40,000,000) * p(correctly computed odds) * $1 million + (expected payout given incorrectly computed odds) * (1 - p(correctly computed odds)) - (ticket price)

Now, under the fairly reasonable (but not quite true) assumption (which seems to be implicitly made by the author) that

(expected payout given incorrectly computed odds) = (expected payout given that we know nothing except that we are dealing with a lotto that costs (ticket price) to play)

we can convert to the notation of the article, which gives us:

E(L) = p(C) * p(L) * j + (1 - p(C)) * (e + t) - t

Here I have interpreted e as the expected value given that we are dealing with a lotto that we know nothing else about (rather than expected earnings under those circumstances). The author describes e as an “expected payoff” but I don’t think that is really quite what was meant (unless “payoff” refers to total net payoff including the ticket price).

We can now rearrange this formula:

E(L) = p(C) * p(L) * j + (1 - p(C)) * e + (1 - p(C)) * t - t

= p(C) * p(L) * j + (1 - p(C)) * e - p(C) * t

= p(C) * (p(L) * j - t) + (1 - p(C)) * e

which finally gets us to the author’s terminal formula.
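As a sanity check on the rearrangement, the expanded form and the post's final formula can be compared numerically. This is a rough Python sketch with function names of my own choosing; agreement on random inputs is evidence that the two forms are identical, not a proof.

```python
import random


def expanded_form(p_c, p_l, j, t, e):
    """E(L) as first written in this comment, before rearranging."""
    return p_c * p_l * j + (1 - p_c) * (e + t) - t


def final_form(p_c, p_l, j, t, e):
    """E(L) as given at the end of the post."""
    return p_c * (p_l * j - t) + (1 - p_c) * e


random.seed(0)  # fixed seed so the check is repeatable
max_diff = 0.0
for _ in range(1000):
    p_c = random.random()
    p_l = random.random()
    j = random.uniform(1, 1e6)
    t = random.uniform(0.1, 10)
    e = random.uniform(-10, 10)
    diff = abs(expanded_form(p_c, p_l, j, t, e) - final_form(p_c, p_l, j, t, e))
    max_diff = max(max_diff, diff)

print(f"largest difference over 1000 random inputs: {max_diff:.3e}")
```

Any difference reported should be at the level of floating-point rounding.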

What is the point of doing this careful, formal analysis? Well, we now see where the author’s formula comes from explicitly, it is proven rigorously, and we are fully aware of what assumptions were made. The assumptions are:

You know with 100% certainty that the only two possible payouts are $1 million and $0

and

expected payout given incorrectly computed odds = expected payout given that we know nothing except that we are dealing with a lotto that costs the given ticket price to play

The first assumption is reasonable assuming that the lotto is not fraudulent, you don’t have problems reading the rules, it is not possible for multiple people to claim the payout, etc.

The second assumption, however, is harder to justify. There are many ways that a calculation of odds could go wrong (putting a decimal point in the wrong place, making a multiplication error, unknowingly misunderstanding the laws of probability, actually being insane, etc.) If we could really enumerate all of them, understand how they affect our computed payout probability, and estimate the probability of each occurring, then we could compute this missing factor exactly. As things stand though, it is probably untenable. It should not be expected though that errors that make the payout probability artificially larger will balance those that make it artificially smaller. Misplacing a decimal point, for example, will almost certainly be noticed if it leads to a percentage greater than 100%, but not if it leads to one that is less than that (creating an asymmetry).

This is a valid point, and one I missed in my writeup. (Toby_Ord said something similar, but that was in response to a specific question.)

It is probably a useful skill to recognize asymmetries in the possible direction of error, such as that which you pointed out. I can see two ways to handle this:

a. Additional terms in the derivation, such as P(decimal-point error) and P(sign error), with the *e* term restricted to the unanticipated-error case.

b. Modification of *e*.

If it were 40 million predictions about lotteries of that size I could.

Don’t be so overconfident. Apparently, in 1997 New York ran a promotion that doubled the payouts of a single game on exactly 4 days in exactly 1 month, and paid out $1.40 to $1 on average.

That’s a different game though, where the odds of winning aren’t one in 40 million.

My grasp of statistics is atrocious, something I hope to improve this year with an open university maths course, so apologies if this is a dumb question:

Do the figures change if you take “playing the lottery” as over the whole of your lifespan? I mean, most of the people I know who play the lottery make a commitment to play regularly. Is the calculation affected in any meaningful way? At least the costs of playing the lottery weekly over say 20 years become much less trivial in appearance.

If by ‘do the figures change’ you mean ‘does it ever become a good bet’ then no.

Your odds of winning once go up as you increase the number of tickets you buy (# of tickets purchased * Chance of winning per ticket). The expected value of a given ticket remains the same. All you are doing is focusing more money away from other possibilities. If you buy 5 tickets a week for your entire life, and the odds of winning are 1 in 100 million, then you have a 0.000169 chance of winning the lottery, but you could have spent your 16 thousand on a new TV or a vacation.

It comes out to about the right number in this case, but your math is wrong. The *expected number of times* you win in *n* trials at probability *p* equals *np*, but the *probability of winning at least once* is slightly less at 1-(1-p)^n.

Yes, thanks for the correction.
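The size of the gap between the two quantities can be seen in a short Python sketch. The 65-year playing span is an assumption of mine; it is the span that reproduces the 0.000169 figure quoted above at 5 tickets a week.

```python
import math

p = 1 / 100_000_000  # chance of winning per ticket
n = 5 * 52 * 65      # 16,900 tickets; the 65-year span is an assumption

expected_wins = n * p

# 1 - (1 - p)^n, computed with log1p/expm1 to limit rounding error for tiny p
p_at_least_one = -math.expm1(n * math.log1p(-p))

print(f"expected number of wins:     {expected_wins:.12f}")
print(f"P(winning at least once):    {p_at_least_one:.12f}")
print(f"difference:                  {expected_wins - p_at_least_one:.3e}")
```

The two agree to several significant figures here because *np* is tiny; they diverge only when *np* approaches 1.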

As mattnewport and LucasSloan point out, it doesn’t change the *actual* numbers (a bad bet multiplied a thousandfold is still a bad bet), but it does change the wrong numbers: buying a thousand tickets for a 0.01% chance of a million dollars is a losing bet again.\* More evidence that the ignorance argument fails.

\* How I calculate this (changes in italics):

You somehow assume that the error of calculating the utility of the lottery ticket too low is more likely than the error of calculating it too high.

In principle those two sorts of possible errors should balance each other out.

I make no such point. If you read the post, nowhere do I assume any specific relation between [p(L) * j - t] and *e*; my point is specifically that you should use something *with no dependence on* your calculation (and strictly more reliable) to draw conclusions from when you’re not sure.

A non-mathematical rule of thumb for the same situation might be the idea that if you can’t be very certain of the nominal odds, then you can’t be very certain of actually receiving the payoff either.