Bayes’ Theorem Illustrated (My Way)

(This post is el­e­men­tary: it in­tro­duces a sim­ple method of vi­su­al­iz­ing Bayesian calcu­la­tions. In my defense, we’ve had other el­e­men­tary posts be­fore, and they’ve been found use­ful; plus, I’d re­ally like this to be on­line some­where, and it might as well be here.)

I’ll ad­mit, those Monty-Hall-type prob­lems in­vari­ably trip me up. Or at least, they do if I’m not think­ing very care­fully—do­ing quite a bit more work than other peo­ple seem to have to do.

What’s more, peo­ple’s ex­pla­na­tions of how to get the right an­swer have al­most never been satis­fac­tory to me. If I con­cen­trate hard enough, I can usu­ally fol­low the rea­son­ing, sort of; but I never quite “see it”, and nor do I feel equipped to solve similar prob­lems in the fu­ture: it’s as if the solu­tions seem to work only in ret­ro­spect.

Minds work differ­ently, illu­sion of trans­parency, and all that.

Fortunately, I eventually managed to identify the source of the problem, and I came up with a way of thinking about (or rather, visualizing) such problems that suits my own intuition. Maybe there are others out there like me; this post is for them.

I’ve men­tioned be­fore that I like to think in very ab­stract terms. What this means in prac­tice is that, if there’s some sim­ple, gen­eral, el­e­gant point to be made, tell it to me right away. Don’t start with some messy con­crete ex­am­ple and at­tempt to “work up­ward”, in the hope that difficult-to-grasp ab­stract con­cepts will be made more palat­able by re­lat­ing them to “real life”. If you do that, I’m li­able to get stuck in the trees and not see the for­est. Chances are, I won’t have much trou­ble un­der­stand­ing the ab­stract con­cepts; “real life”, on the other hand...

...well, let’s just say I pre­fer to start at the top and work down­ward, as a gen­eral rule. Tell me how the trees re­late to the for­est, rather than the other way around.

Many peo­ple have found Eliezer’s In­tu­itive Ex­pla­na­tion of Bayesian Rea­son­ing to be an ex­cel­lent in­tro­duc­tion to Bayes’ the­o­rem, and so I don’t usu­ally hes­i­tate to recom­mend it to oth­ers. But for me per­son­ally, if I didn’t know Bayes’ the­o­rem and you were try­ing to ex­plain it to me, pretty much the worst thing you could do would be to start with some de­tailed sce­nario in­volv­ing breast-can­cer screen­ings. (And not just be­cause it tar­nishes beau­tiful math­e­mat­ics with images of sick­ness and death, ei­ther!)

So what’s the right way to ex­plain Bayes’ the­o­rem to me?

Like this:

We’ve got a bunch of hy­pothe­ses (states the world could be in) and we’re try­ing to figure out which of them is true (that is, which state the world is ac­tu­ally in). As a con­ces­sion to con­crete­ness (and for ease of draw­ing the pic­tures), let’s say we’ve got three (mu­tu­ally ex­clu­sive and ex­haus­tive) hy­pothe­ses—pos­si­ble world-states—which we’ll call H1, H2, and H3. We’ll rep­re­sent these as blobs in space:

Figure 0

Now, we have some prior notion of how probable each of these hypotheses is; that is, each has some prior probability. If we don’t know anything at all that would make one of them more probable than another, they would each have probability 1/3. To illustrate a more typical situation, however, let’s assume we have more information than that. Specifically, let’s suppose our prior probability distribution is as follows: P(H1) = 30%, P(H2) = 50%, P(H3) = 20%. We’ll represent this by resizing our blobs accordingly:

Figure 1

That’s our prior knowledge. Next, we’re going to collect some evidence and update our prior probability distribution to produce a posterior probability distribution. Specifically, we’re going to run a test. The test we’re going to run has three possible outcomes: Result A, Result B, and Result C. Now, since this test happens to have three possible results, it would be really nice if the test just flat-out told us which world we were living in; that is, if (say) Result A meant that H1 was true, Result B meant that H2 was true, and Result C meant that H3 was true. Unfortunately, the real world is messy and complex, and things aren’t that simple. Instead, we’ll suppose that each result can occur under each hypothesis, but that the different hypotheses have different effects on how likely each result is to occur. We’ll assume for instance that if Hypothesis H1 is true, we have a 1/2 chance of obtaining Result A, a 1/3 chance of obtaining Result B, and a 1/6 chance of obtaining Result C; which we’ll write like this:

P(A|H1) = 50%, P(B|H1) = 33.33...%, P(C|H1) = 16.66...%

and illus­trate like this:

Figure 2

(Re­sult A be­ing rep­re­sented by a tri­an­gle, Re­sult B by a square, and Re­sult C by a pen­tagon.)

If Hy­poth­e­sis H2 is true, we’ll as­sume there’s a 10% chance of Re­sult A, a 70% chance of Re­sult B, and a 20% chance of Re­sult C:

Figure 3

(P(A|H2) = 10%, P(B|H2) = 70%, P(C|H2) = 20%)

Fi­nally, we’ll say that if Hy­poth­e­sis H3 is true, there’s a 5% chance of Re­sult A, a 15% chance of Re­sult B, and an 80% chance of Re­sult C:

Figure 4

(P(A|H3) = 5%, P(B|H3) = 15%, P(C|H3) = 80%)

Figure 5 be­low thus shows our knowl­edge prior to run­ning the test:

Figure 5

Note that we have now carved up our hy­poth­e­sis-space more finely; our pos­si­ble world-states are now things like “Hy­poth­e­sis H1 is true and Re­sult A oc­curred”, “Hy­poth­e­sis H1 is true and Re­sult B oc­curred”, etc., as op­posed to merely “Hy­poth­e­sis H1 is true”, etc. The num­bers above the slanted line seg­ments—the like­li­hoods of the test re­sults, as­sum­ing the par­tic­u­lar hy­poth­e­sis—rep­re­sent what pro­por­tion of the to­tal prob­a­bil­ity mass as­signed to the hy­poth­e­sis Hn is as­signed to the con­junc­tion of Hy­poth­e­sis Hn and Re­sult X; thus, since P(H1) = 30%, and P(A|H1) = 50%, P(H1 & A) is there­fore 50% of 30%, or, in other words, 15%.
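For readers who like to check the arithmetic, here is a minimal sketch of that carving-up step. It only multiplies each prior by each likelihood from the example above; the variable names (`priors`, `likelihoods`, `joint`) are my own choices for illustration, not anything from the figures.

```python
from fractions import Fraction as F

# Priors and likelihoods copied from the worked example in the text.
priors = {"H1": F(30, 100), "H2": F(50, 100), "H3": F(20, 100)}
likelihoods = {
    "H1": {"A": F(1, 2), "B": F(1, 3), "C": F(1, 6)},
    "H2": {"A": F(10, 100), "B": F(70, 100), "C": F(20, 100)},
    "H3": {"A": F(5, 100), "B": F(15, 100), "C": F(80, 100)},
}

# P(H & X) = P(H) * P(X|H): the probability mass of each finer-grained world-state.
joint = {
    (h, x): priors[h] * likelihoods[h][x]
    for h in priors
    for x in ("A", "B", "C")
}

for (h, x), p in joint.items():
    print(f"P({h} & {x}) = {float(p):.2%}")
```

The nine pieces still add up to 100%, since all we have done is subdivide the original three blobs.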

(That’s really all Bayes’ theorem is, right there, but shh! Don’t tell anyone yet!)

Now, then, sup­pose we run the test, and we get...Re­sult A.

What do we do? We cut off all the other branches:

Figure 6

So our up­dated prob­a­bil­ity dis­tri­bu­tion now looks like this:

Figure 7

...except for one thing: probabilities are supposed to add up to 100%, not 21% (the sum of the three surviving branches: 15% + 5% + 1%). Well, since we’ve conditioned on Result A, that means that the 21% probability mass assigned to Result A is now the entirety of our probability mass; 21% is the new 100%, you might say. So we simply adjust the numbers in such a way that they add up to 100% and the proportions are the same, dividing each by 21% to get roughly 71.4%, 23.8%, and 4.8%:

Figure 8

There! We’ve just performed a Bayesian up­date. And that’s what it looks like.
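The cut-and-rescale step is short enough to spell out as a sketch. The three numbers are the Result A branches from the example (15%, 5%, and 1%); the names `surviving`, `total`, and `posterior` are just labels I picked for illustration.

```python
from fractions import Fraction as F

# Branches left after observing Result A: P(H & A) for each hypothesis.
surviving = {"H1": F(15, 100), "H2": F(5, 100), "H3": F(1, 100)}

# Total surviving mass, P(A): this is the "new 100%".
total = sum(surviving.values())

# Rescale so the proportions are unchanged but the masses sum to 1.
posterior = {h: p / total for h, p in surviving.items()}

for h, p in posterior.items():
    print(f"P({h} | A) = {p} (about {float(p):.1%})")
```

Using exact fractions makes it easy to confirm that the rescaled masses sum to exactly 1.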

If, in­stead of Re­sult A, we had got­ten Re­sult B,

Figure 9

then our up­dated prob­a­bil­ity dis­tri­bu­tion would have looked like this:

Figure 10

Similarly, for Re­sult C:

Figure 11
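Since the figures only show the pictures, it may help to see the same procedure run for all three possible results. This is a sketch under the example’s numbers; the helper name `update` is hypothetical, chosen for this illustration.

```python
from fractions import Fraction as F

priors = {"H1": F(3, 10), "H2": F(5, 10), "H3": F(2, 10)}
likelihoods = {
    "H1": {"A": F(1, 2), "B": F(1, 3), "C": F(1, 6)},
    "H2": {"A": F(1, 10), "B": F(7, 10), "C": F(2, 10)},
    "H3": {"A": F(1, 20), "B": F(3, 20), "C": F(4, 5)},
}

def update(result):
    """Return (P(result), posterior over hypotheses) after observing `result`."""
    # Keep only the branches consistent with the observed result...
    kept = {h: priors[h] * likelihoods[h][result] for h in priors}
    # ...then rescale them so they sum to 1.
    total = sum(kept.values())
    return total, {h: p / total for h, p in kept.items()}

for result in "ABC":
    total, post = update(result)
    summary = ", ".join(f"{h}: {float(p):.1%}" for h, p in post.items())
    print(f"Result {result} (P = {float(total):.0%}) -> {summary}")
```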

Bayes’ the­o­rem is the for­mula that calcu­lates these up­dated prob­a­bil­ities. Us­ing H to stand for a hy­poth­e­sis (such as H1, H2 or H3), and E a piece of ev­i­dence (such as Re­sult A, Re­sult B, or Re­sult C), it says:

P(H|E) = P(H) * P(E|H) / P(E)

In words: to calcu­late the up­dated prob­a­bil­ity P(H|E), take the por­tion of the prior prob­a­bil­ity of H that is al­lo­cated to E (i.e. the quan­tity P(H)*P(E|H)), and calcu­late what frac­tion this is of the to­tal prior prob­a­bil­ity of E (i.e. di­vide it by P(E)).

What I like about this way of visualizing Bayes’ theorem is that it makes the importance of prior probabilities (in particular, the difference between P(H|E) and P(E|H)) visually obvious. Thus, in the above example, we easily see that even though P(C|H3) is high (80%), P(H3|C) is considerably lower (16/31, or about 52%); and once you have assimilated this visualization method, it should be easy to see that even more extreme examples (e.g. with P(E|H) huge and P(H|E) tiny) could be constructed.
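Here is one such extreme example, as a sketch. The numbers are invented purely for illustration (a hypothesis with a 0.1% prior, evidence that is 99% likely if the hypothesis is true and 5% likely otherwise); they do not come from the example above.

```python
from fractions import Fraction as F

# Hypothetical numbers, chosen only to make the gap dramatic.
prior_h = F(1, 1000)         # P(H): the hypothesis is rare
p_e_given_h = F(99, 100)     # P(E|H): the evidence is almost certain if H is true
p_e_given_not_h = F(5, 100)  # P(E|not H): but it also occurs sometimes without H

# Probability mass allocated to E under each hypothesis.
mass_h = prior_h * p_e_given_h
mass_not_h = (1 - prior_h) * p_e_given_not_h

# P(H|E) = P(H) * P(E|H) / P(E), where P(E) is the total mass on E.
posterior_h = mass_h / (mass_h + mass_not_h)

print(f"P(E|H) = {float(p_e_given_h):.1%}")
print(f"P(H|E) = {float(posterior_h):.2%}")
```

In blob terms, H is a tiny blob almost entirely covered by E, while not-H is a huge blob of which only a thin slice is covered by E; that thin slice still outweighs the whole of H.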

Now let’s use this to ex­am­ine two tricky prob­a­bil­ity puz­zles, the in­fa­mous Monty Hall Prob­lem and Eliezer’s Draw­ing Two Aces, and see how it illus­trates the cor­rect an­swers, as well as how one might go wrong.

The Monty Hall Problem

The situ­a­tion is this: you’re a con­tes­tant on a game show seek­ing to win a car. Be­fore you are three doors, one of which con­tains a car, and the other two of which con­tain goats. You will make an ini­tial “guess” at which door con­tains the car—that is, you will se­lect one of the doors, with­out open­ing it. At that point, the host will open a goat-con­tain­ing door from among the two that you did not se­lect. You will then have to de­cide whether to stick with your origi­nal guess and open the door that you origi­nally se­lected, or switch your guess to the re­main­ing un­opened door. The ques­tion is whether it is to your ad­van­tage to switch—that is, whether the car is more likely to be be­hind the re­main­ing un­opened door than be­hind the door you origi­nally guessed.

(If you haven’t thought about this prob­lem be­fore, you may want to try to figure it out be­fore con­tin­u­ing...)

The an­swer is that it is to your ad­van­tage to switch—that, in fact, switch­ing dou­bles the prob­a­bil­ity of win­ning the car.
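If you would rather check that claim empirically before following the argument, a quick simulation will do it. This is a sketch of my own, assuming the standard rules as stated above (the host always opens a goat door other than the one you picked, choosing at random when he has two options); the names `play` and `win_rate` are made up for it.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one game; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a goat door that is not the contestant's pick.
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

The two rates should come out close to 1/3 and 2/3 respectively, up to sampling noise.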

Peo­ple of­ten find this coun­ter­in­tu­itive when they first en­counter it—where “peo­ple” in­cludes the au­thor of this post. There are two pos­si­ble doors that could con­tain the car; why should one of them be more likely to con­tain it than the other?

As it turns out, while constructing the diagrams for this post, I “rediscovered” the error that led me to incorrectly conclude that there is a 1/2 chance the car is behind the originally-guessed door and a 1/2 chance it is behind the remaining door the host didn’t open. I’ll present that error first, and then show how to correct it. Here, then, is the wrong solution:

We start out with a perfectly cor­rect di­a­gram show­ing the prior prob­a­bil­ities:

Figure 12

The possible hypotheses are Car in Door 1, Car in Door 2, and Car in Door 3; before the game starts, there is no reason to believe any of the three doors is more likely than the others to contain the car, and so each of these hypotheses has prior probability 1/3.

The game begins with our selection of a door. That itself isn’t evidence about where the car is, of course; we’re assuming we have no particular information about that, other than that it’s behind one of the doors (that’s the whole point of the game!). Once we’ve done that, however, we will then have the opportunity to “run a test” to gain some “experimental data”: the host will perform his task of opening a door that is guaranteed to contain a goat. We’ll represent the result Host Opens Door 1 by a triangle, the result Host Opens Door 2 by a square, and the result Host Opens Door 3 by a pentagon, thus carving up our hypothesis space more finely into possibilities such as “Car in Door 1 and Host Opens Door 2”, “Car in Door 1 and Host Opens Door 3”, etc.:

Figure 13

Before we’ve made our initial selection of a door, the host is equally likely to open either of the goat-containing doors. Thus, at the beginning of the game, each hypothesis of the form “Car in Door X and Host Opens Door Y” has a probability of 1/6, as shown. So far, so good; everything is still perfectly correct.

Now we se­lect a door; say we choose Door 2. The host then opens ei­ther Door 1 or Door 3, to re­veal a goat. Let’s sup­pose he opens Door 1; our di­a­gram now looks like this:

Figure 14

But this shows equal prob­a­bil­ities of the car be­ing be­hind Door 2 and Door 3!

Figure 15

Did you catch the mis­take?

Here’s the cor­rect ver­sion:

As soon as we se­lected Door 2, our di­a­gram should have looked like this:

Figure 16

With Door 2 se­lected, the host no longer has the op­tion of open­ing Door 2; if the car is in Door 1, he must open Door 3, and if the car is in Door 3, he must open Door 1. We thus see that if the car is be­hind Door 3, the host is twice as likely to open Door 1 (namely, 100%) as he is if the car is be­hind Door 2 (50%); his open­ing of Door 1 thus con­sti­tutes some ev­i­dence in fa­vor of the hy­poth­e­sis that the car is be­hind Door 3. So, when the host opens Door 1, our pic­ture looks as fol­lows:

Figure 17

which yields the cor­rect up­dated prob­a­bil­ity dis­tri­bu­tion:

Figure 18
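The difference between the two solutions comes down to one likelihood table, which a short sketch can make explicit. Below, `flawed` encodes the mistaken assumption that the host still picks between both goat doors after we select Door 2, and `correct` encodes the rule that he never opens our door; both names, and the helper `posterior`, are my own labels for illustration.

```python
from fractions import Fraction as F

prior = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}  # P(car behind each door)

# P(host opens Door 1 | car location), with Door 2 selected by us.
# Flawed model: the host ignores our selection and opens either goat door.
flawed = {1: F(0), 2: F(1, 2), 3: F(1, 2)}
# Correct model: the host never opens Door 2, so a car in Door 3 forces Door 1.
correct = {1: F(0), 2: F(1, 2), 3: F(1)}

def posterior(likelihood):
    """Bayesian update on 'host opens Door 1' under the given likelihoods."""
    joint = {door: prior[door] * likelihood[door] for door in prior}
    total = sum(joint.values())
    return {door: p / total for door, p in joint.items()}

flawed_post = posterior(flawed)
correct_post = posterior(correct)
print("flawed: ", {d: str(p) for d, p in flawed_post.items()})
print("correct:", {d: str(p) for d, p in correct_post.items()})
```

The priors are identical in both runs; only the likelihood for “Car in Door 3” changes, and that alone moves the answer from an even split to 1/3 versus 2/3.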

Draw­ing Two Aces

Here is the state­ment of the prob­lem, from Eliezer’s post:

Sup­pose I have a deck of four cards: The ace of spades, the ace of hearts, and two oth­ers (say, 2C and 2D).

You draw two cards at ran­dom.


Now sup­pose I ask you “Do you have an ace?”

You say “Yes.”

I then say to you: “Choose one of the aces you’re hold­ing at ran­dom (so if you have only one, pick that one). Is it the ace of spades?”

You re­ply “Yes.”

What is the prob­a­bil­ity that you hold two aces?

(Once again, you may want to think about it, if you haven’t already, be­fore con­tin­u­ing...)

Here’s how our pic­ture method an­swers the ques­tion:

Since the per­son hold­ing the cards has at least one ace, the “hy­pothe­ses” (pos­si­ble card com­bi­na­tions) are the five shown be­low:

Figure 19

Each has a prior probability of 1/5, since there’s no reason to suppose any of them is more likely than any other.

The “test” that will be run is se­lect­ing an ace at ran­dom from the per­son’s hand, and see­ing if it is the ace of spades. The pos­si­ble re­sults are:

Figure 20

Now we run the test, and get the an­swer “YES”; this puts us in the fol­low­ing situ­a­tion:

Figure 21

The total prior probability of this situation (the YES answer) is (1/10) + (1/5) + (1/5) = 1/2: the two-ace hand answers YES only half the time, so it contributes half of its 1/5, while each of the two hands holding the ace of spades and a non-ace always answers YES. Thus, since 1/10 is 1/5 of 1/2 (that is, (1/10)/(1/2) = 1/5), our updated probability is 1/5, which happens to be the same as the prior probability. (I won’t bother displaying the final post-update picture here.)
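For anyone who wants to verify this by brute force, here is a sketch that enumerates the five hands and applies the same cut-and-rescale procedure. The card labels and the helper name `p_yes_random_ace` are my own choices for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

deck = ["AS", "AH", "2C", "2D"]
aces = {"AS", "AH"}

# Hypotheses: the five two-card hands containing at least one ace.
hands = [h for h in combinations(deck, 2) if aces & set(h)]
prior = F(1, len(hands))

def p_yes_random_ace(hand):
    """P(an ace chosen at random from this hand is the ace of spades)."""
    held = [c for c in hand if c in aces]
    return F(held.count("AS"), len(held))

# Mass on the YES branch of each hypothesis, under priors of 1/5 each.
joint_yes = {h: prior * p_yes_random_ace(h) for h in hands}
p_yes = sum(joint_yes.values())
posterior_both = joint_yes[("AS", "AH")] / p_yes

print("P(both aces | YES) =", posterior_both)
```

The YES branches have masses in the ratio 1 : 2 : 2, so the two-ace hand ends up with one fifth of the surviving mass, exactly its prior share.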

What this means is that the test we ran did not provide any ad­di­tional in­for­ma­tion about whether the per­son has both aces be­yond sim­ply know­ing that they have at least one ace; we might in fact say that the re­sult of the test is screened off by the an­swer to the first ques­tion (“Do you have an ace?”).

On the other hand, if we had sim­ply asked “Do you have the ace of spades?”, the di­a­gram would have looked like this:

Figure 22

which, upon re­ceiv­ing the an­swer YES, would have be­come:

Figure 23

The total probability mass allocated to YES is 3/5, and, within that, the specific situation of interest has probability 1/5; hence the updated probability would be 1/3.

So a YES answer in this experiment, unlike the other, would provide evidence that the hand contains both aces; for if the hand contains both aces, the probability of a YES answer is 100%, twice as large as it is in the contrary case (50%), giving a likelihood ratio of 2:1. By contrast, in the other experiment, the probability of a YES answer is only 50% even in the case where the hand contains both aces.
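The two experiments can be compared side by side in a short sketch that computes the likelihood ratio and the updated probability for each. The function names here are hypothetical, introduced only for this illustration, and the “contrary case” is taken to be the four equally likely one-ace hands.

```python
from fractions import Fraction as F
from itertools import combinations

deck = ["AS", "AH", "2C", "2D"]
aces = {"AS", "AH"}
hands = [h for h in combinations(deck, 2) if aces & set(h)]
both = ("AS", "AH")
one_ace = [h for h in hands if h != both]

def p_yes_direct(hand):
    """Experiment: 'Do you have the ace of spades?'"""
    return F(1) if "AS" in hand else F(0)

def p_yes_random_ace(hand):
    """Experiment: 'Pick one of your aces at random. Is it the ace of spades?'"""
    held = [c for c in hand if c in aces]
    return F(held.count("AS"), len(held))

def likelihood_ratio(p_yes):
    # P(YES | both aces) divided by P(YES | exactly one ace).
    p_one = sum(p_yes(h) for h in one_ace) / len(one_ace)
    return p_yes(both) / p_one

def posterior_both(p_yes):
    prior = F(1, len(hands))
    total = sum(prior * p_yes(h) for h in hands)
    return prior * p_yes(both) / total

for name, fn in [("direct question", p_yes_direct), ("random ace", p_yes_random_ace)]:
    print(f"{name}: ratio {likelihood_ratio(fn)}, P(both | YES) = {posterior_both(fn)}")
```

A likelihood ratio of 1 means the YES answer leaves the probability of two aces where it was, which is the sense in which the random-ace test adds nothing.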

This is what peo­ple who try to ex­plain the differ­ence by ut­ter­ing the opaque phrase “a ran­dom se­lec­tion was in­volved!” are ac­tu­ally talk­ing about: the differ­ence between

Figure 24

and

Figure 25

The method ex­plained here is far from the only way of vi­su­al­iz­ing Bayesian up­dates, but I feel that it is among the most in­tu­itive.

(I’d like to thank my sister, Vive-ut-Vi­vas, for help with some of the di­a­grams in this post.)