Probability is in the Mind


Yesterday I spoke of the Mind Projection Fallacy, giving the example of the alien monster who carries off a girl in a torn dress for intended ravishing—a mistake which I imputed to the artist’s tendency to think that a woman’s sexiness is a property of the woman herself, rather than something that exists in the mind of an observer, and probably wouldn’t exist in an alien mind.

The term “Mind Projection Fallacy” was coined by the late great Bayesian Master, E. T. Jaynes, as part of his long and hard-fought battle against the accursèd frequentists. Jaynes was of the opinion that probabilities were in the mind, not in the environment—that probabilities express ignorance, states of partial information; and if I am ignorant of a phenomenon, that is a fact about my state of mind, not a fact about the phenomenon.

I cannot do justice to this ancient war in a few words—but the classic example of the argument runs thus:

You have a coin.
The coin is biased.
You don’t know which way it’s biased or how much it’s biased. Someone just told you, “The coin is biased” and that’s all they said.
This is all the information you have, and the only information you have.

You draw the coin forth, flip it, and slap it down.

Now—before you remove your hand and look at the result—are you willing to say that you assign a 0.5 probability to the coin having come up heads?

The frequentist says, “No. Saying ‘probability 0.5’ means that the coin has an inherent propensity to come up heads as often as tails, so that if we flipped the coin infinitely many times, the ratio of heads to tails would approach 1:1. But we know that the coin is biased, so it can have any probability of coming up heads except 0.5.”

The Bayesian says, “Uncertainty exists in the map, not in the territory. In the real world, the coin has either come up heads, or come up tails. Any talk of ‘probability’ must refer to the information that I have about the coin—my state of partial ignorance and partial knowledge—not just the coin itself. Furthermore, I have all sorts of theorems showing that if I don’t treat my partial knowledge a certain way, I’ll make stupid bets. If I’ve got to plan, I’ll plan for a 50/50 state of uncertainty, where I don’t weigh outcomes conditional on heads any more heavily in my mind than outcomes conditional on tails. You can call that number whatever you like, but it has to obey the probability laws on pain of stupidity. So I don’t have the slightest hesitation about calling my outcome-weighting a probability.”

I side with the Bayesians. You may have noticed that about me.

Even before a fair coin is tossed, the notion that it has an inherent 50% probability of coming up heads may be just plain wrong. Maybe you’re holding the coin in such a way that it’s just about guaranteed to come up heads, or tails, given the force at which you flip it, and the air currents around you. But, if you don’t know which way the coin is biased on this one occasion, so what?

I believe there was a lawsuit where someone alleged that the draft lottery was unfair, because the slips with names on them were not being mixed thoroughly enough; and the judge replied, “To whom is it unfair?”

To make the coinflip experiment repeatable, as frequentists are wont to demand, we could build an automated coinflipper, and verify that the results were 50% heads and 50% tails. But maybe a robot with extra-sensitive eyes and a good grasp of physics, watching the autoflipper prepare to flip, could predict the coin’s fall in advance—not with certainty, but with 90% accuracy. Then what would the real probability be?

There is no “real probability”. The robot has one state of partial information. You have a different state of partial information. The coin itself has no mind, and doesn’t assign a probability to anything; it just flips into the air, rotates a few times, bounces off some air molecules, and lands either heads or tails.

So that is the Bayesian view of things, and I would now like to point out a couple of classic brainteasers that derive their brain-teasing ability from the tendency to think of probabilities as inherent properties of objects.

Let’s take the old classic: You meet a mathematician on the street, and she happens to mention that she has given birth to two children on two separate occasions. You ask: “Is at least one of your children a boy?” The mathematician says, “Yes, he is.”

What is the probability that she has two boys? If you assume that the prior probability of a child being a boy is 1/2, then the probability that she has two boys, on the information given, is 1/3. The prior probabilities were: 1/4 two boys, 1/2 one boy one girl, 1/4 two girls. The mathematician’s “Yes” response has probability ~1 in the first two cases, and probability ~0 in the third. Renormalizing leaves us with a 1/3 probability of two boys, and a 2/3 probability of one boy one girl.
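The renormalization just described can be checked by brute enumeration. Here is a minimal sketch (my own illustration, not part of the essay), assuming each child is independently a boy or a girl with probability 1/2:

```python
from itertools import product
from fractions import Fraction

# The four equally likely child-sets, as (eldest, youngest) pairs.
child_sets = list(product("BG", repeat=2))  # BB, BG, GB, GG

# "At least one is a boy" eliminates GG; renormalize over what remains.
consistent = [s for s in child_sets if "B" in s]
p_two_boys = Fraction(sum(s == ("B", "B") for s in consistent),
                      len(consistent))
print(p_two_boys)  # 1/3
```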

But suppose that instead you had asked, “Is your eldest child a boy?” and the mathematician had answered “Yes.” Then the probability of the mathematician having two boys would be 1/2, since the eldest child is a boy, and the younger child can be anything it pleases.

Likewise if you’d asked “Is your youngest child a boy?” The probability of their both being boys would, again, be 1/2.

Now, if at least one child is a boy, it must be either the oldest child who is a boy, or the youngest child who is a boy. So how can the answer in the first case be different from the answer in the latter two?

Or here’s a very similar problem: Let’s say I have four cards, the ace of hearts, the ace of spades, the two of hearts, and the two of spades. I draw two cards at random. You ask me, “Are you holding at least one ace?” and I reply “Yes.” What is the probability that I am holding a pair of aces? It is 1/5. There are six possible combinations of two cards, with equal prior probability, and you have just eliminated the possibility that I am holding a pair of twos. Of the five remaining combinations, only one combination is a pair of aces. So 1/5.

Now suppose that instead you asked me, “Are you holding the ace of spades?” If I reply “Yes”, the probability that the other card is the ace of hearts is 1/3. (You know I’m holding the ace of spades, and there are three possibilities for the other card, only one of which is the ace of hearts.) Likewise, if you ask me “Are you holding the ace of hearts?” and I reply “Yes”, the probability I’m holding a pair of aces is 1/3.
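All three numbers can be checked by enumerating the six two-card hands. A quick sketch (my own illustration; the card labels like "AS" for the ace of spades are just a naming convention I chose):

```python
from itertools import combinations
from fractions import Fraction

cards = ["AS", "AH", "2S", "2H"]  # ace/two of spades/hearts
hands = list(combinations(cards, 2))  # six equally likely two-card hands

def p_pair_of_aces(condition):
    """P(both cards are aces | the condition holds), by renormalizing."""
    consistent = [h for h in hands if condition(h)]
    return Fraction(sum(set(h) == {"AS", "AH"} for h in consistent),
                    len(consistent))

print(p_pair_of_aces(lambda h: "AS" in h or "AH" in h))  # 1/5
print(p_pair_of_aces(lambda h: "AS" in h))               # 1/3
print(p_pair_of_aces(lambda h: "AH" in h))               # 1/3
```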

But then how can it be that if you ask me, “Are you holding at least one ace?” and I say “Yes”, the probability I have a pair is 1/5? Either I must be holding the ace of spades or the ace of hearts, as you know; and either way, the probability that I’m holding a pair of aces is 1/3.

How can this be? Have I miscalculated one or more of these probabilities?

If you want to figure it out for yourself, do so now, because I’m about to reveal...

That all stated calculations are correct.

As for the paradox, there isn’t one. The appearance of paradox comes from thinking that the probabilities must be properties of the cards themselves. The ace I’m holding has to be either hearts or spades; but that doesn’t mean that your knowledge about my cards must be the same as if you knew I was holding hearts, or knew I was holding spades.

It may help to think of Bayes’s Theorem:

P(H|E) = P(E|H) P(H) / P(E)

That last term, where you divide by P(E), is the part where you throw out all the possibilities that have been eliminated, and renormalize your probabilities over what remains.

Now let’s say that you ask me, “Are you holding at least one ace?” Before I answer, your probability that I say “Yes” should be 5/6.

But if you ask me “Are you holding the ace of spades?”, your prior probability that I say “Yes” is just 1/2.

So right away you can see that you’re learning something very different in the two cases. You’re going to be eliminating some different possibilities, and renormalizing using a different P(E). If you learn two different items of evidence, you shouldn’t be surprised at ending up in two different states of partial information.
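As a sketch of how the different P(E) terms play out in Bayes’s Theorem (my own illustration, not from the essay), take H to be “I hold a pair of aces”:

```python
from fractions import Fraction

# H: "I hold a pair of aces".  Prior: one of six equally likely hands.
p_h = Fraction(1, 6)

# Evidence A: "Yes" to "Are you holding at least one ace?"
p_a = Fraction(5, 6)        # only the pair of twos would answer "No"
p_a_given_h = Fraction(1)   # a pair of aces certainly contains an ace
p_h_given_a = p_a_given_h * p_h / p_a
print(p_h_given_a)  # 1/5

# Evidence B: "Yes" to "Are you holding the ace of spades?"
p_b = Fraction(3, 6)        # three of the six hands contain that card
p_b_given_h = Fraction(1)   # the pair of aces contains the ace of spades
p_h_given_b = p_b_given_h * p_h / p_b
print(p_h_given_b)  # 1/3
```

Same prior, same hypothesis; only the denominator P(E) differs, and with it the posterior.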

Similarly, if I ask the mathematician, “Is at least one of your two children a boy?” I expect to hear “Yes” with probability 3/4, but if I ask “Is your eldest child a boy?” I expect to hear “Yes” with probability 1/2. So it shouldn’t be surprising that I end up in a different state of partial knowledge, depending on which of the two questions I ask.

The only reason for seeing a “paradox” is thinking as though the probability of holding a pair of aces is a property of cards that have at least one ace, or a property of cards that happen to contain the ace of spades. In which case, it would be paradoxical for card-sets containing at least one ace to have an inherent pair-probability of 1/5, while card-sets containing the ace of spades had an inherent pair-probability of 1/3, and card-sets containing the ace of hearts had an inherent pair-probability of 1/3.

Similarly, if you think a 1/3 probability of being both boys is an inherent property of child-sets that include at least one boy, then that is not consistent with child-sets of which the eldest is male having an inherent probability of 1/2 of being both boys, and child-sets of which the youngest is male having an inherent 1/2 probability of being both boys. It would be like saying, “All green apples weigh a pound, and all red apples weigh a pound, and all apples that are green or red weigh half a pound.”

That’s what happens when you start thinking as if probabilities are in things, rather than probabilities being states of partial information about things.

Probabilities express uncertainty, and it is only agents who can be uncertain. A blank map does not correspond to a blank territory. Ignorance is in the mind.