0 And 1 Are Not Probabilities

One, two, and three are all integers, and so is negative four. If you keep counting up, or keep counting down, you’re bound to encounter a whole lot more integers. You will not, however, encounter anything called “positive infinity” or “negative infinity,” so these are not integers.

Positive and negative infinity are not integers, but rather special symbols for talking about the behavior of integers. People sometimes say something like, “5 + infinity = infinity,” because if you start at 5 and keep counting up without ever stopping, you’ll get higher and higher numbers without limit. But it doesn’t follow from this that “infinity - infinity = 5.” You can’t count up from 0 without ever stopping, and then count down without ever stopping, and then find yourself at 5 when you’re done.

From this we can see that infinity is not only not-an-integer, it doesn’t even behave like an integer. If you unwisely try to mix up infinities with integers, you’ll need all sorts of special new inconsistent-seeming behaviors which you don’t need for 1, 2, 3, and other actual integers.

Even though infinity isn’t an integer, you don’t have to worry about being left at a loss for numbers. Although people have seen five sheep, millions of grains of sand, and septillions of atoms, no one has ever counted an infinity of anything. The same with continuous quantities—people have measured dust specks a millimeter across, animals a meter across, cities kilometers across, and galaxies thousands of light-years across, but no one has ever measured anything an infinity across. In the real world, you don’t need a whole lot of infinity.1

In the usual way of writing probabilities, probabilities are between 0 and 1. A coin might have a probability of 0.5 of coming up tails, or the weatherman might assign probability 0.9 to rain tomorrow.

This isn’t the only way of writing probabilities, though. For example, you can transform probabilities into odds via the transformation O = P/(1 - P). So a probability of 50% would go to odds of 0.5/0.5 or 1, usually written 1:1, while a probability of 0.9 would go to odds of 0.9/0.1 or 9, usually written 9:1. To take odds back to probabilities you use P = O/(1 + O), and this is perfectly reversible, so the transformation is an isomorphism—a two-way reversible mapping. Thus, probabilities and odds are isomorphic, and you can use one or the other according to convenience.
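Here is a minimal Python sketch of that two-way mapping (the function names are my own, not anything standard):

```python
def prob_to_odds(p):
    """O = P / (1 - P); defined for 0 <= p < 1."""
    return p / (1 - p)

def odds_to_prob(o):
    """P = O / (1 + O); the inverse mapping."""
    return o / (1 + o)

print(prob_to_odds(0.5))                # 1.0, i.e., odds of 1:1
print(prob_to_odds(0.9))                # ~9, i.e., odds of 9:1
print(odds_to_prob(prob_to_odds(0.9)))  # ~0.9: the round trip recovers p
```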

For example, it’s more convenient to use odds when you’re doing Bayesian updates. Let’s say that I roll a six-sided die: If any face except 1 comes up, there’s a 10% chance of hearing a bell, but if the face 1 comes up, there’s a 20% chance of hearing the bell. Now I roll the die, and hear a bell. What are the odds that the face showing is 1? Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20) and the likelihood ratio is 0.2:0.1 (corresponding to the real number 2) and I can just multiply these two together to get the posterior odds 2:5 (corresponding to the real number 2/5 or 0.40). Then I convert back into a probability, if I like, and get (0.4/1.4) = 2/7 = ~29%.
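A sketch of that update in code, assuming the same die-and-bell setup:

```python
prior_odds = 1 / 5            # 1:5 odds that the face showing is 1
likelihood_ratio = 0.2 / 0.1  # the bell is twice as likely given a 1
posterior_odds = prior_odds * likelihood_ratio  # 2:5 odds, i.e., 0.4
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_prob)         # ~0.2857, i.e., ~29%
```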

So odds are more manageable for Bayesian updates—if you use probabilities, you’ve got to deploy Bayes’s Theorem in its complicated version. But probabilities are more convenient for answering questions like “If I roll a six-sided die, what’s the chance of seeing a number from 1 to 4?” You can add up the probabilities of 1/6 for each side and get 4/6, but you can’t add up the odds ratios of 0.2 for each side and get an odds ratio of 0.8.
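A quick check of that asymmetry, again with a six-sided die:

```python
p_each = 1 / 6                  # probability of any single face
print(4 * p_each)               # ~0.667 = 4/6: probabilities of disjoint events add

o_each = p_each / (1 - p_each)  # odds of ~0.2 for any single face
print(4 * o_each)               # ~0.8 -- but the true odds of "1 to 4" are (4/6)/(2/6) = 2
```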

Why am I saying all this? To show that “odds ratios” are just as legitimate a way of mapping uncertainties onto real numbers as “probabilities.” Odds ratios are more convenient for some operations, probabilities are more convenient for others. A famous proof called Cox’s Theorem (plus various extensions and refinements thereof) shows that all ways of representing uncertainties that obey some reasonable-sounding constraints end up isomorphic to each other.

Why does it matter that odds ratios are just as legitimate as probabilities? Probabilities as ordinarily written are between 0 and 1, and both 0 and 1 look like they ought to be readily reachable quantities—it’s easy to see 1 zebra or 0 unicorns. But when you transform probabilities onto odds ratios, 0 goes to 0, but 1 goes to positive infinity. Now absolute truth doesn’t look like it should be so easy to reach.

A representation that makes it even simpler to do Bayesian updates is the log odds—this is how E. T. Jaynes recommended thinking about probabilities. For example, let’s say that the prior probability of a proposition is 0.0001—this corresponds to a log odds of around −40 decibels. Then you see evidence that seems 100 times more likely if the proposition is true than if it is false. This is 20 decibels of evidence. So the posterior odds are around −40 dB + 20 dB = −20 dB, that is, the posterior probability is ~0.01.
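In code, assuming Jaynes’s decibel convention of 10 × log10(odds):

```python
import math

def log_odds_db(p):
    """Log odds in decibels: 10 * log10(O)."""
    return 10 * math.log10(p / (1 - p))

def prob_from_db(db):
    """Invert: decibels back to a probability."""
    o = 10 ** (db / 10)
    return o / (1 + o)

prior_db = log_odds_db(0.0001)       # ~ -40 dB
evidence_db = 10 * math.log10(100)   # 20 dB: evidence 100x more likely if true
print(prior_db + evidence_db)        # ~ -20 dB
print(prob_from_db(prior_db + evidence_db))  # ~0.01
```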

When you transform probabilities to log odds, 0 goes to negative infinity and 1 goes to positive infinity. Now both infinite certainty and infinite improbability seem a bit more out-of-reach.
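The endpoints fail outright if you try them with the sketch above: p = 1 puts a zero in the denominator of the odds, and p = 0 puts a zero inside the logarithm.

```python
import math

for p in (0.0, 1.0):
    try:
        print(10 * math.log10(p / (1 - p)))
    except (ZeroDivisionError, ValueError) as err:
        print(f"p = {p}: {err}")  # neither endpoint maps to a real number
```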

In probabilities, 0.9999 and 0.99999 seem to be only 0.00009 apart, so that 0.502 is much further away from 0.503 than 0.9999 is from 0.99999. To get to probability 1 from probability 0.99999, it seems like you should need to travel a distance of merely 0.00001.

But when you transform to odds ratios, 0.502 and 0.503 go to 1.008 and 1.012, and 0.9999 and 0.99999 go to 9,999 and 99,999. And when you transform to log odds, 0.502 and 0.503 go to 0.03 decibels and 0.05 decibels, but 0.9999 and 0.99999 go to 40 decibels and 50 decibels.
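Those numbers are easy to reproduce with the mappings defined earlier:

```python
import math

for p in (0.502, 0.503, 0.9999, 0.99999):
    odds = p / (1 - p)
    db = 10 * math.log10(odds)
    print(f"p = {p}: odds = {odds:.3f}, log odds = {db:.2f} dB")
```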

When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence.

Using the log odds exposes the fact that reaching infinite certainty requires infinitely strong evidence, just as infinite absurdity requires infinitely strong counterevidence.

Furthermore, all sorts of standard theorems in probability have special cases if you try to plug 1s or 0s into them—like what happens if you try to do a Bayesian update on an observation to which you assigned probability 0.

So I propose that it makes sense to say that 1 and 0 are not in the probabilities; just as negative and positive infinity, which do not obey the field axioms, are not in the real numbers.

The main reason this would upset probability theorists is that we would need to rederive theorems previously obtained by assuming that we can marginalize over a joint probability by adding up all the pieces and having them sum to 1.

However, in the real world, when you roll a die, it doesn’t literally have infinite certainty of coming up some number between 1 and 6. The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write “37” on one side.

If you made a magical symbol to stand for “all possibilities I haven’t considered,” then you could marginalize over the events including this magical symbol, and arrive at a magical symbol “T” that stands for infinite certainty.

But I would rather ask whether there’s some way to derive a theorem without using magic symbols with special behaviors. That would be more elegant. Just as there are mathematicians who refuse to believe in the law of the excluded middle or infinite sets, I would like to be a probability theorist who doesn’t believe in absolute certainty.

1 I should note for the more sophisticated reader that they do not need to write me with elaborate explanations of, say, the difference between ordinal numbers and cardinal numbers. I’m familiar with the different set-theoretic notions of infinity, but I don’t see a good use for them in probability theory.