# 0 And 1 Are Not Probabilities

One, two, and three are all in­te­gers, and so is nega­tive four. If you keep count­ing up, or keep count­ing down, you’re bound to en­counter a whole lot more in­te­gers. You will not, how­ever, en­counter any­thing called “pos­i­tive in­finity” or “nega­tive in­finity,” so these are not in­te­gers.

Positive and negative infinity are not integers, but rather special symbols for talking about the behavior of integers. People sometimes say something like, "5 + infinity = infinity," because if you start at 5 and keep counting up without ever stopping, you'll get higher and higher numbers without limit. But it doesn't follow from this that "infinity − infinity = 5." You can't count up from 0 without ever stopping, and then count down without ever stopping, and then find yourself at 5 when you're done.

From this we can see that in­finity is not only not-an-in­te­ger, it doesn’t even be­have like an in­te­ger. If you un­wisely try to mix up in­fini­ties with in­te­gers, you’ll need all sorts of spe­cial new in­con­sis­tent-seem­ing be­hav­iors which you don’t need for 1, 2, 3 and other ac­tual in­te­gers.

Even though in­finity isn’t an in­te­ger, you don’t have to worry about be­ing left at a loss for num­bers. Although peo­ple have seen five sheep, mil­lions of grains of sand, and sep­til­lions of atoms, no one has ever counted an in­finity of any­thing. The same with con­tin­u­ous quan­tities—peo­ple have mea­sured dust specks a mil­lime­ter across, an­i­mals a me­ter across, cities kilo­me­ters across, and galax­ies thou­sands of lightyears across, but no one has ever mea­sured any­thing an in­finity across. In the real world, you don’t need a whole lot of in­finity.1

In the usual way of writ­ing prob­a­bil­ities, prob­a­bil­ities are be­tween 0 and 1. A coin might have a prob­a­bil­ity of 0.5 of com­ing up tails, or the weath­er­man might as­sign prob­a­bil­ity 0.9 to rain to­mor­row.

This isn’t the only way of writ­ing prob­a­bil­ities, though. For ex­am­ple, you can trans­form prob­a­bil­ities into odds via the trans­for­ma­tion O = (P∕(1 - P)). So a prob­a­bil­ity of 50% would go to odds of 0.5/​0.5 or 1, usu­ally writ­ten 1:1, while a prob­a­bil­ity of 0.9 would go to odds of 0.9/​0.1 or 9, usu­ally writ­ten 9:1. To take odds back to prob­a­bil­ities you use P = (O∕(1 + O)), and this is perfectly re­versible, so the trans­for­ma­tion is an iso­mor­phism—a two-way re­versible map­ping. Thus, prob­a­bil­ities and odds are iso­mor­phic, and you can use one or the other ac­cord­ing to con­ve­nience.
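The transformation and its inverse can be sketched in a few lines of Python (my own illustration, not from the post):

```python
def odds_from_prob(p):
    # O = P / (1 - P); blows up as p approaches 1, which is the essay's point
    return p / (1 - p)

def prob_from_odds(o):
    # P = O / (1 + O), the inverse mapping
    return o / (1 + o)

# The mapping is reversible, so the two representations are isomorphic.
p = 0.9
assert abs(odds_from_prob(p) - 9.0) < 1e-9
assert abs(prob_from_odds(odds_from_prob(p)) - p) < 1e-12
```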

For example, it's more convenient to use odds when you're doing Bayesian updates. Let's say that I roll a six-sided die: If any face except 1 comes up, there's a 10% chance of hearing a bell, but if the face 1 comes up, there's a 20% chance of hearing the bell. Now I roll the die, and hear a bell. What are the odds that the face showing is 1? Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20) and the likelihood ratio is 0.2:0.1 (corresponding to the real number 2) and I can just multiply these two together to get the posterior odds 2:5 (corresponding to the real number 2/5 or 0.40). Then I convert back into a probability, if I like, and get (0.4/1.4) = 2/7 = ~29%.
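The die-and-bell update can be checked numerically (a Python sketch of the example's arithmetic, added for illustration):

```python
prior_odds = 1 / 5             # odds of face 1 vs. the other five faces
likelihood_ratio = 0.2 / 0.1   # P(bell | face 1) / P(bell | other face) = 2
posterior_odds = prior_odds * likelihood_ratio      # 2:5, i.e. 0.4
posterior_prob = posterior_odds / (1 + posterior_odds)

assert abs(posterior_odds - 0.4) < 1e-12
assert abs(posterior_prob - 2 / 7) < 1e-12          # ~29%
```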

So odds are more manageable for Bayesian updates—if you use probabilities, you've got to deploy Bayes's Theorem in its complicated version. But probabilities are more convenient for answering questions like "If I roll a six-sided die, what's the chance of seeing a number from 1 to 4?" You can add up the probabilities of 1/6 for each side and get 4/6, but you can't add up the odds ratios of 0.2 for each side and get an odds ratio of 0.8.

Why am I saying all this? To show that "odds ratios" are just as legitimate a way of mapping uncertainties onto real numbers as "probabilities." Odds ratios are more convenient for some operations, probabilities are more convenient for others. A famous proof called Cox's Theorem (plus various extensions and refinements thereof) shows that all ways of representing uncertainties that obey some reasonable-sounding constraints end up isomorphic to each other.

Why does it mat­ter that odds ra­tios are just as le­gi­t­i­mate as prob­a­bil­ities? Prob­a­bil­ities as or­di­nar­ily writ­ten are be­tween 0 and 1, and both 0 and 1 look like they ought to be read­ily reach­able quan­tities—it’s easy to see 1 ze­bra or 0 uni­corns. But when you trans­form prob­a­bil­ities onto odds ra­tios, 0 goes to 0, but 1 goes to pos­i­tive in­finity. Now ab­solute truth doesn’t look like it should be so easy to reach.

A rep­re­sen­ta­tion that makes it even sim­pler to do Bayesian up­dates is the log odds—this is how E. T. Jaynes recom­mended think­ing about prob­a­bil­ities. For ex­am­ple, let’s say that the prior prob­a­bil­ity of a propo­si­tion is 0.0001—this cor­re­sponds to a log odds of around −40 deci­bels. Then you see ev­i­dence that seems 100 times more likely if the propo­si­tion is true than if it is false. This is 20 deci­bels of ev­i­dence. So the pos­te­rior odds are around −40 dB + 20 dB = −20 dB, that is, the pos­te­rior prob­a­bil­ity is ~0.01.
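The decibel bookkeeping in this example can be sketched as follows (an illustrative Python snippet, using dB = 10·log10(odds)):

```python
import math

def db_from_prob(p):
    # log odds in decibels: 10 * log10(p / (1 - p)); undefined at p = 0 or 1
    return 10 * math.log10(p / (1 - p))

def prob_from_db(db):
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

prior_db = db_from_prob(0.0001)        # about -40 dB
evidence_db = 10 * math.log10(100)     # a 100:1 likelihood ratio = 20 dB of evidence
posterior = prob_from_db(prior_db + evidence_db)
assert abs(posterior - 0.01) < 1e-3    # ~0.01, as in the text
```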

When you trans­form prob­a­bil­ities to log odds, 0 goes to nega­tive in­finity and 1 goes to pos­i­tive in­finity. Now both in­finite cer­tainty and in­finite im­prob­a­bil­ity seem a bit more out-of-reach.

In prob­a­bil­ities, 0.9999 and 0.99999 seem to be only 0.00009 apart, so that 0.502 is much fur­ther away from 0.503 than 0.9999 is from 0.99999. To get to prob­a­bil­ity 1 from prob­a­bil­ity 0.99999, it seems like you should need to travel a dis­tance of merely 0.00001.

But when you trans­form to odds ra­tios, 0.502 and 0.503 go to 1.008 and 1.012, and 0.9999 and 0.99999 go to 9,999 and 99,999. And when you trans­form to log odds, 0.502 and 0.503 go to 0.03 deci­bels and 0.05 deci­bels, but 0.9999 and 0.99999 go to 40 deci­bels and 50 deci­bels.
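A quick computation (Python, my own illustration) confirms how the spacing stretches near 1:

```python
import math

def log_odds_db(p):
    # log odds in decibels; only defined for p strictly between 0 and 1
    return 10 * math.log10(p / (1 - p))

near_half = log_odds_db(0.503) - log_odds_db(0.502)    # ~0.02 dB of evidence
near_one = log_odds_db(0.99999) - log_odds_db(0.9999)  # ~10 dB of evidence
assert near_one > 100 * near_half
```

In probability terms the second pair is about eleven times closer together than the first; in evidence terms it is hundreds of times farther apart.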

When you work in log odds, the dis­tance be­tween any two de­grees of un­cer­tainty equals the amount of ev­i­dence you would need to go from one to the other. That is, the log odds gives us a nat­u­ral mea­sure of spac­ing among de­grees of con­fi­dence.

Us­ing the log odds ex­poses the fact that reach­ing in­finite cer­tainty re­quires in­finitely strong ev­i­dence, just as in­finite ab­sur­dity re­quires in­finitely strong coun­terev­i­dence.

Fur­ther­more, all sorts of stan­dard the­o­rems in prob­a­bil­ity have spe­cial cases if you try to plug 1s or 0s into them—like what hap­pens if you try to do a Bayesian up­date on an ob­ser­va­tion to which you as­signed prob­a­bil­ity 0.

So I pro­pose that it makes sense to say that 1 and 0 are not in the prob­a­bil­ities; just as nega­tive and pos­i­tive in­finity, which do not obey the field ax­ioms, are not in the real num­bers.

The main rea­son this would up­set prob­a­bil­ity the­o­rists is that we would need to red­erive the­o­rems pre­vi­ously ob­tained by as­sum­ing that we can marginal­ize over a joint prob­a­bil­ity by adding up all the pieces and hav­ing them sum to 1.

How­ever, in the real world, when you roll a die, it doesn’t liter­ally have in­finite cer­tainty of com­ing up some num­ber be­tween 1 and 6. The die might land on its edge; or get struck by a me­teor; or the Dark Lords of the Ma­trix might reach in and write “37” on one side.

If you made a mag­i­cal sym­bol to stand for “all pos­si­bil­ities I haven’t con­sid­ered,” then you could marginal­ize over the events in­clud­ing this mag­i­cal sym­bol, and ar­rive at a mag­i­cal sym­bol “T” that stands for in­finite cer­tainty.

But I would rather ask whether there’s some way to de­rive a the­o­rem with­out us­ing magic sym­bols with spe­cial be­hav­iors. That would be more el­e­gant. Just as there are math­e­mat­i­ci­ans who re­fuse to be­lieve in the law of the ex­cluded mid­dle or in­finite sets, I would like to be a prob­a­bil­ity the­o­rist who doesn’t be­lieve in ab­solute cer­tainty.

1. I should note for the more sophisticated reader that they do not need to write me with elaborate explanations of, say, the difference between ordinal numbers and cardinal numbers. I'm familiar with the different set-theoretic notions of infinity, but I don't see a good use for them in probability theory.

• hmm… I feel even more con­fi­dent about the ex­is­tence of prob­a­bil­ity-zero state­ments than I feel about the ex­is­tence of prob­a­bil­ity-1 state­ments. Be­cause not only do we have log­i­cal con­tra­dic­tions, but we also have in­co­her­ent state­ments (like Husserl’s “the green is ei­ther”).

Can one form sub­jec­tive prob­a­bil­ities over the truth of “the green is ei­ther” at all? I don’t think so, but I re­mem­ber a some-months-ago sug­ges­tion of Robin’s about “im­pos­si­ble pos­si­ble wor­lds,” which might also im­ply the abil­ity to form prob­a­bil­ity es­ti­mates over in­co­heren­cies. (Why not in­co­her­ent wor­lds? One might ask.) So the idea is at least po­ten­tially on the table.

And then it seems ob­vi­ous that we will for­ever, across all space and time, have no ev­i­dence to sup­port an in­co­her­ent propo­si­tion. That’s as good an ap­prox­i­ma­tion of in­finite lack of ev­i­dence as I can come up with. P(“the green is ei­ther”)=0?

• If you as­sign 0 to log­i­cal con­tra­dic­tions, you should as­sign 1 to the nega­tions of log­i­cal con­tra­dic­tions. (Par­tic­u­larly since your con­fi­dence in bi­valence and the power of nega­tion is what al­lowed you to doubt the truth of the con­tra­dic­tion in the first place.) So it’s strange to say that you feel safer ap­peal­ing to 0s than to 1s.

For my part, I have a hard time con­vinc­ing my­self that there’s sim­ply no (epistemic) chance that Gra­ham Priest is right. On the other hand, as­sign­ing any value but 1 to the sen­tence “All bach­e­lors are bach­e­lors” just seems per­verse. It seems as though I could only get that sen­tence wrong if I mi­s­un­der­stand it. But what am I as­sign­ing a prob­a­bil­ity to, if not the truth of the sen­tence as I un­der­stand it?

Another way of say­ing this is that I feel queasy as­sign­ing a nonzero prob­a­bil­ity to “Not all bach­e­lors are bach­e­lors,” (i.e., ¬(p → p)) even though I think it prob­a­bly makes some sense to en­ter­tain as a van­ish­ingly small pos­si­bil­ity “All bach­e­lors are non-bach­e­lors” (i.e., p → ¬p, all bach­e­lors are con­tra­dic­tory ob­jects).

• If a state­ment is log­i­cally in­con­sis­tent with it­self, it should not be part of your hy­poth­e­sis space, and thus should not be as­signed a prob­a­bil­ity at all.

• One an­swer would be that an in­co­her­ent propo­si­tion is not a propo­si­tion, and so doesn’t have any prob­a­bil­ity (not even zero, if zero is a prob­a­bil­ity.)

Another an­swer would be that there is some prob­a­bil­ity that you are wrong that the propo­si­tion is in­co­her­ent (you might be for­get­ting your knowl­edge of English), and there­fore also some prob­a­bil­ity that “the green is ei­ther” is both co­her­ent and true.

• It’s difficult to as­sign prob­a­bil­ity to in­co­her­ent state­ments, be­cause since we can’t mean any­thing by them, we can’t as­sert a refer­ent to the state­ment—in that sense, the prob­a­bil­ity is in­de­ter­mi­nate (ad­di­tion­ally, one could eas­ily imag­ine a lan­guage in which a state­ment such as “the green is ei­ther” has a perfectly co­her­ent mean­ing—and we can’t say that’s not what we meant, since we didn’t mean any­thing). Re­call also that each prob­a­bil­ity zero state­ment im­plies a prob­a­bil­ity one state­ment by its de­nial and vice versa, so one is equally ca­pa­ble of imag­in­ing them, if in a con­trived way.

• Put­ting this in a slightly more co­her­ent way. (I was hav­ing some trou­ble un­der­stand­ing the ex­pla­na­tion, so I broke it down into lay­man’s terms, might make it more eas­ily un­der­stand­able)

If I as­sign P(0) to “Green is ei­ther” Then I as­sign P(1) to the state­ment “Green is not ei­ther”

If you as­sign ab­solute cer­tainty to any one state­ment you are, by defi­ni­tion as­sign­ing ab­solute im­pos­si­bil­ity to all other pos­si­bil­ities.

• j.ed­wards, I think your last sen­tence con­vinced me to with­draw the ob­jec­tion—I can’t very well as­sign a prob­a­bil­ity of 1 to ~”the green is ei­ther” can I? Good point, thanks.

• that anec­dote wasn’t amus­ing at all.

and it wasn’t an anec­dote.

and it doesn't prove the point. all it shows is that a single person didn't know their 17 times tables off the top of their head. there's no reason to expect someone to be as confident that 51 is or is not prime as that 7 is or is not prime—and anyway, the point of the story should have been that, eventually, 7 might NOT be prime. which it's always going to be.

i didn’t get it.

• Prob­a­bil­ities of 0 and 1 are per­haps more like the perfectly massless, perfectly in­elas­tic rods we learn about in high school physics—they are use­ful as part of an ideal­ized model which is of­ten suffi­cient to ac­cu­rately pre­dict real-world events, but we know that they are ideal­iza­tions that will never be seen in real life.

How­ever, I think we can as­sign the prime­ness of 7 a value of “so close to 1 that there’s no point in wor­ry­ing about it”.

• Per­haps the only ap­pro­pri­ate uses for prob­a­bil­ity 0 and 1 are to re­fer to log­i­cal con­tra­dic­tions (eg P & !P) and tau­tolo­gies (P → P), rather than real-world prob­a­bil­ities?

• In stark con­trast to this time last week, I now in­ter­nally be­lieve the ti­tle of this post.

I did en­joy “some­thing, some­where, is hav­ing this thought,” Paul, de­spite all its in­her­ent messi­ness.

‘Green is ei­ther’ doesn’t tell us much. As far as we know it’s a non­sen­si­cal state­ment, but I think that makes it more be­liev­able than ‘green is pur­ple’, which makes sense, but seems ex­tremely wrong. You might as well try to as­sign a prob­a­bil­ity to ‘flarg is nar­dle’. I can demon­strate that green isn’t pur­ple, but not that green isn’t ei­ther, nor that flarg isn’t nar­dle.

Is there any­thing truer than ‘7 is prime’? What’s the truest state­ment any­one can come up with? Can we definitely get no closer to 0 than 1, based on J Ed­wards & Paul, above?

• I think you can still have probabilities sum to 1: probability 1 would be the theoretical limit of a probability approaching infinite certitude. Just like you can integrate over the entire real line, i.e. −∞ to ∞, even though those numbers don't actually exist.

• i didn’t get it.

Easy: it’s a demon­stra­tion of how you can never be cer­tain that you haven’t made an er­ror even on the things you’re re­ally sure about.

It’s a cheap, dirty demon­stra­tion, but one nev­er­the­less.

• You seem to think probabilities of 0 and 1 are mysterious or contradictory when discussing randomness; they aren't. When you're talking about randomness, you need to define your support. That mere action gives you places where the probability is zero. For example: Can the time to run 100m ever be negative? No? Then P(t < 0) = 0.

No puzzle there. But your transformation to log odds has some regularity conditions you're violating in those cases: the transform is only defined for probabilities in (0,1). But that doesn't mean log odds or probabilities are flawed. Probabilities of 0 and 1—like log odds of plus or minus infinity—are just filling in the boundaries on the system you've created. Mathematically, you want to be able to handle limits; that means handling limits as a probability approaches 0 or 1. That's it.

This shouldn’t be some huge philo­soph­i­cal puz­zle; it’s merely the need to have any math­e­mat­i­cal sys­tem you use be com­plete. Sir David Cox would be the first to tell you that.

• We cer­tainly can talk about the limit of a func­tion whose codomain is a mea­sure of prob­a­bil­ity be­ing 1; the limit of the prob­a­bil­ity of a propo­si­tion as the amount of ev­i­dence in fa­vor of it ap­proaches in­finity is 1. But that doesn’t mean that 1 is a mea­sure of prob­a­bil­ity. In­finity is valid as the limit of a func­tion yield­ing real num­bers, but in­finity is not a real num­ber.

As for your ex­am­ple with the amount of time it takes to run a par­tic­u­lar dis­tance, I can’t be cer­tain that we won’t find a re­gion of space with strange tem­po­ral effects that al­low you to take a walk and ar­rive at your start­ing point be­fore you left. This would al­low you to run a hun­dred me­ters in nega­tive time, in at least one sense of the word. Get­ting that sort of speed from the run­ner’s point of view would be stranger, but the Dark Lords of the Ma­trix could prob­a­bly make it hap­pen.

• Cu­mu­lant—can you state, with in­finite cer­tainty, that no-one will ever run faster than light?

• Well, it does seem like some­one who trav­els back in time to reach the finish be­fore he got there has… not ac­tu­ally fol­lowed the rules of the 100-me­ter dash.

• By the current model it is impossible for anything to move faster than light*, but what is your confidence in the current model? Certainly high, but not infinite. Let's not mix up the map and the territory. As for running faster than light: certainly unlikely, but not infinitely so. If you define something as impossible in some model, and you want a probability within that model, or given that model, I don't know what happens, however...

*With cer­tain com­pli­ca­tions.

[Edit: Formatting]

• Another way to think about prob­a­bil­ities of 0 and 1 is in terms of code length.

Shan­non told us that if we know the prob­a­bil­ity dis­tri­bu­tion of a stream of sym­bols, then the op­ti­mal code length for a sym­bol X is: l(X) = -log p(X)

If you con­sider that an event has zero prob­a­bil­ity, then there’s no point in as­sign­ing a code to it (codespace is a con­served quan­tity, so if you want to get short codes you can’t waste space on events that never hap­pen). But if you think the event has zero prob­a­bil­ity, and then it hap­pens, you’ve got a prob­lem—sys­tem crash or some­thing.

Like­wise, if you think an event has prob­a­bil­ity of one, there’s no point in send­ing ANY bits. The re­ceiver will also know that the event is cer­tain, so he can just in­sert the sym­bol into the stream with­out be­ing told any­thing (this could hap­pen in a sym­bol stream where three As are always fol­lowed by a fourth). But again, if you think the event is cer­tain and then it turns out not to be, you’ve got a prob­lem: the re­ceiver doesn’t get the code you want to send.

If you re­fuse to as­sign zero or unity prob­a­bil­ities to events, then you have a strong guaran­tee that you will always be able to en­code the sym­bols that ac­tu­ally ap­pear. You might not get good code lengths, but you’ll be able to send your mes­sage. So Eliezer’s stance can be in­ter­preted as an in­sis­tence on mak­ing sure there is a code for ev­ery sym­bol se­quence, re­gard­less of whether that se­quence ap­pears to be im­pos­si­ble.
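Shannon's code-length formula from the comment above can be sketched like this (an illustrative Python snippet, not part of the original comment):

```python
import math

def code_length_bits(p):
    # Shannon's optimal code length for a symbol of probability p: -log2(p)
    return -math.log2(p)

assert code_length_bits(0.5) == 1.0    # a fair coin flip costs one bit
assert code_length_bits(1.0) == 0.0    # a "certain" symbol needs no bits at all
# code_length_bits(0.0) raises ValueError: there is no finite code for a
# symbol you treated as impossible -- the "system crash" described above.
```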

• But then, do you re­ally want to build a bi­nary trans­mit­ter that is pre­pared to han­dle not only se­quences of 0 and 1, but also the oc­ca­sional “ze­brafish” and “Thurs­day” (imag­ine some­how fit­ting these into an elec­tri­cal sig­nal, or don’t, be­cause the whole point is that it can’t be done)? Such a trans­mit­ter has enor­mously in­creased com­plex­ity to han­dle sig­nals that, well… won’t ever hap­pen. I guess you could say the prob­a­bil­ity is low enough that the ex­pected util­ity of deal­ing with it is not worth it. But what about the chance that a “ze­brafish” in the launch codes will wipe out hu­man­ity? Surely that ex­pected util­ity can­not be ig­nored? (Ex­cept it can!)

• Brent,

From what I un­der­stood on read­ing the Wikipe­dia ar­ti­cle on Bayesian prob­a­bil­ity and in­fer­ring from how he writes (and cor­rect me if I’m wrong), Eliezer is talk­ing about your “sub­jec­tive prob­a­bil­ity.” You are a be­ing, have con­scious­ness, and in­ter­pret in­put as in­for­ma­tion. Given a lot of this in­for­ma­tion, you’ve formed an idea that 7 is prime. You’ve also formed an idea that other peo­ple ex­ist, and that the sky is blue, which also have a high sub­jec­tive prob­a­bil­ity in your mind be­cause you have a lot of di­rect in­for­ma­tion to sus­tain that be­lief.

More­over, if you’ve ever been wrong be­fore, hope­fully you’ve no­ticed that you have been wrong be­fore. That’s a lit­tle in­for­ma­tion that “you are some­times wrong about things that you are very sure of”. So, you might ap­ply this in­for­ma­tion to your for­mula of your prob­a­bil­ity of the idea that “7 is prime”, so you still end up with a high prob­a­bil­ity, but not 1.

Now, you might not think that “you are some­times wrong about things that you are sure of” about ev­ery sin­gle sub­ject, such as prime­ness. But, what if you had the in­for­ma­tion that other hu­mans, smart peo­ple, have at some point in the past, in­cor­rectly un­der­stood the prime­ness of a num­ber (the anec­dote). You might state, that “hu­man be­ings are some­times wrong about the prime­ness of a num­ber,” and “I am a hu­man be­ing.” Again, if you in­clude that in­for­ma­tion in your calcu­la­tion of the prob­a­bil­ity that the idea that “7 is prime” is true, then you end up with a high prob­a­bil­ity, but not 1.

(Oh, but what if you didn’t make the state­ment “hu­man be­ings are some­times wrong about the prime­ness of a num­ber”, but in­stead, “this idiot is some­times wrong about the prime­ness of a num­ber, but I am never” Well, you can. That’s one big prob­lem with Bayesian sub­jec­tive prob­a­bil­ities. How do we gen­er­al­ize? How can we for­mal­ize it so that two peo­ple with the same in­for­ma­tion de­ter­minis­ti­cally get the same prob­a­bil­ity? Log­i­cal (or ob­jec­tive epistemic) prob­a­bil­ity at­tempts to an­swer these ques­tions.)

So, you're right that it is just "a single person" getting it wrong, that his certainty was incorrect. But that's Eliezer's point. We are not supreme beings lording over all reality; we are humans who have memorized some information from the past and made some generalizations, including generalizations that sometimes our generalizations are wrong.

• I agree with cu­mu­lant. The math­e­mat­i­cal sub­ject of prob­a­bil­ity is based on mea­sure the­ory, which loses a ton of con­ver­gence the­o­rems if we ex­clude 0 and 1. We can agree that things that are not known a pri­ori can’t have prob­a­bil­ity 0 or 1, but I think we must also agree that “an im­pos­si­ble thing will hap­pen soon” has prob­a­bil­ity 0, be­cause it’s a con­tra­dic­tion. An al­ter­nate uni­verse in which the num­ber 7 (in the same kind of num­ber sys­tem as ours, etc.) is prime is damn-near in­con­ceiv­able, but an al­ter­nate uni­verse in which im­pos­si­ble things are pos­si­ble is purely ab­surd.

If our mathematical reasoning is coherent enough for it to be meaningful to make probability assignments, then certainly we are not so fundamentally flawed that what we consider tautologies could be false. If you are willing to accept that maybe 0 is 1, then you can't do any of your probability adjustments, or use Bayes' Theorem, or anything of the sort, without having a (possibly unstated) caveat that probability theory might be complete nonsense. But what's the probability that probability theory is nonsense (i.e. false or inconsistent)? What does that even mean? We can only assign a probability if that makes sense, so conditioned on the sentence making sense, probability theory must be nonsense with probability 0, no? So averaged over all possible universes (those where probability theory makes sense, and those where it doesn't) the sentence "probability makes sense with probability 1" better approximates the truth value of probability making sense than "probability makes sense with probability p" for p < 1. If it's not, it's still not worse, but what the hell are we even saying?

• Speaking of measure theory, what probability should we assign to a uniformly distributed random real number on the interval [0, 1] being rational? Something bigger than 0? Maybe in practice we would never hold a uniform distribution over [0, 1] but would assign greater probability to "special" numbers (like, say, 1/2). But regardless of our probability distribution, there will exist subsets of [0, 1] to which we must assign probability 0.

The only way I can see around this is to re­fuse to talk about in­finite (or at least un­countable) sets. Are there oth­ers?

• I suspect Eliezer would object to my post claiming that I'm confusing map and territory, but I don't think that's fair. If there's a map you're trying to use all over the place (and you do seem to), then I claim it makes no sense to put a little region on the map labelled "maybe this map doesn't make any sense at all". If the map seems to make sense and you're still following it for everything, you'll have to ignore that region anyway. So is it really reasonable to claim that "the probability that probability makes sense is < 1"?

Utili­tar­ian:

Mea­sure the­ory gives a clear an­swer to this: it’s 0. Which is fine. For all x, the prob­a­bil­ity that your rv will take the value x is 0. Ac­tu­ally the prob­a­bil­ity that your rv is com­putable is also 0. (Com­putable num­bers are the largest countable class I know of.) What’s false is the tempt­ing state­ment that prob­a­bil­ity 0 events are im­pos­si­ble. It’s only the con­verse that’s true: im­pos­si­ble events have prob­a­bil­ity 0. There’s an­other tempt­ing state­ment that’s false, namely the state­ment that if S is an ar­bi­trary col­lec­tion of dis­joint events, the prob­a­bil­ity of one of them hap­pen­ing is the sum of the prob­a­bil­ities of each one hap­pen­ing. In­stead, this only holds for countable sets S. This is part of the defi­ni­tion of a mea­sure.

• If there’s a map you’re try­ing to use all over the place (and you do seem to), then I claim it makes no sense to put a lit­tle re­gion on the map la­bel­led “maybe this map doesn’t make any sense at all”. If the map seems to make sense and you’re still fol­low­ing it for ev­ery­thing, you’ll have to ig­nore that re­gion any­way.

Janos, are you say­ing that it is in fact im­pos­si­ble that your map in fact doesn’t make any sense? Be­cause I do, in­deed, have a lit­tle sec­tion of my map la­bel­led “maybe this map doesn’t make any sense at all”, and ev­ery now and then, I think about it a lit­tle, be­cause there are so many fun­da­men­tal premises of which I am un­sure even in their defi­ni­tions. (E.g: “the uni­verse ex­ists”, and “but why?”) Just be­cause this area of my map drops out of my ev­ery­day de­ci­sion the­ory due to failure to gen­er­ate co­her­ent ad­vice on prefer­ences, does not mean it is ab­sent from my map. “You must ig­nore” or rather “You should usu­ally ig­nore” is de­ci­sion the­ory, and prob­a­bil­ity the­ory should usu­ally be fire­walled off from prefer­ences.

Com­putable num­bers are the largest countable class I know of.

Either all countable sets are the same size any­way, or you can gen­er­ate a larger set by say­ing “all com­putable re­als plus the halt­ing prob­a­bil­ity”. How about com­putable with var­i­ous or­a­cles?

What’s false is the tempt­ing state­ment that prob­a­bil­ity 0 events are im­pos­si­ble. It’s only the con­verse that’s true: im­pos­si­ble events have prob­a­bil­ity 0.

If you can­not re­pose prob­a­bil­ity 1 in the state­ment “all events to which I as­sign prob­a­bil­ity 0 are im­pos­si­ble” you should ap­ply a cor­rec­tion and stop repos­ing prob­a­bil­ity 0 to those events. Do you mean to say that all im­pos­si­ble events have prob­a­bil­ity 0, plus some more pos­si­ble events also have prob­a­bil­ity 0? This makes no sense, es­pe­cially as a jus­tifi­ca­tion for us­ing “prob­a­bil­ity 0″ in a mean­ingfully cal­ibrated sense.

To use “prob­a­bil­ity 0” with­out a finite ex­pec­ta­tion of be­ing in­finitely sur­prised, you must re­pose prob­a­bil­ity 1 in the be­lief that you use “prob­a­bil­ity 0″ only for ac­tu­ally im­pos­si­ble events; but not nec­es­sar­ily be­lieve that you as­sign prob­a­bil­ity 0 to ev­ery im­pos­si­ble event (satis­fy­ing both con­di­tions im­plies log­i­cal om­ni­science).

I should men­tion that I’m also an in­finite set athe­ist.

• I can ad­mit the pos­si­bil­ity that prob­a­bil­ity doesn’t work, but not have to do any­thing about it. If prob­a­bil­ity doesn’t work and I can’t make ra­tio­nal de­ci­sions, I can ex­pect to be equally screwed no mat­ter what I do, so it can­cels out of the equa­tion.

The defin­able real num­bers are a countable su­per­set of the com­putable ones, I think. (I haven’t stud­ied this for­mally or ex­ten­sively.)

• If you don't want to assume the existence of certain propositions, you're asking for a probability theory corresponding to a co-intuitionistic variant of minimal logic. (Co-intuitionistic logic is the logic of affirmatively false propositions, and is sometimes called Popperian logic.) This is a logic with false, or, and (but not truth), and an operation called co-implication, which I will write a <-- b.

Take your event space L to be a distributive lattice (with ordering <), which does not necessarily have a top element, but does have dual relative pseudo-complements. The co-implication (a <-- b) is characterized by the condition that,

for all x in L, b < (a or x) if and only if (a <-- b) < x

Now, we take a prob­a­bil­ity func­tion to be a func­tion from el­e­ments of L to the re­als, satis­fy­ing the fol­low­ing ax­ioms:

1. P(false) = 0

2. if A < B then P(A) ≤ P(B)

3. P(A or B) + P(A and B) = P(A) + P(B)

There you go. Prob­a­bil­ity the­ory with­out cer­tainty.
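As a sanity check, the three axioms hold in a simple finite model: take L to be the proper subsets of a die's outcome space (a lattice with no top element) and P(A) = |A|/6. This is my own illustrative sketch in Python, not part of the comment:

```python
from itertools import combinations

outcomes = [1, 2, 3, 4, 5, 6]

# All proper subsets of the outcome space: a lattice with no top element,
# so no event is ever assigned certainty.
L = [frozenset(c) for r in range(len(outcomes)) for c in combinations(outcomes, r)]
P = {a: len(a) / 6 for a in L}

assert P[frozenset()] == 0                  # axiom 1: P(false) = 0
for a in L:
    for b in L:
        if a <= b:
            assert P[a] <= P[b]             # axiom 2: monotonicity
        join, meet = a | b, a & b
        if join in P:                        # the join may escape the lattice
            assert abs(P[join] + P[meet] - (P[a] + P[b])) < 1e-12  # axiom 3
```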

This is not ter­ribly satis­fy­ing, though, since Bayes’s the­o­rem stops work­ing. It fails be­cause con­di­tional prob­a­bil­ities stop work­ing—they arise from a forced nor­mal­iza­tion that oc­curs when you try to con­struct a lat­tice ho­mo­mor­phism be­tween an event space and a con­di­tion­al­ized event space.

That is, in or­di­nary prob­a­bil­ity the­ory (where L is a Boolean alge­bra, and P(true) = 1), you can define a con­di­tion­al­iza­tion space L|A as fol­lows:

L|A = { X in L | X < A }
true’ = A
false’ = false
and’ = and
or’ = or
not’(X) = not(X) and A
P’(X) = P(X)/P(A)

with a lat­tice ho­mo­mor­phism X|A = X and A

Then, the prob­a­bil­ity of a con­di­tion­al­ized event P’(X|A) = P(X and A)/​P(A), which is just what we’re used to. Note that the defi­ni­tion of P’ is forced by the fact that L|A must be a prob­a­bil­ity space. In the non-cer­tain var­i­ant, there’s no unique defi­ni­tion of P’, so con­di­tional prob­a­bil­ities are not well-defined.

To re­gain some­thing like this for coin­tu­ition­is­tic logic, we can switch to track­ing de­grees of dis­be­lief, rather than de­grees of be­lief. Say that:

1. D(false) = 1

2. for all A, D(A) > 0

3. if A < B then D(A) >= D(B)

4. D(A or B) + D(A and B) = D(A) + D(B)

This will give you the bounds you need to nail down a conditional disbelief function. I’ll leave that as an exercise for the reader.

• Hi guys, you don’t know me and I prefer to stay anonymous. I look at it backwards and get the very same result as Eliezer Y. What is total degeneracy? In practice, it is being totally impervious to updating, regardless of the magnitude of the information seen (even infinity). That can only be achieved by priors of exactly one or zero. Bayesian updating never takes you there (as posteriors), and no updating can take place from that situation. Anonymous

• If the map seems to make sense and you’re still fol­low­ing it for ev­ery­thing, you’ll have to ig­nore that re­gion any­way.

Just cos it’s not a very nice place to visit, doesn’t mean it ain’t on the map. ;)

• “1, 2, and 3 are all in­te­gers, and so is −4. If you keep count­ing up, or keep count­ing down, you’re bound to en­counter a whole lot more in­te­gers. You will not, how­ever, en­counter any­thing called “pos­i­tive in­finity” or “nega­tive in­finity”, so these are not in­te­gers.”

This bothered me; more to the point, it hit on some stuff I’ve been thinking about. I realize I don’t have a very good way to precisely state what I mean by “finite” or “eventually”.

The above, for in­stance, ba­si­cally says “if in­finity is not an in­te­ger, then if I start at an in­te­ger and move an in­te­ger num­ber of steps away from it, I will still be at an in­te­ger that’s not in­finity, there­fore in­finity isn’t an in­te­ger”

But if we al­lowed in­finity to be con­sid­ered an in­te­ger, then we al­low an in­finite num­ber of steps...

How about this: if N is a non in­finite in­te­ger, SN is N’s suc­ces­sor, PN is N’s pre­de­ces­sor, nei­ther SN nor PN will be in­finite. Great, no mat­ter where we start from, we can’t reach an in­finity in one step, so that seems to make this no­tion more solid.

but… if N is an in­finity, then nei­ther SN nor PN (think­ing about or­di­nals now, btw, in­stead of car­di­nals) will be finite. Doh.

So the situ­a­tion seems a bit sym­met­ric here. This is re­ally an­noy­ing to me.

I have as of late been getting the notion that the notions of “finite” and “eventually” are so tied to the idea of mathematical induction that it’s probably best to define the former in terms of the latter… ie, the number of steps from A to B is finite if and only if induction arguments starting from A and going in the direction toward B actually validly prove the relevant proposition for B.

This is a vague notion, but near as I can tell, it comes closest to what I actually think I mean when I say something like “finite” or “eventually reach in a finite number of steps” or something like that.

ie, finite values are exactly those critters on which mathematical induction arguments can be used. (Maybe this is a bad definition. I’m more stating it as a “here’s my suspicion of what may be the best basis to really represent the concept”.)

Anyways, as far as 0 and 1 not being probabilities… While I agree that one shouldn’t believe a proposition with probability 0 or 1, I’m not sure I’d consider them nonprobabilities. Perhaps “unreachable” probabilities instead. Disallowing stuff like sum-to-1 normalizations and so on would seem to require “unnatural” hoops to jump through to get around that.

Un­less, of course, some­one has come up with a clean model with­out that. (If so, well, I’m cu­ri­ous too.)

• Eliezer:

I’m not sure what an “infinite set atheist” is, but it seems from your post that you use different notions of probability than what I think of as standard modern measure theory, which surprises me. Utilitarian’s example of a uniform r.v. on [0, 1] is perfect: it must take some value in [0, 1], but for all x it takes value x with probability 0. Clearly you can’t say that for all x it’s impossible for the r.v. to take value x, because it must in fact take one of those values. But the probabilities are still 0. Pragmatically the way this comes out is that “probability 0” doesn’t imply impossible. If you perform an experiment countably-infinitely many times with the probability of a certain outcome being 0 each time, the probability of ever getting that outcome is 0; in this sense you can say the outcome is almost impossible. However it’s possible that each outcome individually is almost impossible, even though of course the experiment will have an outcome.

You can object that such experiments are physically impossible e.g. because you can only actually measure/observe countably many outcomes. That’s fine; that just means you can get by with only discrete measures. But such assumptions about the real world are not known a priori; I like usual measure theory better, and it seems to do quite a good job of encompassing what I would want to mean by “probability”, certainly including the discrete probability spaces in which “probability 0” can safely be interpreted to mean “impossible”.
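The measure-zero-but-possible point can be illustrated numerically. A hypothetical sketch with a uniform draw on [0, 1]: the probability of landing within epsilon of any fixed point shrinks to zero with epsilon, yet every draw lands somewhere.

```python
import random

random.seed(0)                     # fixed seed, for reproducibility
x = 0.5                            # an arbitrary fixed point
samples = [random.random() for _ in range(100_000)]

# Empirically, P(|X - x| < eps) tracks the interval length 2 * eps,
# which goes to 0 as eps does.
for eps in (0.1, 0.01, 0.001):
    frac = sum(abs(s - x) < eps for s in samples) / len(samples)
    assert abs(frac - 2 * eps) < 0.01

# "Probability 0" is not "impossible": no sample here hits x exactly,
# yet every sample lands on *some* exact value whose prior probability
# was just as negligible.
assert all(s != x for s in samples)
```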

You’re right, it’s not that hard to come up with larger countable classes of re­als than the com­puta­bles; I just meant that all of the usual, “rolls-off-the-tip-of-your-tongue” classes seem to be sub­sets of the com­puta­bles. But maybe Nick is right, and the defin­ables are broader. I haven’t stud­ied this ei­ther.

And yes, I also some­times think about how as­sump­tions I make about life and the per­cep­ti­ble uni­verse could be wrong, but I do not do this much for math­e­mat­ics that I’ve stud­ied deeply enough, be­cause I’m al­most as con­vinced of its “truth” as I am of my own abil­ity to rea­son, and I don’t see the use in rea­son­ing about what to do if I can’t rea­son. This is dou­bly true if the state­ments I’m con­tem­plat­ing are non­sense un­less the math works.

• Eliezer:

I am cu­ri­ous as to why you asked Peter not to re­peat his stunt.

Also, I would re­ally like to know how con­fi­dent you are in your in­finite set athe­ism and for that mat­ter in your non-stan­dard philos­o­phy of math­e­mat­ics at­ti­tudes in gen­eral.

• Re­gard­ing in­finite set athe­ism:

Is the set of “pos­si­ble land­ing sites of a struck golf ball” finite or in­finite?

In other words, can you finitely pa­ram­e­ter­ize lo­ca­tions in space? Physi­cists nor­mally model “po­si­tion” as n-tu­ples of real num­bers in a co­or­di­nate sys­tem; if they were forced to model po­si­tion dis­cretely, what would hap­pen?

I can claim to see an in­finite set each time I use a ruler...

• Doug S., I be­lieve ac­cord­ing to quan­tum me­chan­ics the small­est unit of length is Planck length and all dis­tances must be finite mul­ti­ples of it.

• Eliezer:

I should men­tion that I’m also an in­finite set athe­ist.

You’ve men­tioned this be­fore, and I have always won­dered: what does this mean? Does it mean that you don’t be­lieve there are any in­finite sets? If so, then you have to be­lieve that a math­e­mat­i­cian who claims the con­trary (and gives the stan­dard proof) is mak­ing a mis­take some­where. What is it?

Frankly, even if you actually are a finitist (which I find hard to imagine), it doesn’t seem relevant to this discussion: every argument you have presented could equally well have been given by someone who accepts standard mathematics, including the existence of infinite sets.

• The nature of 0 & 1 as limit cases seems to be fascinating for the theorists. However, in terms of ‘Overcoming Bias’, shouldn’t we be looking at more mundane conceptions of probability? EY’s posts have drawn attention to the idea that the amount of information needed to add additional certainty to a proposition increases exponentially while the probability increases linearly. This says that in utilitarian terms, not many situations will warrant chasing the additional information above 99.9% certainty (outside technical implementations in nuclear physics, rocket science or whatever). 99.9% as a number is taken out of a hat.

In human terms, when we say ‘I’m 99.9% sure that 2+2 always =4’, we’re not talking about 1000 equivalent statements. We’re talking about one statement, with a spatial representation of what ‘100% sure’ means with respect to that statement, and 0.1% of that spatial representation allowed for ‘niggling doubts’, of the sort: what have I forgotten? What don’t I know? What is inconceivable for me?

The interesting question for ‘overcoming bias’ is: how do we make that tradeoff between seeking additional information on the one hand and accepting a limited degree of certainty on the other? As an example (cf. the Evil Lords of the Matrix), considering whether our minds are being controlled by magic mushrooms from Alpha Pictoris may someday increase the ‘niggling doubt’ range from 0.1% to 5%, but the evidence would have to be shoved in our faces pretty hard first.

• Doug S., I be­lieve ac­cord­ing to quan­tum me­chan­ics the small­est unit of length is Planck length and all dis­tances must be finite mul­ti­ples of it.

Not in standard quantum mechanics. Certain of the many unsupported hypotheses of quantum gravity (such as Loop Quantum Gravity) might say something similar to this, but that doesn’t abolish every infinite set in the framework. The total number of “places where infinity can happen” in modern models has tended to increase, rather than decrease, over the centuries, as models have gotten more complex. One can never prove that nature isn’t “allergic to infinities” (the skeptic can always claim, “wait, but if we looked even closer or farther, maybe we would see a heretofore unobserved brick wall”), but this allergy is not something that has been empirically observed.

• I think Eliezer’s “in­finite set athe­ism” is a be­lief that in­finite sets, al­though well-defined math­e­mat­i­cally, do not ex­ist in the “real world”; in other words, that any phys­i­cal phe­nomenon that ac­tu­ally oc­curs can be de­scribed us­ing a finite num­ber of bits. (This can in­clude num­bers with in­finite dec­i­mal ex­pan­sions, as long as they can be gen­er­ated by a finitely long com­puter pro­gram. There­fore, us­ing pi in equa­tions is not pro­hibited, be­cause you’re us­ing the sym­bol “pi” to rep­re­sent the pro­gram, which is finite.)

A con­se­quence of “in­finite set athe­ism” seems to be that the uni­verse is a finite state ma­chine (al­though one that is not nec­es­sar­ily de­ter­minis­tic). Am I un­der­stand­ing this prop­erly?

• What do you mean by “in­finite set athe­ism”? You are es­sen­tially stat­ing that you don’t be­lieve in math­e­mat­i­cal limits—be­cause that is one of the ma­jor con­se­quences of in­finite sets (or se­quences).

If you don’t believe in those… well, you lose calculus, you lose the density of real numbers, you lose the need for or understanding of many events with probability 0 or 1, and you lose the point of Zeno’s Paradox.

Janos is spot on about mea­sure zero not im­ply­ing im­pos­si­bil­ity. What is the prob­a­bil­ity of a golf ball land­ing at any ex­act point? Zero. But it has to land some­where, so no one point is im­pos­si­ble.

Im­pos­si­bil­ity would mean ab­sence from your sigma alge­bra. What’s that you ask? Without mak­ing this painful, you need three things for prob­a­bil­ity: an idea of what con­sti­tutes “the space of ev­ery­thing”, an idea of what con­sti­tutes pos­si­ble events out of that space which we can con­firm or deny, and an as­sign­ment of num­bers to those events. (This is of­ten LaTeX’ed as (\Omega, \math­cal{F}, P).) The con­ver­sa­tion here seems to be con­fus­ing the fil­tra­tion/​sigma-alge­bra F with the num­bers as­signed to those events by P.

Can we choose which we’re talk­ing about: events or num­bers?

• What is the prob­a­bil­ity of a golf ball land­ing at any ex­act point? Zero.

Wrong.

I don’t know which is more painful: Eliezer’s er­rors, or those of his de­trac­tors.

• Per­haps you could clar­ify what ex­actly is an in­finite set athe­ist in a full post...or maybe it’s only worth a com­ment.

• Cumulant, I think the idea behind “infinite set atheism” is not that limits don’t exist, but that infinities are acceptable only as limits approached in a specified way. On this view, limits are not a consequence of infinite sets, as you contend; rather, only the limit exists, and the infinite set or sequence is merely a sloppy way of thinking about the limit.

Eliezer, I’ll sec­ond Matthew’s sug­ges­tion above that you write a post on in­finite set athe­ism; it looks as if we don’t un­der­stand you.

I think I un­der­stand the mo­tive for re­ject­ing in­finite sets (viz., that when­ever you deal with in­finites you get all sorts of ridicu­lously coun­ter­in­tu­itive re­sults—sums com­ing out differ­ent when you reär­range the terms, the Banach-Tarski para­dox, &c., &c.), but I’m not sure you can give up in­finite sets with­out also giv­ing up the real num­bers (as oth­ers have touched on above), which seems very wrong.

• Cale­do­nian: Not wrong. Take the field you’re swing­ing at to be a plane. There are in­finitely many points in that plane; that’s just the den­sity of the re­als.

Now say there is some prob­a­bil­ity den­sity of land­ing spots; and, let’s say no one spot is spe­cial in that it at­tracts golf balls more than points im­me­di­ately nearby (i.e. our pdf is con­tin­u­ous and non-atomic). Right there, you need ev­ery point (as a sin­gle­ton) to have mea­sure 0.

Go pick up Billingsley: mea­sure 0 is not the same as im­pos­si­ble nor does it cause any prob­lems.

• Take the field you’re swing­ing at to be a plane. There are in­finitely many points in that plane; that’s just the den­sity of the re­als.

And the lo­ca­tion that the ball lands on will also be com­posed of in­finitely many re­als. Shall we com­pare the size of two in­finite sets?

• I’d say that the ball is a sphere and con­sider the first point of im­pact (i.e. the tan­gency point of the plane to the sphere). Other­wise, you need to know a lot about the ball and the field where it lands.

You can com­pare in­finite sets. Take the sets A and B, A={1,2,3,...} and B={2,3,4,...}. B is, by con­struc­tion, a sub­set of A. There’s your com­par­i­son; yet, both are in­finite sets.

What as­sump­tions would you make for the golf ball and the field? (To keep things clear, can we define events and prob­a­bil­ities sep­a­rately?)

• Cale­do­nian, ev­ery un­der­grad­u­ate who has ever taken a statis­tics class knows that the prob­a­bil­ity of any sin­gle point in a con­tin­u­ous dis­tri­bu­tion is zero. Prob­a­bil­ities in con­tin­u­ous space are mea­sured on in­ter­vals. Ba­sic calcu­lus...

• I be­lieve ac­cord­ing to quan­tum me­chan­ics the small­est unit of length is Planck length and all dis­tances must be finite mul­ti­ples of it.

This is what I’m given to un­der­stand as well. Doesn’t this take the teeth out of Zeno’s para­dox?

• Prag­mat­i­cally the way this comes out is that “prob­a­bil­ity 0” doesn’t im­ply im­pos­si­ble.

Janos, would you agree that P=0 is a prob­a­bil­ity to the same de­gree that in­finity is a num­ber? Apolo­gies for dou­ble post.

• Cale­do­nian, ev­ery un­der­grad­u­ate who has ever taken a statis­tics class knows that the prob­a­bil­ity of any sin­gle point in a con­tin­u­ous dis­tri­bu­tion is zero.

Gow­der, ev­ery­one who’s ever given the is­sue more than three-sec­onds’-thought knows that no statis­ti­cal re­sult ever in­volves a sin­gle point.

• Usu­ally, if a die lands on edge we say it was a spoiled throw and do it over. Similarly if a Dark Lord writes 37 on the face that lands on top, we com­plain that the Dark Lord is spoiling our game and we don’t count it.

We count 6 pos­si­bil­ities for a 6-sided die, 5 pos­si­bil­ities for a 5-sided die, 2 pos­si­bil­ities for a 2-sided die, and if you have a die with just one face—a spher­i­cal die—what’s the chance that face will come up?

I think it would be in­ter­est­ing to de­velop prob­a­bil­ity the­ory with no bound­aries, with no 0 and 1. It works fine to do it the way it’s done now, and the al­ter­na­tive might turn up some­thing in­ter­est­ing too.

• Ben:

Well, that depends on your number system. For some purposes +infinity is a very useful value to have. For instance, if you consider the extended nonnegative reals (i.e. including +infinity) then every measurable nonnegative extended-real-valued function on a measure space actually has a well-defined extended-nonnegative-real-valued integral. There are all kinds of mathematical structures where an infinity element (or many) is indispensable. It’s a matter of context. The question of what is a “number” is I think very vague given how many interesting number-like notions mathematicians have come up with. But unquestionably “infinity” is not a natural number, or a real number, or a complex number.

Prob­a­bil­ity the­ory, on the other hand, would have to change shape if we com­fortably wanted to ex­clude 0 prob­a­bil­ities. What we now call mea­sures would be wrong for the job. I don’t know how it would look, but I find the stan­dard de­scrip­tion in­tu­itively ap­peal­ing enough that I don’t think it should be changed. It’s prob­a­bly true that for a Bayesian in­fer­ence en­g­ine of some sort, whose pur­pose is to find like­li­hoods of propo­si­tions given ev­i­dence, the “prob­a­bil­ities” it keeps track of shouldn’t be­come 0 or 1. If there’s a rich the­ory there fo­cussing on how to prac­ti­cally do this stuff (and I bet there is, al­though I know noth­ing of it be­yond Bayes’ The­o­rem, which is a sim­ple re­sult) then ig­nor­ing the pos­si­bil­ity of 0s and 1s makes sense there: for ex­am­ple you can use the log odds. But in gen­eral prob­a­bil­ity the­ory? No.
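The log-odds representation mentioned above can be sketched briefly (a minimal illustration, not a full inference engine): probabilities strictly between 0 and 1 map to finite log odds, and a finite amount of evidence shifts log odds by a finite amount, so 0 and 1 are never reached.

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to log odds in (-inf, +inf)."""
    return math.log(p / (1 - p))

def prob(l):
    """Inverse map (the logistic function)."""
    return 1 / (1 + math.exp(-l))

# The two maps are inverse to each other on (0, 1).
assert abs(prob(log_odds(0.9)) - 0.9) < 1e-12

# Bayesian updating is addition in log-odds space: any finite amount
# of evidence leaves you strictly between 0 and 1.
posterior = prob(log_odds(0.5) + 10.0)
assert 0.5 < posterior < 1.0
```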

• I think it would be in­ter­est­ing to de­velop prob­a­bil­ity the­ory with no bound­aries, with no 0 and 1. It works fine to do it the way it’s done now, and the al­ter­na­tive might turn up some­thing in­ter­est­ing too.

You might want to check out Kosko’s Fuzzy Think­ing. I haven’t gone any fur­ther into fuzzy logic, yet, but that sounds like some­thing he dis­cussed. Also, he claimed prob­a­bil­ity was a sub­set of fuzzy logic. I in­tend to fol­low that up, but there is only one of me, and I found out a long time ago that they can write it faster than I can read it.

• “On some golf courses, the fair­way is read­ily ac­cessible, and the sand traps are not. The green is ei­ther.”

• Haha, very nice CGD. Shows how much those philoso­phers of lan­guage know about golf. :-)

Although… hmm… in­ter­est­ing. I think that gives us a way to think about an­other prob­a­bil­ity 1 state­ment: state­ments that oc­cupy the en­tire log­i­cal space. Ex­am­ple: “ei­ther there are prob­a­bil­ity 1 state­ments, or there are not prob­a­bil­ity 1 state­ments.” That state­ment seems to be true with prob­a­bil­ity 1...

• Disallowing a symbol for “all events” breaks the definition of a probability space. It’s probably easier to allow extended reals and break some field axioms than to figure out how to do rigorous probability without a sigma-algebra.

• When re-work­ing this into a book, you need to dou­ble check your con­ver­sions of log odds into deci­bels. By defi­ni­tion, deci­bels are calcu­lated us­ing log base 10, but some of your odds are nat­u­ral log­a­r­ithms, which con­fused the heck out of me when read­ing those para­graphs.

Probability .0001 = −40 decibels (This is the only correct one in this post; all “decibel” figures afterwards are listed as 10 * the natural logarithm of the odds.)
Probability 0.502 = 0.035 decibels
Probability 0.503 = 0.052 decibels
Probability 0.9999 = 40 decibels
Probability 0.99999 = 50 decibels
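These corrected figures are straightforward to verify, taking decibels of evidence as 10 times the base-10 logarithm of the odds. A quick check:

```python
import math

def decibels(p):
    """Evidence in decibels: 10 * log10(odds), with odds = p / (1 - p)."""
    return 10 * math.log10(p / (1 - p))

# The corrected figures from the comment above, to the stated precision.
assert round(decibels(0.0001), 1) == -40.0
assert round(decibels(0.502), 3) == 0.035
assert round(decibels(0.503), 3) == 0.052
assert round(decibels(0.9999), 1) == 40.0
assert round(decibels(0.99999), 1) == 50.0
```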

P.S. It’d be nice if you pro­vided an RSS feed for the com­ments on a post, in ad­di­tion to the RSS feed for the posts...

• I can­not be­gin to imag­ine where those num­bers came from. Dangers of “Posted at 1:58 am”, I guess. Fixed.

• Could you re­spond to Neel Kr­ish­naswami’s post above, and this one as well?

• P(A&B)+P(A&~B)+P(~A&B)+P(~A&~B)=1

Isn’t the “1” above a prob­a­bil­ity?

• My intuition as a mathematician declares that nobody will ever develop an elegant mathematical formulation of probability theory that does not allow for statements that are logically impossible or certain, such as statements of the form p AND NOT p. And it is necessary, if the theory is to be isomorphic to the usual one, that these statements have probability 0 (if impossible) or 1 (if certain). However, I believe that it is quite reasonable to declare, as a condition demanded of any prior deemed rational, that only truly impossible or certain statements have those probabilities. I think that this gives you what you want.

It’s ob­vi­ous that you can make this very de­mand when work­ing with dis­crete prob­a­bil­ity dis­tri­bu­tions. It may not be ob­vi­ous that you can make this de­mand when work­ing with con­tin­u­ous prob­a­bil­ity dis­tri­bu­tions. Cer­tainly the usual the­ory of these, based on so-called ‘mea­sure spaces’ and ‘σ-alge­bras’ (I men­tion those in case they jog the reader’s mem­ory), can­not tol­er­ate this re­quire­ment, at least not if any­thing at all similar to the usual ex­am­ples of con­tin­u­ous dis­tri­bu­tions are al­lowed.

One an­swer is that only dis­crete prob­a­bil­ity dis­tri­bu­tions ap­ply to the real world, in which one can never make mea­sure­ments with in­finite pre­ci­sion or ob­serve an in­finite se­quence of events. Even if the world has in­finite size or is con­tin­u­ous to in­finites­i­mal scales, you will never ob­serve that, so you don’t need to pre­dict any­thing about that.

How­ever, even if you don’t buy this ar­gu­ment, never fear! There is a math­e­mat­i­cal the­ory of prob­a­bil­ity based on ‘pointless mea­sure spaces’ and ‘ab­stract σ-alge­bras’. In this the­ory, it again makes perfect sense to de­mand that any prior must as­sign prob­a­bil­ity 0 or 1 only to im­pos­si­ble or cer­tain events. The idea is that if some­thing can never be ob­served, even in prin­ci­ple, then it is effec­tively im­pos­si­ble, and the ab­stract pointless the­ory al­lows one to treat it as such.

Then I agree that one should re­quire, as a con­di­tion on con­sid­er­ing a prior to be ra­tio­nal, that it should as­sign prob­a­bil­ity 0 only to these im­pos­si­ble events and as­sign prob­a­bil­ity 1 only to their cer­tain com­ple­ments.

• PS: cumulant-nimbus above gives a brief summary of the usual approach to measure theory. The pointless approach that I advocate can be suggested from that as follows: taboo \Omega. Neel Krishnaswami’s comment is implicitly using the pointless approach; his event space is cumulant-nimbus’s \mathcal{F}, and he works entirely in terms of events.

• Thanks for the link. It sounds like Yud­kowsky is ar­gu­ing some­thing quite close to Cromwell’s Rule, with a slight tech­ni­cal differ­ence. From the Wikipe­dia ar­ti­cle:

...the use of prior prob­a­bil­ities of 0 or 1 should be avoided, ex­cept when ap­plied to state­ments that are log­i­cally true or false.

Yud­kowsky would ar­gue that for­mal logic is not part of the ter­ri­tory, but rather part of our map (per­haps sur­vey­ing equip­ment would be a good anal­ogy, since the com­pass anal­ogy is already taken by “moral com­pass”). As such, not even for­mal math­e­mat­i­cal logic should be pre­sumed to have 100% cer­tainty.

Of course, this raises the problem of constantly having to include the term p(math is fundamentally flawed) everywhere. Instead of just writing p(heads) when calculating the odds of a coin flip or flips, now we’d have to use p(heads | ~math is fundamentally flawed). As a matter of sheer convenience, it would be easier to just add it to the list of axioms supporting the fundamental theorems that the rest of mathematics is built on.

But that’s just se­man­tics, I sup­pose. Wikipe­dia has a cou­ple more in­ter­est­ing tid­bits, that I’ve fished out for fu­ture read­ers:

The refer­ence is to Oliver Cromwell. Cromwell wrote to the synod of the Church of Scot­land on Au­gust 5 1650, in­clud­ing a phrase that has be­come well known and fre­quently quoted:

“I be­seech you, in the bow­els of Christ, think it pos­si­ble that you may be mis­taken.”

As Lindley puts it, as­sign­ing a prob­a­bil­ity should “leave a lit­tle prob­a­bil­ity for the moon be­ing made of green cheese; it can be as small as 1 in a mil­lion, but have it there since oth­er­wise an army of as­tro­nauts re­turn­ing with sam­ples of the said cheese will leave you un­moved.” Similarly, in as­sess­ing the like­li­hood that toss­ing a coin will re­sult in ei­ther a head or a tail fac­ing up­wards, there is a pos­si­bil­ity, albeit re­mote, that the coin will land on its edge and re­main in that po­si­tion.

• I’m kinda sur­prised that it’s only been men­tioned once in the com­ments (I only just dis­cov­ered this site, re­ally re­ally great, by the way) and one from 2010 at that, but it seems to me that “a mag­i­cal sym­bol to stand for “all pos­si­bil­ities I haven’t con­sid­ered” ” does ex­ist: the sym­bol “~” (i.e. not). Even the com­menter who does men­tion it makes things com­pli­cated for him­self: P(Q or ~Q)=1 is the sim­plest ex­am­ple of a propo­si­tion with prob­a­bil­ity 1.

The propo­si­tion is of course a tau­tol­ogy. I do think (but I’m not sure) that that is the only sort of state­ment that re­ceives prob­a­bil­ity 1. This is in sync with Eliezer’s “amount of ev­i­dence” in­ter­pre­ta­tion. A bayesian up­date can only gen­er­ate 1 if the ini­tial propo­si­tion was of prob­a­bil­ity 1 or if the ev­i­dence was tau­tolog­i­cal (i.e. if Q then Q or, slightly less lame, if “Q or R” and “~R” then Q, where “Q or R” and “~R” are the ev­i­dence).

Skim­ming the com­ments, I saw two other pro­pos­als for “sure bets”, the run­ner who clocked a nega­tive time and the golf ball land­ing in a par­tic­u­lar spot. That last one de­gen­er­ated pretty quickly into a dis­cus­sion about how many points there are in a field and on a ball. I think that’s typ­i­cal of such ar­gu­ments: it de­pends on your model. Once you have your model speci­fied the prob­a­bil­ity be­comes 1 (or not) if the state­ment is (or isn’t) tau­tolog­i­cal in the model. If the model isn’t speci­fied, then nei­ther is the state­ment (what is a pre­cise point?) and hence the prob­a­bil­ity. Ask the next man what the prob­a­bil­ity is of a run­ner clock­ing a nega­tive time and he’ll rightly re­spond: “Huh?” (un­less he is a par­tic­u­larly obfus­ca­tory know-it-all, in which case he might start blab­ber­ing about the speed of light. But then too, he makes a claim be­cause he can as­cribe mean­ing to the ques­tion, that is, he picks his model). So these are also tau­tolog­i­cal ex­am­ples.

I think Eliezer’s claims hold up pretty well for propositions that aren’t tautological and are hence empirical in nature: they require evidence, and only tautological evidence will suffice for certainty.

About the problem of inserting 0s into certain standard theorems: I don’t see a problem with Bayes’ theorem (I’m curious about other examples). Dividing by 0 is not defined, so the probability of it raining when hell freezes over is not defined. That seems like a satisfactory arrangement.
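The special status of 0 and 1 under Bayes’ theorem can be made concrete in odds form. A hypothetical sketch: priors of exactly 0 or 1 are fixed points that no finite likelihood ratio can move, which is the flip side of the division-by-zero point.

```python
def posterior(prior, likelihood_ratio):
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1 - prior) if prior < 1 else float("inf")
    post_odds = odds * likelihood_ratio
    if post_odds == float("inf"):
        return 1.0
    return post_odds / (1 + post_odds)

# Certainty is immovable: no finite evidence shifts a 0 or 1 prior.
assert posterior(0.0, 1e9) == 0.0
assert posterior(1.0, 1e-9) == 1.0

# Any prior strictly between 0 and 1 stays strictly between 0 and 1.
assert 0.0 < posterior(0.5, 1e9) < 1.0
assert 0.0 < posterior(0.5, 1e-9) < 1.0
```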

• Jaynes avoids P(A|B) for “prob­a­bil­ity of A given ev­i­dence B” and P(B) for “prob­a­bil­ity of B”, prefer­ring P(A|BX) and P(B|X) where X is one’s back­ground knowl­edge. This and the above leads nat­u­rally to the ques­tion of ~X: the situ­a­tion in which one’s “back­ground knowl­edge” is false.

As­sume that back­ground knowl­edge X is the con­junc­tion of a finite num­ber of propo­si­tions. ~X is true if any of these propo­si­tions is false. If we can fac­tor X into YZ where Y is the por­tion we sus­pect of be­ing false — that is, if we can iso­late for test­ing a por­tion of those be­liefs we pre­vi­ously treated as “back­ground knowl­edge” — then we can ask about P(A|BYZ) and P(A|B·~Y·Z).

• Thanks for the anal­y­sis, MathijsJ! It made perfect sense and re­solved most of my ob­jec­tions to the ar­ti­cle.

I was will­ing to ac­cept that we can­not reach ab­solute cer­tainty by ac­cu­mu­lat­ing ev­i­dence, but I also came up with mul­ti­ple log­i­cal state­ments that un­de­ni­ably seemed to have prob­a­bil­ity 1. Read­ing your post, I re­al­ized that my ex­am­ples were all tau­tolo­gies, and that your sug­ges­tion to al­low cer­tainty only for tau­tolo­gies re­solved the dis­crep­ancy.

The Wikipe­dia ar­ti­cle timtyler linked to seems to sup­port this: “Cromwell’s rule [...] states that one should avoid us­ing prior prob­a­bil­ities of 0 or 1, ex­cept when ap­plied to state­ments that are log­i­cally true or false.” This matches your anal­y­sis—you can only be cer­tain of tau­tolo­gies.

Also, your dis­cus­sion of mod­els neatly re­solves the dis­tinc­tion be­tween, say, a math­e­mat­i­cally-defined die (which can be cer­tain to end up show­ing an in­te­ger be­tween 1 and 6) and a real-world die (which can­not quite be known for sure to have ex­actly six sta­ble states).

Eliezer makes his po­si­tion pretty clear: “So I pro­pose that it makes sense to say that 1 and 0 are not in the prob­a­bil­ities; just as nega­tive and pos­i­tive in­finity, which do not obey the field ax­ioms, are not in the real num­bers.”

It’s true—you can­not ever reach a prob­a­bil­ity of 1 if you start at 0.5 and ac­cu­mu­late ev­i­dence, just as you can­not reach in­finity if you start at 0 and add in­te­ger val­ues. And the in­verse is true, too—you can­not ac­cu­mu­late ev­i­dence against a tau­tol­ogy and bring its prob­a­bil­ity down to any­thing less than 1. But this doesn’t mean a prob­a­bil­ity of 1 is an in­co­her­ent con­cept or any­thing.

Eliezer: if you’re go­ing to say that 0 and 1 are not prob­a­bil­ities, you need to come up with a new term for them. They haven’t gone away com­pletely just be­cause we can’t reach them.

Edit a year and a half later: I agree with the article as written, partially as a result of reading How to Convince Me That 2 + 2 = 3, and partially as a result of concluding that “tautologies that have probability 1 but no bearing on reality” is a useless concept, and that therefore, “probability 1” is a useless concept.

• For any state of in­for­ma­tion X, we have P(A or not A | X) = 1 and P(A and not A | X) = 0. We have to have 0 and 1 as prob­a­bil­ities for prob­a­bil­ity the­ory even to work. I think you’re tak­ing a rea­son­able idea—that P(A | X) should be nei­ther 0 nor 1 when A is a state­ment about the con­crete phys­i­cal world—and try­ing to ap­ply it be­yond its ap­pli­ca­ble do­main.

• Consider the set of all possible hypotheses. This is a countable set, assuming I express hypotheses in natural language. It is potentially infinite as well, though in practice a finite mind cannot accommodate infinitely-long hypotheses. To each hypothesis, I can try to assign a probability, on the basis of available evidence. These probabilities will be between zero and one. What is the probability that a rational mind will assign at least one hypothesis the status of absolute certainty? Either this is one (there is definitely such a hypothesis), or zero (there is definitely not such a hypothesis, which cannot be, because the hypothesis “there is definitely not such a hypothesis” is then a counterexample), or somewhere in between (there may be, somewhere, a hypothesis that a rational mind would regard as being absolutely certain). So I cannot accept your hypothesis that there does not exist, anywhere, ever, a hypothesis that I should regard as being absolutely certain.

• Self-refer­en­tial hy­pothe­ses do not always map to truth val­ues, and “a ra­tio­nal mind will as­sign at least one hy­poth­e­sis the sta­tus of ab­solute cer­tainty” is self-refer­en­tial. The con­tra­dic­tion you’ve en­coun­tered arises from us­ing a state­ment iso­mor­phic to “this state­ment is false” and re­quiring it to have a truth value, not to a prob­lem with ex­clud­ing 0 and 1 as prob­a­bil­ities.

• Yes, 0 and 1 are not probabilities. They’re truth or falseness values. It’s necessary to make a third ‘truth value’ for things that are unprovable, and possibly a fourth for things that are untestable.

• Dig­ging up an old thread here, but an in­ter­est­ing point I want to bring up: a friend of mine claims that he in­ter­nally as­signs prob­a­bil­ity 1 (i.e. an undis­prov­able be­lief) only to one state­ment: that the uni­verse is co­her­ent. Be­cause if not, then mn­er­gar­blewtf. Is it rea­son­able to say that even though no state­ment can ac­tu­ally have prob­a­bil­ity 1 if you’re a true Bayesian, it’s rea­son­able to in­ter­nally es­tab­lish an ax­iom which, if negated, would just make the uni­verse com­pletely stupid and not worth liv­ing in any more?

• There’s a lot of logic to that. For extremely unlikely possibilities you can often get away with setting their probability to 0 to make the calculations a lot simpler. For possibilities where predicted utility is independent of your actions (like “reality is just completely random”) it can also be worthwhile to set their probability to 0 (i.e. to ignore them), since they’re approximately a constant term in expected utility. These are good ways of approximating actual expected utility so you can still mostly make the right decisions, which bounded rationality requires.

• What is P(A|A)?

• What do you mean by “|A”? It’s well-defined in math­e­mat­ics, sure, but in real life, surely the fur­thest you can go is “|ex­pe­rience/​per­cep­tion of ev­i­dence for A”.

There’s also the probability that the particular version of logic you’re using is wrong.

• What do you mean by “|A”? It’s well-defined in math­e­mat­ics, sure, but in real life, surely the fur­thest you can go is “|ex­pe­rience/​per­cep­tion of ev­i­dence for A”.

How far you can go de­pends on what you mean by “go”.

It’s perfectly pos­si­ble to calcu­late, say, P(I see the coin come up heads | the coin is flipped once, it is fair, and I see the out­come), and ac­tu­ally much more difficult to calcu­late P(I see the coin come up heads | I have ex­pe­rience/​per­cep­tion of ev­i­dence for the facts that the coin is flipped once, it is fair, and I see the out­come).

• “I see” is what I meant by per­cep­tion/​ex­pe­rience of ev­i­dence. When­ever I “see” some­thing, there’s always a non-zero chance of my brain de­ceiv­ing me. The only thing you can re­ally have to base your de­ci­sions on is P(I see the coin come up heads | I see/​know the coin is flipped once, I know it is fair, and I see the out­come). P(the coin comes up heads|the coin is flipped once, it is fair and I know the out­come) is pos­si­ble and easy to calcu­late, but not com­pletely ac­cu­rate to the world we live in.

• A charitable paraphrase of “The universe is coherent” could be a statement of the universal validity of non-contradiction: for every p, not (p and not p). However, given the existence of paraconsistent logic and philosophers who take dialetheism seriously, I cannot assign probability 1 to the claim that no aspect of the universe requires a contradiction in its description.

I would go even further and say that I am quite a bit more certain of many other claims (such as “1+1=2” and “2+2=4”) than of such general and abstract propositions as “the universe is coherent” or even “there are no true contradictions”.

• I don’t think he goes quite that far—he as­signs no state­ments prob­a­bil­ity 0 or 1 within our own logic sys­tem, even (P and ¬P), be­cause he be­lieves it to be pos­si­ble (though not very likely) that some other logic sys­tem might su­per­sede our own.

His be­lief is that it is not pos­si­ble for ALL sys­tems of logic to be in­cor­rect, i.e. that (it is im­pos­si­ble to rea­son cor­rectly about the uni­verse) is nec­es­sar­ily false.

• No, it’s not. It’s the same fun­da­men­tal mis­take that a lot of re­li­gious rhetoric about “faith” and “mean­ing” is founded on: that want­ing some­thing to be true counts as ev­i­dence that it is true. There’s no rea­son to think that the uni­verse de­pends for any of its prop­er­ties on whether some­one finds it stupid or not, or worth liv­ing in.

I’d also sug­gest you try to draw your friend out a bit on what it means ex­actly for the uni­verse to be “co­her­ent.” Can that no­tion be ex­pressed for­mally? What would we ex­pect to see if we lived in an in­co­her­ent uni­verse?

Ob­vi­ously, I’m du­bi­ous that the “co­her­ence” of the uni­verse is in any proper sense a philo­soph­i­cal or sci­en­tific idea—it sounds a lot more like an aes­thetic one.

• I think he just means “co­her­ent” as “one which we can ac­tu­ally model based on our ob­ser­va­tions”, i.e. one in which this whole ex­er­cise (ra­tio­nal­ity) makes any sense.

He assigns probability zero to the universe being incoherent, and doesn’t think there would be any sensible observations if that were the case (or that any observation would be possible if it were).

ETA: Mer­riam-Web­ster Defi­ni­tion of COHERENT

1 a : log­i­cally or aes­thet­i­cally or­dered or in­te­grated : con­sis­tent b : hav­ing clar­ity or in­tel­ligi­bil­ity : un­der­stand­able

So, un­der­stand­able and con­sis­tent: a uni­verse which philos­o­phy, math­e­mat­ics and sci­ence can ap­ply to in any mean­ingful way.

• “The (‘Bayesian’) framework explored in these essays replaces the two Cartesian options, affirmation and denial, by a continuum of judgmental probabilities in the interval from 0 to 1, endpoints included, or—what comes to the same thing—a continuum of judgmental odds in the interval from 0 to infinity, endpoints included. Zero and 1 are probabilities no less than 1⁄2 and 99⁄100 are. Probability 1 corresponds to infinite odds, 1:0. That’s a reason for thinking in terms of odds: to remember how momentous it may be to assign probability 1 to a hypothesis.”

Richard Jeffrey, “Prob­a­bil­ity and the art of judge­ment”.

I leave it as an ex­er­cise to cor­rectly state the re­la­tion­ships be­tween Eliezer’s ar­ti­cle, the Jeffrey quote, and the value of P(A|A).

(Note: Jeffrey is not to be con­fused with Jeffreys, al­though both were Bayesian prob­a­bil­ity the­o­rists.)

• In­ter­est­ing Log-Odds pa­per by Brian Lee and Ja­cob San­ders, Novem­ber 2011.

• “When you work in log odds, the dis­tance be­tween any two de­grees of un­cer­tainty equals the amount of ev­i­dence you would need to go from one to the other. That is, the log odds gives us a nat­u­ral mea­sure of spac­ing among de­grees of con­fi­dence.”

That observation is so useful and intuition-friendly it probably deserves its own blog post, and a prominent place in your book.
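The additivity claim can be sketched directly (a minimal sketch in Python; `log_odds` is a hypothetical helper, with base 2 chosen so evidence is measured in bits):

```python
import math

def log_odds(p):
    """Log odds, in bits, of a probability strictly between 0 and 1."""
    return math.log2(p / (1 - p))

# Going from 50% to 80% confidence takes 2 bits of evidence:
step_1 = log_odds(0.8) - log_odds(0.5)       # ~2.0 bits
# Going from 80% to 16/17 (~94.1%) takes the same 2 bits again,
# even though the raw probabilities move by very different amounts:
step_2 = log_odds(16 / 17) - log_odds(0.8)   # ~2.0 bits
```

Equal distances on the log-odds scale correspond to equal amounts of evidence, which is exactly the spacing property the quote describes.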

• 4 Jan 2013 8:32 UTC

For­give me if this sounds con­de­scend­ing, but isn’t say­ing “0 and 1 are not prob­a­bil­ities be­cause they won’t let you up­date your knowl­edge” ba­si­cally the same as say­ing “you can’t know some­thing be­cause know­ing makes you un­able to learn”? If we as­sign tau­tolo­gies as hav­ing prob­a­bil­ity 1, then any­thing re­ducible to a tau­tol­ogy should have prob­a­bil­ity 1 (and similarly, all con­tra­dic­tions and things re­ducible to con­tra­dic­tions should have prob­a­bil­ity 0). For any ar­bi­trar­ily large N, if you put 2 ap­ples next to 2 ap­ples and re­peat the test N times, you’ll get 4 ap­ples N out of N times, no less (dis­count­ing molec­u­lar break­downs in the ap­ples or other pos­si­ble in­terfer­ences).

• You shouldn’t as­sign tau­tolo­gies prob­a­bil­ity 1 ei­ther be­cause your no­tion of what a tau­tol­ogy is might be a hal­lu­ci­na­tion.

• This confuses object level and meta level. In probability theory, P(¬A|A) = 0 and P(A|A) = 1, however uncertain you may be about Cox’s theorem, or about whether you are actually thinking about the same A each time it appears in those formulas. No one, as far as I know, has ever constructed a theory of probability in which these are assigned anything else but 0 and 1. That is not to say that it cannot be done, only that it has not been done. Until it is done, 0 and 1 are probabilities.

The title of the article is a rhetorical flourish to convey the idea elaborated in its body: that to assert a probability of 0 or 1, as a measure of belief, is to assert that no possible evidence could update that belief; that 0 and 1 are probabilities you should not find yourself assigning to matters about which there could be any real dispute; and that odds ratios or their logarithms are a better concept when dealing with practical matters involving very low or very high probabilities. There is a very large difference between saying that the probability of winning a lottery is tiny and saying that it cannot happen at all; with enough participants it is almost certain to happen to someone. That difference is made clear by the log-odds scale, which puts the chance of a lottery ticket at 60 or more decibels below zero, not infinitely far below. In a world with 7 billion people, billion-to-1 chances happen every day.

As an example of even tinier probabilities that are still detectably different from zero, consider a typical computer: a billion transistors in its CPU, clocked a billion times a second, running for a conveniently round length of time, a million seconds, which is about 12 days. Computers these days can easily do that without a single hardware error, which means that for every one of a million billion billion switching events, a transistor opened or closed exactly as designed. A million billion billion is about 1.7 times Avogadro’s number. The corresponding log-odds is −240 decibels. And yet hardware glitches can still happen.

And P(A|A) is still 1, not any finite num­ber of deci­bels.
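The decibel figures in this comment can be checked directly (a sketch; `decibels` is a hypothetical helper using the 10·log10(odds) convention used above):

```python
import math

def decibels(p):
    """Log odds in decibels: 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))

# A one-in-a-million lottery chance sits about 60 decibels below even odds:
lottery = decibels(1e-6)    # roughly -60 dB
# One error per 10^24 switching events is far below that, yet still finite:
glitch = decibels(1e-24)    # roughly -240 dB
# P(A|A) = 1 would be infinitely many decibels above zero: off the scale.
```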

• So you are saying that the statement “0 and 1 are not probabilities” has probability 1?

• Nope. He’s say­ing that based on his best anal­y­sis, it ap­pears to be the case.

• O = (P /​ (1 - P))

prob­a­bil­ities and odds are isomorphic

This is undefined for P = 1. If you claim that that function is a real-valued bijection between probabilities and odds, then P = 1 doesn’t work, so you’re begging the question. Always take care not to divide by zero.

Whether or not real-world events can have a prob­a­bil­ity of 0 or 1 is a differ­ent ques­tion than “are 0 and 1 prob­a­bil­ities?”. They most cer­tainly are.
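A minimal sketch of the objection (hypothetical helper names): the transform round-trips fine away from the endpoint, but P = 1 has no finite image.

```python
def odds(p):
    """O = p / (1 - p): maps [0, 1) onto [0, infinity)."""
    return p / (1 - p)

def prob(o):
    """Inverse transform P = O / (1 + O)."""
    return o / (1 + o)

round_trip = prob(odds(0.9))    # ~0.9: reversible for p < 1
even = odds(0.5)                # 1.0, i.e. odds of 1:1
# odds(1.0) raises ZeroDivisionError: the map is a bijection between
# [0, 1) and [0, infinity), not between [0, 1] and the reals.
```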

• If I roll a die, then one of the events that can hap­pen will hap­pen. That’s just say­ing that if S is my sam­ple space, then P(S) = 1. Similarly, P(~S) = 0, which is just say­ing that im­pos­si­ble things won’t hap­pen. The former state­ment is an ax­iom in the stan­dard math­e­mat­i­cal treat­ments of the sub­ject. Th­ese state­ments may be triv­ial, but I dis­trust any math­e­mat­ics that can’t han­dle triv­ial cases.

Re­ject­ing 1 as a prob­a­bil­ity would be catas­trophic when you’re deal­ing with dis­crete spaces. If you’re the sort to re­ject in­finity, then it would fol­low that all prob­a­bil­ity spaces are dis­crete. At that point prob­a­bil­ity loses its rigor. Prefer­ence for odds or log odds just means that you have to live with us­ing the ex­tended re­als with spe­cial con­ven­tions for the in­fini­ties.

• You can re­ject in­finity with­out be­ing able to enu­mer­ate ev­ery pos­si­bil­ity. Your sam­ple space will never prac­ti­cally con­tain all the pos­si­bil­ities. (How many times has some­thing you never thought of hap­pened?) There are 2^(how­ever many bits of in­put come into my brain) pos­si­bil­ities for me to ob­serve for any pe­riod of time, and I can never think about all of them. Any ex­plicit sam­ple space is go­ing to miss pos­si­bil­ities. S is not well-defined.

I think the point of the post was that 1 shouldn’t be used for prac­ti­cal cases.

• Real life is com­plex enough that there is merit to the philo­soph­i­cal po­si­tion that one should re­frain from as­sign­ing prob­a­bil­ities of 0 or 1 to non­triv­ial events. Cat­e­gor­i­cally deny­ing that any event can have prob­a­bil­ity 0 or 1 is an ex­treme po­si­tion (which, ap­plied to it­self, would re­ally mean that a given event would have a high prob­a­bil­ity of not oc­cur­ring with prob­a­bil­ity 0 or 1).

From the purely math­e­mat­i­cal stand­point, re­mov­ing 0 and 1 from the set of pos­si­ble prob­a­bil­ities breaks the cur­rent foun­da­tions of the the­ory. The ex­is­tence of a sam­ple space con­tain­ing all pos­si­bil­ities does not de­pend on whether we hu­mans can com­pre­hend them all. If the sam­ple space of all pos­si­bil­ities ex­ists and P(S) < 1, then a lot of the­o­rems break down. That’s where you live with ideal­iza­tions like ab­solute cer­tainty (or al­most cer­tainty in the in­finite case) or else find some­thing other than prob­a­bil­ity to use to model the real world.

• In theory, if you could list every possible observation you could make, the set of all of them would have probability 1. The listing would take infinite time, because the following class of outcomes:

my brain band­width is in­creased to X bits, and X ran­dom bits are my next input

has infinite cardinality. I could get into how Gödel’s results mean you can’t, even in principle, describe all possible outcomes in a finite amount of space, even by referencing classes like I did, but I’ll leave that up to you.

There was a sug­gested fix to your prob­lem in the post, why isn’t that good enough for you?

If you made a mag­i­cal sym­bol to stand for “all pos­si­bil­ities I haven’t con­sid­ered”, then you could marginal­ize over the events in­clud­ing this mag­i­cal sym­bol, and ar­rive at a mag­i­cal sym­bol “T” that stands for in­finite cer­tainty.

Sounds like he agrees that S has prob­a­bil­ity 1.

Note: I agree that the way he “proves” the claim is not very good. He ba­si­cally tries to switch your in­tu­ition by switch­ing the word­ing of the ques­tion. Not too rigor­ous.

• When I say that the possibilities can be listed in principle, what I mean is that there is some set S that contains them; I make no reference to any practical problems with describing or storing its elements. Like the points and lines of geometry, it’s a Platonic idealization.

There was a sug­gested fix to your prob­lem in the post, why isn’t that good enough for you?

Be­cause talk of mag­i­cal sym­bols is a good sign that the pas­sage was meant to ridicule the use of in­finity. The very next para­graph seeks to ex­punge such “mag­i­cal sym­bols” from prob­a­bil­ity the­ory.

• If he has a rigor­ous way to ground prob­a­bil­ity the­ory with­out 0 and 1, I’m fine with it. He seemed to be say­ing that he wishes there was such a way, but un­til some­one de­vel­ops one, he’s stuck with mag­i­cal sym­bols. He ac­knowl­edges all your prob­lems in the end of the post.

• This article is largely incoherent. The main justification is the abuse of an invalid transformation: y = x/(1-x) is not the bijection that he asserts it is, because it’s not a function that maps [0,1] onto R. It’s a function that maps [0,1] onto [0,\infty] as a subset of the topological closure of R. And that’s okay, but you can’t say “well, I don’t like the topological closure of R, so I’ll just use R and claim that 1 is where the problem is.”

Ad­di­tion­ally, his dis­cus­sion of log odds and such is perfectly fine, but ig­nores the fact that there are places where you do need to have an odds of 0:1, or a log odds of nega­tive in­finity. Prob­a­bil­ity the­ory stops work­ing when you throw out 0 and 1, it’s as sim­ple as that.

Even if you don’t want to handle tautologies or contradictions, there are other ways to get P(X) = 0 or 1. The probability that a real number chosen uniformly from the interval [0,1] equals any particular value is 0. It has to be. It’s a provable fact under ZFC, and to decide otherwise is to say that you’re more attached to the idea of 0 and 1 not being probabilities than you are to the consistency of mathematics, and if you really believe that, well, there’s absolutely nothing I have to say to you.

This is one of those situ­a­tions where EY just demon­strates he knows very lit­tle math­e­mat­ics.

• y = x/(1-x) is not the bijection that he asserts it is, [...]. It’s a function that maps [0,1] onto [0,\infty] as a subset of the topological closure of R.

How is that not a bijection? Specifically, a bijection between the sets [0, 1) and [0, ∞), which seems exactly to be the claim EY is making.

On a broader point, EY was not call­ing into ques­tion the cor­rect­ness or con­sis­tency of math­e­mat­i­cal con­cepts or claims but whether they have any use­ful mean­ing in re­al­ity. He was not talk­ing about the map, he was talk­ing about the ter­ri­tory and how we may im­prove the map to bet­ter re­flect the ter­ri­tory.

• As someone who doesn’t know much beyond basic statistics, in what way are 0 or 1 probabilities? Isn’t it just axiomatic truth at that point? In that sense, saying zero and one are probabilities is just saying ‘certain’ or ‘impossible’, as far as I understand it. Situations where an event will definitely or definitely not occur don’t seem consistent with the idea of randomness, which I’ve understood probability to revolve around.

I suppose the alternative would be that we’d have to assume every mathematical proof has infinite evidence if we wanted to get anywhere productive; after all, axioms are assumed to be true. It doesn’t make much sense to need evidence in that scenario, except perhaps for the probability of error and mistake? That isn’t particularly calculable, and would actually change from person to person.

Us­ing one and zero makes sense to me as a mat­ter of as­sumed or proven truths, but I’m still un­sure how that makes it a prob­a­bil­ity.

• Formally, probability is defined via areas. The basic idea is that the probability of picking an element from a set A out of a set B is the ratio of the areas of A to B, where “area” can be defined not only for things like squares but also for things like lines, or actually almost every* subset of R. So, let’s say you want to randomly select a real number from the interval [0,1] and want to know the odds it falls in a set, S. The area of [0,1] is 1, so the answer is just the area of S.

If S={0}, then S has area zero. If S=[0,1), then S has area 1. Not only are both of these the­o­ret­i­cal pos­si­bil­ities, they are prac­ti­cal ones too. There are real world ex­am­ples of prob­a­bil­ity zero events (the only one that comes to mind in­volves QM though so I don’t want to bother with the de­tails).

Now, no­tice that this isn’t the same thing as “im­pos­si­ble”. In­stead, it means more like “it won’t hap­pen I promise even by the time the uni­verse ends”. The way I tend to think about prob­a­bil­ity zero events is that they are so un­likely they are be­yond the reach of the prin­ci­ple that as the num­ber of tri­als in­creases, events be­come ex­pected. For any nonzero prob­a­bil­ity, there is a num­ber of tri­als, n, such that once you do it n times the ex­pected value be­comes greater than 1. That’s not the case with prob­a­bil­ity zero events. Prob­a­bil­ity 1 events can then be thought of as the nega­tion of prob­a­bil­ity 0 events.

*not ac­tu­ally “al­most ev­ery” in a for­mal sense, but “al­most any” in a “un­less you go try to build a set that you can’t mea­sure it prob­a­bly has a well defined area” sense
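The area intuition can be sketched with a Monte Carlo draw (a sketch, not a proof; an exact float comparison with 0.5 stands in for “hitting a single point”, and the seed is arbitrary):

```python
import random

random.seed(0)
draws = [random.random() for _ in range(1_000_000)]

# An interval's probability matches its length ("area"):
in_half = sum(0.25 <= x < 0.75 for x in draws) / len(draws)   # near 0.5
# A single point has area zero, so it is essentially never hit exactly:
hits = sum(x == 0.5 for x in draws)
```

The interval estimate converges to its length, while the single point goes unhit, which is the area-zero behavior described above.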

• That seems a solid enough explanation, but how can something of probability zero have a chance to occur? How then do you represent an impossible outcome? It seems like otherwise ‘zero’ is equivalent to ‘absurdly low’. That doesn’t quite jibe with my understanding.

• I think one of the clearest expositions on these issues is E. T. Jaynes. The first three chapters (which contain some of the relevant material) can be found at http://bayes.wustl.edu/etj/prob/book.pdf.

• Im­pos­si­ble things also have a prob­a­bil­ity of zero. I to­tally un­der­stand that this seems a bit un­in­tu­itive, and the un­der­ly­ing struc­ture (which in­cludes things like in­fini­ties of differ­ent sizes) is gen­er­ally pretty un­in­tu­itive at first. Which is kinda just say­ing “sorry, I can’t ex­plain the in­tu­ition,” which is un­for­tu­nately true.

• I’m just going to think of it as taking the limit as evidence approaches infinity. Because a probability arbitrarily close to zero and zero itself become identical in that limit, zero then is a probability?

• Si­tu­a­tions where an event will definitely or definitely not oc­cur doesn’t seem to be con­sis­tent with the idea of ran­dom­ness which I’ve un­der­stood prob­a­bil­ity to re­volve around.

“Event” is a very broad no­tion. Let’s say, for ex­am­ple, that I roll two dice. The sam­ple space is just a col­lec­tion of pairs (a, b) where “a” is what die 1 shows and “b” is what die 2 shows. An event is any sub-col­lec­tion of the sam­ple space. So, the event that the num­bers sum to 7 is the col­lec­tion of all such pairs where a + b = 7. The prob­a­bil­ity of this event is sim­ply the frac­tion of the sam­ple space it oc­cu­pies.

• If I roll eight dice, they’ll never sum to seven, and I say that that event occurs with probability 0. If I secretly roll an unknown number of dice, you could reasonably ask me the probability that they sum to seven. If I answer “0”, that just means I rolled either a single die or more than seven dice. It doesn’t make the process less random nor the question less reasonable.

If you treat an event as some ques­tion you can ask about the re­sult of a ran­dom pro­cess, then 1 and 0 make a lot more sense as prob­a­bil­ities.

For the math­e­mat­i­cal the­ory of prob­a­bil­ity, there are plenty of tech­ni­cal rea­sons why you want to re­tain 1 and 0 as prob­a­bil­ities (and once you get into con­tin­u­ous dis­tri­bu­tions, it turns out that prob­a­bil­ity 1 just means “al­most cer­tain”).
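The two-dice picture above can be made concrete (a sketch; `p` is a hypothetical helper treating probability as the fraction of the 36 equally likely outcomes an event occupies):

```python
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))   # all 36 pairs (a, b)

def p(event):
    """Probability of an event = the fraction of the sample space it occupies."""
    return len(event) / len(sample_space)

sum_is_7 = [pair for pair in sample_space if sum(pair) == 7]
sum_is_1 = [pair for pair in sample_space if sum(pair) == 1]  # empty: impossible

# p(sum_is_7) is 6/36, p(sum_is_1) is 0, and p(sample_space) is 1:
# 0 and 1 show up as ordinary values of the same function.
```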

• This is what I meant by some­thing be­ing a proven truth- within the rules set one can find out­comes which are ax­io­mat­i­cally im­pos­si­ble or nec­es­sary. The pro­cess it­self may be ran­dom, but call­ing it ran­dom when some­thing im­pos­si­ble didn’t hap­pen seems odd to me. The very idea that 1 may be not-quite-cer­tain is more than a lit­tle baf­fling, and I sus­pect is the heart of the is­sue.

• The very idea that 1 may be not-quite-cer­tain is more than a lit­tle baf­fling, and I sus­pect is the heart of the is­sue.

If 1 isn’t quite cer­tain then nei­ther is 0 (if some­thing hap­pens with prob­a­bil­ity 1, then the prob­a­bil­ity of it not hap­pen­ing is 0). It’s one of those things that pops up when deal­ing with in­finity.

It’s best illustrated with an example. Let’s say we play a game where we flip a coin and I pay you $1 if it’s heads and you pay me $1 if it’s tails. With probability 1, one of us will eventually go broke (see Gambler’s ruin). It’s easy to think of a sequence of coin flips where this never happens; for example, if heads and tails alternated. The theory holds that such a sequence occurs with probability 0. Yet this does not make it impossible.

It can be thought of as the result of a limiting process. If I looked at sequences of N coin flips, counted the ones where no one went broke, and divided this by the total number of possible sequences, then as I let N go to infinity this ratio would go to zero. This event occupies a region with area 0 in the sample space.
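The limiting process can be sketched by brute force for short games (assumed setup: each player starts with $3, and `survival_fraction` is a hypothetical helper that enumerates every equally likely flip sequence):

```python
from itertools import product

def survival_fraction(n_flips, bankroll):
    """Fraction of the 2**n_flips equally likely flip sequences in which
    neither player goes broke (the running total never reaches +/-bankroll)."""
    alive = 0
    for seq in product((1, -1), repeat=n_flips):
        total = 0
        for step in seq:
            total += step
            if abs(total) == bankroll:
                break            # someone is ruined; stop this sequence
        else:
            alive += 1           # the loop finished with no ruin
    return alive / 2 ** n_flips

# With $3 each, the no-ruin fraction keeps falling as the game lengthens:
fractions = [survival_fraction(n, 3) for n in (4, 8, 12, 16)]
```

The fraction is positive for every finite N (alternating sequences always survive) but shrinks monotonically toward zero, which is exactly the sense in which the no-ruin event has probability 0.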

• Eliezer isn’t ar­gu­ing with the math­e­mat­ics of prob­a­bil­ity the­ory. He is say­ing that in the sub­jec­tive sense, peo­ple don’t ac­tu­ally have ab­solute cer­tainty. This would mean that math­e­mat­i­cal prob­a­bil­ity the­ory is an im­perfect for­mal­iza­tion of peo­ple’s sub­jec­tive de­grees of be­lief. It would not nec­es­sar­ily mean that it is im­pos­si­ble in prin­ci­ple to come up with a bet­ter for­mal­iza­tion.

• Eliezer isn’t ar­gu­ing with the math­e­mat­ics of prob­a­bil­ity the­ory. He is say­ing that in the sub­jec­tive sense, peo­ple don’t ac­tu­ally have ab­solute cer­tainty.

Errr… as I read EY’s post, he is cer­tainly talk­ing about the math­e­mat­ics of prob­a­bil­ity (or about the for­mal frame­work in which we op­er­ate on prob­a­bil­ities) and not about some “sub­jec­tive sense”.

The claim of “peo­ple don’t ac­tu­ally have ab­solute cer­tainty” looks iffy to me, any­way. The im­me­di­ate two ques­tions that come to mind are (1) How do you know? and (2) Not even a sin­gle hu­man be­ing?

• Of course if no one has ab­solute cer­tainty, this very fact would be one of the things we don’t have ab­solute cer­tainty about. This is en­tirely con­sis­tent.

• If we’re ask­ing what the au­thor “re­ally meant” rather than just what would be cor­rect, it’s on record.

The ar­gu­ment for why zero and one are not prob­a­bil­ities is not, “All ob­jects which are spe­cial cases should be cast out of math­e­mat­ics, so get rid of the real zero be­cause it re­quires a spe­cial case in the field ax­ioms”, it is, “ce­teris paribus, can we do this with­out the spe­cial case?” and a bit of fur­ther in­tu­ition about how 0 and 1 are the equiv­a­lents of in­finite prob­a­bil­ities, where do­ing our calcu­la­tions with­out in­fini­ties when pos­si­ble is ce­teris paribus re­garded as a good idea by cer­tain sorts of math­e­mat­i­ci­ans. E.T. Jaynes in “Prob­a­bil­ity The­ory: The Logic of Science” shows how many prob­a­bil­ity-the­o­retic er­rors are com­mit­ted by peo­ple who as­sume limits di­rectly into their calcu­la­tions, with­out first show­ing the finite calcu­la­tion and then fi­nally tak­ing its limit. It is not un­rea­son­able to won­der when we might get into trou­ble by us­ing in­finite odds ra­tios. Fur­ther­more, real hu­man be­ings do seem to of­ten do very badly on ac­count of claiming to be in­finitely cer­tain of things so it may be prag­mat­i­cally im­por­tant to be wary of them.

I… can’t re­ally recom­mend read­ing the en­tire thread at the link, it’s kind of flame-war-y and not very illu­mi­nat­ing.

• I think the issue at hand is that 0 and 1 aren’t special cases at all, but are necessary for the math of probability theory to work (try to construct a probability measure in which no subset has probability 1 or 0).

This is incredibly necessary for the mathematical idea of probability, and EY seems to be confusing “are 0 and 1 probabilities relevant to Bayesian agents?” with “are 0 and 1 probabilities?” (yes, they are, unavoidably, and not as a special case!).

• It seems that EY’s position boils down to

Prag­mat­i­cally speak­ing, the real ques­tion for peo­ple who are not AI pro­gram­mers is whether it makes sense for hu­man be­ings to go around declar­ing that they are in­finitely cer­tain of things. I think the an­swer is that it is far men­tally healthier to go around think­ing of things as hav­ing ‘tiny prob­a­bil­ities much larger than one over googol­plex’ than to think of them be­ing ‘im­pos­si­ble’.

And that’s a weak claim. EY’s ideas of what is “men­tally healthier” are, ba­si­cally, his per­sonal prefer­ences. I, for ex­am­ple, don’t find any men­tal health benefits in think­ing about one over googol­plex prob­a­bil­ities.

• Cromwell’s Rule is not EY’s in­ven­tion, and rel­a­tively un­con­tro­ver­sial for em­piri­cal propo­si­tions (as op­posed to tau­tolo­gies or the like).

If you don’t ac­cept treat­ing prob­a­bil­ities as be­liefs and vice versa, then this whole con­ver­sa­tion is just a re­ally long and un­nec­es­sar­ily cir­cuitous way to say “re­mem­ber that you can be wrong about stuff”.

• The part that is new compared to Cromwell’s Rule is that Yudkowsky doesn’t want to give probability 1 even to logical statements (e.g., that 53 is a prime number).

Because he doesn’t want to treat 1 as a probability, you can’t expect complete sets of events to have total probability 1, despite their being tautologies. Because he doesn’t want probability 0, how do you handle the empty set? How do you assign probabilities to statements like “A and B” where A and B are logically exclusive? (The coin lands heads AND the coin lands tails.)

Re­mov­ing 0 and 1 from the math of prob­a­bil­ity breaks most of the stan­dard ma­nipu­la­tions. Again, it’s best to just say “be care­ful with 0 and 1 when work­ing with odds ra­tios.”

• No­body is say­ing EY in­vented Cromwell’s Rule, that’s not the is­sue.

The is­sue is that “0 and 1 are not use­ful sub­jec­tive cer­tain­ties for a Bayesian agent” is a very differ­ent state­ment than “0 and 1 are not prob­a­bil­ities at all”.

• You’re right, I mis­read your sen­tence about “his per­sonal prefer­ences” as refer­ring to the whole claim, rather than speci­fi­cally the part about what’s “men­tally healthy”. I don’t think we dis­agree on the ob­ject level here.

• The claim of “peo­ple don’t ac­tu­ally have ab­solute cer­tainty” looks iffy to me, any­way. The im­me­di­ate two ques­tions that come to mind are (1) How do you know? and (2) Not even a sin­gle hu­man be­ing?

The way I view that state­ment is: “In our for­mal­iza­tion, agents with ab­solutely cer­tain be­liefs can­not change those be­liefs, we want our for­mal­iza­tion to cap­ture our in­tu­itive sense of how an ideal agent would up­date its be­liefs, a for­mal­iza­tion with a qual­ity of fa­nat­i­cism does not cap­ture our in­tu­itive sense of how an ideal agent would up­date its be­liefs, there­fore we do not want a qual­ity of fa­nat­i­cism.”

And what state of the world would cor­re­spond to the state­ment “Some peo­ple have ab­solute cer­tainty.” ? Do you think that we can take some highly ad­vanced and en­tirely fic­tional neu­roimag­ing tech­nol­ogy, look at a brain and mean­ingfully say, “There’s a be­lief with prob­a­bil­ity 1.” ?

And on the other hand, I’m not afraid to talk about folk certainty, where the properties of an ideal mathematical system are less relevant, where everyone can remain blissfully logically uncertain about the fact that beliefs with probability 1 and 0 imply undesirable consequences in formal systems that possess them, and say things like “I believe that absolutely.” I am not afraid to say something like, “That person will not stop believing that for as long as he lives,” and mean that I predict with high confidence that that person will not stop believing that for as long as he lives.

And once you be­lieve that the for­mal­iza­tion is try­ing to cap­ture our in­tu­itive sense of an ideal agent, and de­cide whether or not that qual­ity of fa­nat­i­cism cap­tures it, and de­cide whether or not you’re go­ing to be a stick­ler about folk lan­guage, then I don’t think that any ques­tion or con­fu­sion around that claim re­mains.

• Peo­ple are not “ideal agents”. If you speci­fi­cally con­struct your for­mal­iza­tion to fit your ideas of what an ideal agent should and should not be able to do, this for­mal­iza­tion will be a poor fit to ac­tual, live hu­man be­ings.

So ei­ther you make a sys­tem for ideal agents—in which case you’ll still run into some prob­lems be­cause, as has been pointed out up­thread, stan­dard prob­a­bil­ity math stops work­ing if you dis­al­low ze­ros and ones—or you make a sys­tem which is ap­pli­ca­ble to our im­perfect world with im­perfect hu­mans.

• I don’t see why both aren’t use­ful. If you want a de­scrip­tive model in­stead of a nor­ma­tive one, try prospect the­ory.

I just don’t see this ar­ti­cle as an ax­iom that says prob­a­bil­ities of 0 and 1 aren’t al­lowed in prob­a­bil­ity the­ory. I see it as a warn­ing not to put 0s and 1s in your AI’s prior. You’re not chang­ing the math so much as pick­ing good pri­ors.

• I think he’s just ac­knowl­edg­ing the minute(?) pos­si­bil­ity that our ap­par­ently flawless rea­son­ing could have a blind spot. We could be in a Ma­trix, or have some­thing tam­per­ing with our minds, etcetera, such that the im­plied as­ser­tion:

If this ap­pears ab­solutely cer­tain to me

Then it must be true

is in­defen­si­ble.

• There are two differ­ent things.

David_Bolin said (em­pha­sis mine): “He is say­ing that in the sub­jec­tive sense, peo­ple don’t ac­tu­ally have ab­solute cer­tainty.” I am in­ter­pret­ing this as “peo­ple never sub­jec­tively feel they have ab­solute cer­tainty about some­thing” which I don’t think is true.

You are say­ing that from an ex­ter­nal (“ob­jec­tive”) point of view, peo­ple can not (or should not) be ab­solutely sure that their be­liefs/​con­clu­sions/​maps are true. This I eas­ily agree with.

• It should prob­a­bly be defined by cal­ibra­tion: do some peo­ple have a type of be­lief where they are always right?

• Self-refer­en­tial and an­thropic things would prob­a­bly qual­ify, e.g. “I be­lieve I ex­ist”.

• You can phrase statements of logical deduction such that they have no premises and only conclusions. If we let S be the set of logical principles under which our logical system operates and T be some sentence that entails Y, then “(S and T) implies Y” is something that I have absolute certainty in, even if this world is an illusion, because the premise of the implication contains all the rules necessary to derive the result.

A less formal example of this would be the sentence: “If the rules of logic as I know them hold and the axioms of mathematics are true, then it is the case that 2 + 2 = 4.”
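As a sketch of the same point, this kind of conditional can even be machine-checked: in a proof assistant like Lean, “2 + 2 = 4” is verified by pure computation from the definitions, so the statement carries no empirical premises at all.

```lean
-- If the rules (here, Lean's kernel and the definition of `+` on
-- the naturals) hold, then 2 + 2 = 4; `rfl` checks it by computation.
example : 2 + 2 = 4 := rfl
```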

• A real mathematician got in a debate with EY over this post, and made some really good points: https://np.reddit.com/r/badmathematics/comments/2bazyc/0_and_1_are_not_probabilities_any_more_than/cj43y8k

Maybe this doesn’t stand up math­e­mat­i­cally, but I re­ally like the in­tu­ition of log odds in­stead of prob­a­bil­ity. And this post ex­plained it quite well. And the main point that you shouldn’t be­lieve in ab­solute cer­tain­ties is still true. An ideal AI us­ing prob­a­bil­ity the­ory would prob­a­bly use log odds, and not have a 0 or 1.
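The intuition for log odds is easy to make concrete. A minimal sketch (the function name is mine) of the transform the post describes, showing why 0 and 1 sit at infinite distance from every other probability:

```python
import math

def log_odds(p):
    """Convert a probability to log odds, measured in bits."""
    return math.log2(p / (1 - p))

# Ordinary probabilities map to finite log odds...
print(log_odds(0.5))   # 0.0  (even odds, 1:1)
print(log_odds(0.9))   # ~3.17 bits (odds of 9:1)

# ...but 0 and 1 would require infinitely many bits of evidence:
# log_odds(1.0) divides by zero, and log_odds(0.0) asks for log2(0).
```

On the log-odds scale, updating on evidence is just addition, and no finite amount of evidence ever reaches the endpoints, which is the post’s point about 0 and 1.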

• /​r/​bad­math­e­mat­ics is shut­tered now, ap­par­ently.

“This com­mu­nity has be­come some­thing of a shit­show. Set­ting bad­math to pri­vate while we try to de­cide on a way for­ward with the sub­red­dit.”

Oh no, re­ally? Who would have thought that the sorts of peo­ple who have learned to en­joy in­dulging con­tempt would even­tu­ally turn on each other.

I really wanted to see that argument though; tell me, to what extent was it an argument? Cause I feel like if a person in our school wanted to settle this, they’d just distinguish the practical cases EY’s talking about from the mathematical cases the conversants are talking about, and everyone would immediately wake up and realise how immaterial the disagreement always was (though some of them might decide to be mad about that instead). But also, maybe Eliezer kind of likes getting people riled up about this, so maybe dispersing the confusion never crossed his mind. Contempt vampires meet contempt bender. Kismesis is forged.

I shouldn’t contribute to this “fight”, but I can’t resist. I’d have recommended he bring up how the causal network formalization explicitly disallows certain or impossible events at the math level once you cross into a certain level of sophistication (I forget where the threshold was, but I remember thinking “well, the Bayesian networks that support 0s and 1s sound pretty darn limited and I’m going to give up on them just as my elders advised.”)

Ultimately, the “can’t be 0 or 1” restriction is pretty obviously needed for a lot of the formulas to work robustly (you can’t even use the definition of conditional probability, P(A|B) = P(A and B)⁄P(B), without requiring the probability of the evidence to be nonzero! Cause there’s a division in it! There are lots of divisions in probability theory!)

So I propose that we give a name to that restriction, and I offer the name “credences”. (Currently, it seems the word “credence” is just assigned to a bad overload of “probability” that uses percent notation instead of the usual 0-to-1 range. I doubt anyone will miss it.)

A prob­a­bil­ity is a cre­dence iff it is nei­ther 0 nor 1. A prac­ti­cal real-world right and justly rad­i­cally skep­ti­cal bayesian rea­soner should prob­a­bly re­strict a large, well-delineated sub­set of its ev­i­dence weights to be­ing cre­dences.

And now we can talk about cre­dences and there’s no need for any more con­fu­sion, if we want.
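The division that forces the restriction can be made concrete. A minimal sketch (function names are mine) of the conditional-probability definition alongside the proposed “credence” test:

```python
def conditional(p_joint, p_evidence):
    """P(A | B) = P(A and B) / P(B) -- undefined when P(B) == 0."""
    if p_evidence == 0:
        raise ValueError("conditioning on a probability-0 event is undefined")
    return p_joint / p_evidence

def is_credence(p):
    """A probability is a credence iff it is strictly between 0 and 1."""
    return 0 < p < 1

print(conditional(0.3, 0.6))  # 0.5
print(is_credence(0.0))       # False: an extreme probability, not a credence
```

Restricting evidence weights to credences is exactly what keeps every such division well-defined.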

• It’s back, btw. If it ever goes down again you can probably get it on the Wayback Machine. And yes, the /r/bad* subreddits are full of terrible academia snobbery. Badmathematics is the best of the bunch because mathematics is at least kind of objective. So they mostly talk about philosophy of mathematics.

The problem is that formal models of probability theory have problems with logical uncertainty. You can’t assign a nonzero probability to a false logical statement. All the reasoning in probability theory is around modelling uncertainty about the unknown external world. This is an early attempt to think about logical uncertainty, which MIRI has since published papers on and tried to formalize.

Just call­ing them “log odds” is fine and they are widely used in real work.

Btw, what does “Response to previous version” mean? Was this article significantly edited? It doesn’t seem so confrontational reading it now.

• We pub­lished new ver­sions of a lot of se­quences posts a few months ago. If you click on the “Re­sponse to pre­vi­ous ver­sion” text, you can read the origi­nal text that the com­ment was refer­ring to.

• Wait, these old posts have been ed­ited? I don’t see the “Re­sponse to pre­vi­ous ver­sion” link. I’d like to read the origi­nals, as they were writ­ten, in chronolog­i­cal or­der… there are other ways to con­sume the com­pendium if I so de­sired.

• Yeah, they were ed­ited as part of the pro­cess of com­piling Ra­tion­al­ity: AI to Zom­bies. Usu­ally that just in­volved adding some sources, clean­ing up some sen­tences and fix­ing some ty­pos.

The “Re­sponse to pre­vi­ous ver­sion” link is at the top of ev­ery com­ment that was posted on the pre­vi­ous ver­sion of the post. See here:

• I see it now. Is there some way to make the original article the default view? Or a link to the prior version at the top of the article?

• You can click on the date-stamp at the top of the post and se­lect the ear­liest ver­sion from there.