# Does Probability Theory Require Deductive or Merely Boolean Omniscience?

It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is… exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to “Every bachelor is an unmarried male”?[1] Perhaps the only tautologies that need to be assigned probability 1 are the Boolean theorems built from the atomic sentences that appear in the prior distribution, such as: “S or ~S”.

It seems that I do not need to assign probability 1 to Fermat's Last Theorem in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to “The next ball will be white or it will not be white”, but Fermat's Last Theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's Last Theorem.

Imagine a (completely impractical, ideal, and esoteric) first-order language whose particular subjects are discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc., and generally any predicate type you'd like (including n-place predicates).[2] The atomic propositions in this language might look something like: “(5, 0.487, −7098.6, 6000s) is Blue” or “(1, 1, 1, 1s), (−1, −1, −1, 1s) contains an elephant.” The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most English propositions could be expressed in such a language (mathematical claims aside, I think).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose the maximum-entropy distribution; it doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: “(1, 1, 1, 1s), (−1, −1, −1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (−1, −1, −1, 1s) contains an elephant)”], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1.
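A quick sketch of that consequence (my own illustration, not anything from the post): however the entries of a joint table over truth assignments are chosen, every tautological Boolean combination automatically gets probability 1 and every contradictory one gets 0.

```python
import itertools
import random

# Two atomic propositions; the joint distribution is a table over all
# four truth assignments, with entries that sum to 1.
atoms = ["S", "T"]
worlds = list(itertools.product([True, False], repeat=len(atoms)))
weights = [random.random() for _ in worlds]
total = sum(weights)
joint = {w: x / total for w, x in zip(worlds, weights)}

def prob(formula):
    """Probability of a Boolean formula: sum over the worlds where it holds."""
    return sum(p for w, p in joint.items() if formula(dict(zip(atoms, w))))

# Tautologies get 1 and contradictions get 0 as a side effect of the
# table summing to 1; no separate axiom about tautologies is needed.
assert abs(prob(lambda v: v["S"] or not v["S"]) - 1.0) < 1e-9
assert prob(lambda v: v["S"] and not v["S"]) == 0.0
```

Nothing in this construction mentions arithmetic at all, which is the point: only Boolean combinations of the atoms ever receive probabilities.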

A similar argument can be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language, and we want to figure out which distribution it is. So, we assign each binary string of length n a prior probability of 2^−n. If the language allows for comments, then simpler distributions will be output by more programs, and we add up the probabilities of all programs that print a given distribution.[3] Sure, we might need an oracle to figure out whether a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's Last Theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem.
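This weight-pooling can be sketched with a toy "language" that is entirely made up here: a program is a bit string, and everything after the first occurrence of "11" is a comment. Outputs printable by more (and shorter) programs pool more of the 2^−n weight.

```python
from collections import defaultdict
from itertools import product

def run(program: str) -> str:
    """Toy interpreter: the output is everything before the first '11';
    the rest of the string is a comment. Purely illustrative."""
    return program.split("11", 1)[0]

# Give each length-n bit string weight 2^-n and pool the weight of all
# programs printing the same output. (Real Solomonoff induction uses a
# prefix-free language so total mass stays <= 1; this toy skips that.)
prior = defaultdict(float)
for n in range(1, 9):
    for bits in product("01", repeat=n):
        prior[run("".join(bits))] += 2.0 ** -n

# The short output "0" is printed by many programs ("0", "011", "0110",
# ...), so it accumulates more mass than the longer output "0101".
assert prior["0"] > prior["0101"]
```

Note that the prior is over outputs of programs; nothing in the bookkeeping requires assigning 1 to any arithmetic theorem.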

## Conclusion

A Bayesian agent does not require mathematical omniscience, or logical omniscience (if that means anything more than Boolean omniscience), but merely Boolean omniscience. Boolean omniscience means only that, for whatever atomic propositions appear in the agent's language (i.e., the language whose propositions constitute the domain of the probability function), any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of them must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a joint distribution table (or satisfy the Kolmogorov axioms in some other way), all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's Last Theorem, even if you are reasoning probabilistically in a language that is very expressive.[4]

## Some Ways To Prove This Wrong

Show that a really expressive semantic language, like the one I gave above, entails PA if you allow Boolean operations on its atomic propositions. Alternatively, show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both counts, which is why I wrote this post.

[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep to mathematics for the sake of this post.

[2] I think this language is ridiculous, and openly admit it has next to no real-world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity.

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of as a full-blown post because I want feedback and criticism. HOWEVER: if I am right, it does seem that folks here, at MIRI, and in the Bayesian world at large should be more careful when they think or write about logical omniscience.

• Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's Last Theorem (or at least I can't figure out why we would).

Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two. Consider a program that iterates over all possible values of a, b, c, n, looking for counterexamples to FLT, and, if it finds one, calls a subroutine that eventually prints out X (where X is your current observation). In order to do Solomonoff induction, you need to query a halting oracle on this program. But knowing whether this program halts is equivalent to knowing whether FLT is true or false.
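The program in question can be sketched directly (the enumeration order over (a, b, c, n) is my own choice; any order that eventually reaches every tuple works):

```python
from itertools import count

def flt_counterexample_search(max_sum=None):
    """Return the first (a, b, c, n) with n > 2 and a**n + b**n == c**n.
    With max_sum=None this is the program from the comment: it halts iff
    Fermat's Last Theorem is false, so a halting oracle queried on it
    settles FLT. A finite max_sum makes the sketch testable."""
    sums = count(6) if max_sum is None else range(6, max_sum + 1)
    for s in sums:  # enumerate tuples by the total a + b + c + n
        for a in range(1, s):
            for b in range(1, s - a):
                for c in range(1, s - a - b):
                    n = s - a - b - c
                    if n > 2 and a ** n + b ** n == c ** n:
                        return (a, b, c, n)  # ...then print observation X
    return None

# No counterexample exists (Wiles, 1995), so the unbounded search never halts.
```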

• Let's forget about the oracle. What about the program that outputs X only if 1 + 1 = 2, and else prints 0? Let's call it A(1,1). The formalism requires that P(X|A(1,1)) = 1, and it requires that P(A(1,1)) = 2^−K(A(1,1)), but does it need to know that “1 + 1 = 2” is somehow proven by A(1,1) printing X?

In either case, you've shown me something that I explicitly doubted before: one can decide any provable theorem given access to a Solomonoff agent's distribution, provided one knows how to write a program that prints X iff theorem S is provable. All one has to do is check the probability the agent assigns to X conditional on that program.

• Awesome. I'm pretty sure you're right; that's the most convincing counterexample I've come across.

I have a weak doubt, but I think you can get rid of it:

Let's name the program FLT().

I'm just not sure this means that the theorem itself is assigned a probability. Yes, I have an oracle, but it doesn't assign a probability to a program halting; it tells me whether it halts or not. What the Solomonoff formalism requires is that “if halts(FLT()) == true, then P(X|FLT()) = 1”, “if halts(FLT()) == false, then P(X|FLT()) = 0”, and “P(FLT()) = 2^−K(FLT())”. Where in all this is the probability of Fermat's Last Theorem? Having an oracle may imply knowing whether or not FLT is a theorem, but it does not imply that we must assign that theorem a probability of 1. (Or maybe it does and I'm not seeing it.)

Edit: Come to think of it… I'm not sure there's a relevant difference between knowing whether a program that outputs True iff theorem S is provable will end up halting, and assigning probability 1 to theorem S. It does seem that I must assign 1 to statements of the form “A or ~A” or else it won't work; whereas if the theorem S is not in the domain of our probability function, nothing seems to go wrong.

In either case, this probably isn't the standard reason for believing in, or thinking about, logical omniscience, because the concept of logical omniscience is probably older than Solomonoff induction. (I am of course only realizing that in hindsight, now that I've seen a powerful counterexample to my argument.)

• Basically the problem is that a Bayesian should not be able to change its probabilities without new evidence, and if you assign a probability other than 1 to a mathematical truth, you will run into problems when you deduce that it follows of necessity from other things that have a probability of 1.

• Why can't the deduction be the evidence? If I start with a 50-50 prior that 4 is prime, I can then use the subsequent observation that I've found a factor to update downwards. This feels like it relies on the reasoner's embedding, though, so maybe it's cheating, but it's not clear and non-confusing to me why it doesn't count.
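The update in that comment can be written out numerically (the likelihoods below are my own stipulation: a correct factor-finder never reports a factor of a prime):

```python
# Hypothesis H: "4 is prime". Start from the 50-50 prior in the comment.
p_h = 0.5

# Evidence E: the factor-finding procedure reports a factor (here, 2).
p_e_given_h = 0.0      # a prime has no such factor to report
p_e_given_not_h = 1.0  # stipulate the search always succeeds on composites

# Bayes' rule, treating the deduction itself as the evidence:
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # 0.0: "4 is prime" is updated all the way down
```

With a zero likelihood the posterior collapses to 0 in one step, which is exactly the pattern the parent comment worries about: a mathematical truth assigned probability short of certainty gets forced to an extreme once the deduction arrives as evidence.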

• How do you express Fermat's Last Theorem, for instance, as a Boolean combination of sentences in the language I gave, or as a Boolean combination of programs? Boolean algebra is not strong enough to derive, or even express, all of math.

Edit: Let's start simple. How do you express 1 + 1 = 2 in the language I gave, or as a Boolean combination of programs?

• Probability that there are two elephants, given one on the left and one on the right.

In any case, if your language can't express Fermat's Last Theorem, then of course you don't assign a probability of 1 to it: not because you assign it a different probability, but because you don't assign it a probability at all.

• I agree. I am saying that we need not assign it a probability at all. Your solution assumes that there is a way to express “two” in the language. Also, the proposition you made is more like “one elephant and another elephant make two elephants”, not “1 + 1 = 2”.

I think we'd be better off trying to find a way to express 1 + 1 = 2 as a Boolean function on programs.

• I think we'd be better off trying to find a way to express 1 + 1 = 2 as a Boolean function on programs.

This goes into the “shit LW people say” collection :-)