Does Probability Theory Require Deductive or Merely Boolean Omniscience?

It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is: exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to “Every bachelor is an unmarried male”? [1] Perhaps the only tautologies that need to be assigned probability 1 are the Boolean theorems built from the atomic sentences that appear in the prior distribution, such as “S or ~S”.

It seems that I do not need to assign probability 1 to Fermat’s Last Theorem in order to use probability theory when I play poker, or when I try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to “The next ball will be white or it will not be white”, but Fermat’s Last Theorem seems to be quite irrelevant. Perhaps that’s because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat’s Last Theorem.

Imagine a (completely impractical, ideal, and esoteric) first-order language whose subjects are discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc., and generally any predicate type you’d like (including n-place predicates). [2] The atomic propositions in this language might look something like: “5, 0.487, −7098.6, 6000s is Blue” or “(1, 1, 1, 1s), (-1, −1, −1, 1s) contains an elephant.” The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most English propositions could be expressed in such a language (I think, mathematical claims aside).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose the maximum-entropy distribution; it doesn’t matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: “(1, 1, 1, 1s), (-1, −1, −1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, −1, −1, 1s) contains an elephant)”], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of non-negative reals that sum to 1.
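To make the joint-distribution point concrete, here is a minimal Python sketch. The three atom names, the random weights, and the helper functions `random_joint` and `prob` are all made up for illustration; they are not part of the language described above. The sketch fills in a joint table over every truth assignment and computes the probability of a Boolean formula by summing the rows where it holds.

```python
from itertools import product
import random

# Hypothetical atomic propositions, standing in for the space-time atoms.
ATOMS = ["elephant_between_p_and_q", "point_r_is_blue", "point_s_in_carbon_atom"]

def random_joint(atoms, seed=0):
    """Fill out a joint table: one non-negative weight per truth assignment,
    normalized so the whole table sums to 1."""
    rng = random.Random(seed)
    worlds = list(product([False, True], repeat=len(atoms)))
    weights = [rng.random() for _ in worlds]
    total = sum(weights)
    return {world: w / total for world, w in zip(worlds, weights)}

def prob(formula, joint, atoms):
    """Probability of a Boolean formula: total mass of the rows where it is true."""
    return sum(p for world, p in joint.items() if formula(dict(zip(atoms, world))))

joint = random_joint(ATOMS)
E = "elephant_between_p_and_q"
B = "point_r_is_blue"

p_tautology = prob(lambda v: v[E] or not v[E], joint, ATOMS)       # every row counts
p_contradiction = prob(lambda v: v[E] and not v[E], joint, ATOMS)  # no row counts
p_contingent = prob(lambda v: v[E] and v[B], joint, ATOMS)         # some rows count
```

Whatever weights are chosen, the tautology collects the whole table and the contradiction collects none of it. Note that `prob` only accepts Boolean combinations of the listed atoms, so in this sketch there is simply no formula corresponding to a theorem of arithmetic to assign a probability to.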

A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each program (that is, each binary string of length n) a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add up the probabilities of all programs that print the same distribution. [3] Sure, we might need an oracle to figure out whether a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat’s Last Theorem (or at least I can’t figure out why we would). The data might be all of your sensory inputs, and n might be Graham’s number; still, there’s no reason such a distribution would need to assign 1 to every mathematical theorem.
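A toy version of this prior can be sketched in Python. Everything about the "programming language" here is an assumption made up for illustration: a program is k ones, then a zero, then k code bits, and any remaining bits are an ignored comment; the code bits name the parameter of a Bernoulli distribution. Unlike a real language, every toy program halts, so no oracle is needed.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def run(program):
    """Hypothetical toy interpreter: returns a Bernoulli parameter, or None
    if the program is malformed (it 'outputs nothing').

    Assumed layout: k ones, a zero, k code bits, then an ignored comment.
    """
    k = program.find("0")  # number of leading ones
    if k == -1 or 2 * k + 1 > len(program):
        return None
    code = program[k + 1 : 2 * k + 1]
    numerator = 2 * (int(code, 2) if code else 0) + 1
    return Fraction(numerator, 2 ** (k + 1))  # each code names a distinct parameter

def prior_over_distributions(n):
    """Give every length-n program mass 2^-n and add up the mass of all
    programs that print the same distribution."""
    prior = defaultdict(Fraction)
    for bits in product("01", repeat=n):
        output = run("".join(bits))
        if output is not None:
            prior[output] += Fraction(1, 2 ** n)
    return dict(prior)

prior = prior_over_distributions(7)
```

In this toy, the parameter 1/2 has the shortest code, so it leaves the most room for comments and ends up with prior 1/2, while 1/4 needs one code bit and gets 1/8. The domain of the prior is the set of distributions the programs can print; nothing in the construction asks for a probability on an arithmetical statement.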

Conclusion:

A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language of the agent (that is, the language that forms the set of propositions that constitute the domain of the probability function), any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a joint distribution table (or follow the Kolmogorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat’s Last Theorem, even if you are reasoning probabilistically in a language that is very expressive. [4]

Some Ways To Prove This Wrong:

Show that a really expressive semantic language, like the one I gave above, implies Peano arithmetic (PA) if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed in both attempts, which is why I wrote this.


[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post.

[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity.

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of as a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience.