Using the universal prior for logical uncertainty (retracted)

(I’m not sure about any of this)

(Yup! Vanessa Kowalski pointed out a crucial error; see her comment below. The idea about the universal prior picking up all computable regularities is unproved. I’m leaving the post as it was, but please read it as a historical document only.)

Abram Demski had an old paper describing a naive way to use the universal prior for logical uncertainty. He pointed out some problems with it, but I’m not sure the problems are real. I think the approach will work surprisingly well.

The idea is that, since the universal prior is just a probability distribution over bit sequences, we don’t have to use it to predict bits one by one. We can also update it on facts like “the seventh bit is 1” without mentioning the previous six bits. To choose which bits to update on, we can mechanically search through all possible proofs in Peano arithmetic (PA) by increasing length. Whenever we prove a sentence with Gödel number N, we tell the prior that the Nth bit is 1; whenever we disprove one, we tell the prior that the bit is 0. After each update, the prior’s opinions about all other bits (and thus all other sentences in PA) will shift, hopefully in a meaningful way.
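The update scheme above can be sketched with a toy stand-in for the universal prior: a tiny hand-picked mixture of “explanations” weighted by 2^−length, conditioned on sparse bit observations. Everything here (the `hypotheses` list, the helper names) is my own illustration; the real prior mixes over all programs, not five lambdas.

```python
from fractions import Fraction

# Toy stand-in for the universal prior: a finite mixture of total
# "explanations", each a function from bit index to {0, 1}, weighted
# by 2^-description_length. (Assumption: a hand-picked mixture standing
# in for the real prior, just to illustrate the update scheme.)
hypotheses = [
    # (description_length, predictor)
    (1, lambda n: 1),                      # "every bit is 1"
    (1, lambda n: 0),                      # "every bit is 0"
    (2, lambda n: n % 2),                  # "bit n equals n mod 2"
    (2, lambda n: 1 - n % 2),              # "bit n equals 1 - (n mod 2)"
    (3, lambda n: 1 if n % 3 == 0 else 0), # "1 exactly on multiples of 3"
]

def posterior(observed):
    """Condition the mixture on sparse observations {index: bit}."""
    consistent = []
    for length, h in hypotheses:
        w = Fraction(1, 2 ** length)
        if all(h(n) == b for n, b in observed.items()):
            consistent.append((w, h))  # hypothesis survives the data
    total = sum(w for w, _ in consistent)
    return [(w / total, h) for w, h in consistent]

def prob_bit_is_1(observed, n):
    """Posterior probability that bit n is 1, given the observations."""
    return sum(w for w, h in posterior(observed) if h(n) == 1)

# Update on "bit 7 is 1" without mentioning bits 0..6, as in the post:
p = prob_bit_is_1({7: 1}, 8)  # → Fraction(2, 3)
```

Note how updating on bit 7 alone shifts the mixture’s opinion about bit 8: exactly the kind of propagation to unseen sentences that the scheme relies on.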

It’s easy to see that the probability of any provable sentence will converge to 1, because at some point it becomes 1 by fiat, and disprovable sentences will likewise go to 0. How fast is the convergence? I think it will be extremely fast, far outpacing our slow proof search and subsuming any possible human or computer reasoning about PA, even though the prior knows nothing about PA and only receives opaque bits.

To see why, recall that the universal prior can pick up any computable regularity. For example, if you feed it a sequence where all even-numbered bits are 1 and odd-numbered bits are hard to predict, it will quickly become very certain that all future even-numbered bits will be 1. Similarly, if there’s some decidable set of Gödel numbers that for some reason correspond to provable sentences (no matter how long their proofs are), the universal prior will become very certain of that after receiving enough data. The same goes for regularities in relationships between bits, like “for all A and B, if A is provable and A→B is provable, then B is provable as well”. All such shortcuts will be learned in full generality as quickly as possible.
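The even-bits example can be made concrete with another toy mixture. Here a hypothesis may answer `None` for bits it doesn’t predict (treated as a fair coin), so the mixture can latch onto the partial regularity “even bits are 1” while staying agnostic about the odd bits. The three-hypothesis mixture is a hypothetical stand-in of mine, not the actual universal prior.

```python
from fractions import Fraction

# Hypotheses may answer None, meaning "unpredictable: fair coin flip".
# (Assumption: a tiny hand-picked mixture standing in for the prior.)
hypotheses = [
    (1, lambda n: None),                       # "everything is random"
    (3, lambda n: 1 if n % 2 == 0 else None),  # "even bits are 1"
    (3, lambda n: 0 if n % 2 == 0 else None),  # "even bits are 0"
]

def likelihood(h, observed):
    """Probability h assigns to the observed bits {index: bit}."""
    like = Fraction(1)
    for n, b in observed.items():
        pred = h(n)
        if pred is None:
            like /= 2            # coin flip on unpredicted bits
        elif pred != b:
            return Fraction(0)   # determined prediction contradicted
    return like

def prob_bit_is_1(observed, n):
    """Posterior probability that bit n is 1."""
    post = [(Fraction(1, 2 ** L) * likelihood(h, observed), h)
            for L, h in hypotheses]
    total = sum(w for w, _ in post)
    # Mass on hypotheses predicting 1, plus half the agnostic mass:
    num = sum(w if h(n) == 1 else w / 2 if h(n) is None else 0
              for w, h in post)
    return num / total

# Even bits all 1, odd bits looking random; confidence about a far-off
# even bit grows as more data comes in:
p_few  = prob_bit_is_1({0: 1, 2: 1}, 100)                          # → Fraction(3, 4)
p_more = prob_bit_is_1({0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1},
                       100)                                        # → Fraction(9, 10)
```

The confidence in “bit 100 is 1” climbs from 3/4 to 9/10 on just seven observed bits, which is the qualitative behavior the paragraph describes.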

This approach to logical uncertainty can work well for questions that can be settled by PA, like digits of π. What about sentences that aren’t provable or disprovable in PA, like Con(PA)? It’s tempting to try to make the process converge for all of those as well, but I don’t want to do that, because many such sentences are basically hallucinations. Who cares what their probabilities are.

Why does this approach matter, given that the universal prior is uncomputable? Because it’s approximable, so any approximation scheme could be used in the same way as logical induction (another computable process with uncomputably slow convergence). Also, the universal prior is well-studied and has many nice properties.
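To make “approximable” concrete, here is one deliberately crude computable scheme: enumerate programs by increasing length and condition the weighted sum on the observed bits. The “machine” here, which just repeats its program forever, is a hypothetical toy rather than Solomonoff’s construction, but the anytime shape is the same: raising `max_len` refines the approximation.

```python
from fractions import Fraction
from itertools import product

def approx_prob(observed, n, max_len):
    """Approximate P(bit n = 1 | observed) under a toy program mixture.

    A "program" is a bitstring p run on the toy machine that repeats it
    forever, so p predicts bit i as p[i % len(p)]. Weights 4^-len(p)
    keep the total mass bounded. (Assumption: a stand-in machine, not
    the universal one; the real prior enumerates all programs.)
    """
    num = total = Fraction(0)
    for length in range(1, max_len + 1):
        for p in product((0, 1), repeat=length):
            w = Fraction(1, 4 ** length)
            if any(p[i % length] != b for i, b in observed.items()):
                continue                 # inconsistent with the data
            total += w
            if p[n % length] == 1:
                num += w
    return num / total

# Updating on sparse bits, e.g. "bit 7 is 1", as in the post:
p = approx_prob({7: 1}, 9, max_len=3)
```

Any fixed bound gives a computable answer, and the bounds improve monotonically in the mass they account for, which is the sense in which approximation schemes stand in for the uncomputable prior.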

I don’t want to diminish the amazing work that the logical induction folks have done. In some ways (e.g. reflection) it works better than the universal prior. But it’s interesting to me that the idea of probabilistic reasoning about future theorems based on a small number of known theorems can be captured so well by a standard approach, similar to how UDT can be almost completely covered by a one-liner combination of standard game theory tools.