Using the universal prior for logical uncertainty (retracted)

(I’m not sure about any of this)

(Yup! Vanessa Kowalski pointed out a crucial error, see her comment below. The idea about the universal prior picking up all computable regularities is unproved. I’m leaving the post as it was, but please read it as a historical document only.)

Abram Demski has an old paper describing a naive way to use the universal prior for logical uncertainty. He pointed out some problems with it, but I’m not sure those problems are real. I think the approach will work surprisingly well.

The idea is that, since the universal prior is just a probability distribution over bit sequences, we don’t have to use it to predict bits one by one. We can also update it on facts like “the seventh bit is 1” without mentioning the previous six bits. To choose which bits to update on, we can mechanically search through all possible proofs in Peano arithmetic (PA) by increasing length. Whenever we prove a sentence with Gödel number N, we tell the prior that the Nth bit is 1; whenever we disprove one, we tell the prior that the bit is 0. After each update, the prior’s opinions about all other bits (and thus all other sentences in PA) will shift, hopefully in a meaningful way.
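Here’s a rough sketch of that loop in Python. Everything in it is hypothetical scaffolding: the proof enumerator, the Gödel numbering and the prior object (some computable approximation of the universal prior that supports conditioning on individual bit positions) are made-up stand-ins passed in as arguments, not real libraries.

```python
# Hypothetical sketch of the update loop described above. All the arguments
# are stand-ins, not real libraries:
#   prior               - an approximation of the universal prior that can be
#                         conditioned on individual bit positions
#   enumerate_pa_proofs - yields (sentence, proved) pairs as proofs and
#                         disproofs are found, in order of increasing length
#   godel_number        - a fixed Godel numbering of sentences
def run_update_loop(prior, enumerate_pa_proofs, godel_number, proof_budget):
    """Feed theorems of PA into the prior as constraints on single bits."""
    for step, (sentence, proved) in enumerate(enumerate_pa_proofs()):
        if step >= proof_budget:
            break
        n = godel_number(sentence)
        # Proving a sentence pins its bit to 1, disproving pins it to 0.
        prior.condition(position=n, value=1 if proved else 0)
    return prior

# Afterwards, the prior's probability that bit godel_number(phi) equals 1 is
# its current credence that the sentence phi is a theorem of PA.
```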

It’s easy to see that the probability of any provable sentence will converge to 1, because at some point it becomes 1 by fiat, and disprovable sentences will likewise go to 0. How fast is the convergence? I think it will be extremely fast, far outpacing our slow proof search and subsuming any possible human or computer reasoning about PA—even though the prior knows nothing about PA and only receives opaque bits.

To see why, recall that the universal prior can pick up any computable regularity. For example, if you feed it a sequence where all even-numbered bits are 1 and the odd-numbered bits are hard to predict, it will quickly become very certain that all future even-numbered bits will be 1. Similarly, if there’s some decidable set of Gödel numbers that for some reason all correspond to provable sentences (no matter how long their proofs are), the universal prior will become very certain of that after receiving enough data. The same goes for regularities in relationships between bits, like “for all A and B, if A is provable and A→B is provable, then B is provable as well”. All such shortcuts will be learned in full generality as quickly as possible.
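As a toy illustration of how quickly that kind of regularity gets picked up, here’s a Bayesian mixture over just two hand-picked hypotheses instead of all computable ones (so it’s only a cartoon of the universal prior, but the update arithmetic is the same):

```python
import random

# Two hypotheses: "even-numbered bits are 1, odd bits are coin flips"
# versus "every bit is a coin flip". The universal prior mixes over all
# computable hypotheses; here we keep just these two.

def likelihood(hypothesis, index, bit):
    """Probability the hypothesis assigns to seeing `bit` at `index`."""
    if hypothesis == "even_bits_are_1" and index % 2 == 0:
        return 1.0 if bit == 1 else 0.0
    return 0.5  # every other case is an unpredictable fair coin

weights = {"even_bits_are_1": 0.5, "all_random": 0.5}  # prior weights

for index in range(20):
    bit = 1 if index % 2 == 0 else random.randint(0, 1)  # observed sequence
    for h in weights:
        weights[h] *= likelihood(h, index, bit)
    total = sum(weights.values())
    weights = {h: w / total for h, w in weights.items()}  # Bayes update

# Mixture's predicted probability that the next even-numbered bit is 1:
p_next_even = sum(w * likelihood(h, 20, 1) for h, w in weights.items())
print(weights)      # the "even bits are 1" hypothesis is at about 0.999
print(p_next_even)  # close to 1 after only ten even-numbered observations
```

After twenty bits the regular hypothesis already dominates by a factor of about a thousand, which is the kind of speedup I have in mind.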

This approach to logical uncertainty can work well for questions that can be settled by PA, like digits of π. What about sentences that aren’t provable or disprovable in PA, like Con(PA)? It’s tempting to try and make the process converge for all of those as well, but I don’t want to do that, because many such sentences are basically hallucinations. Who cares what their probabilities are.

Why does this approach matter? Isn’t the universal prior uncomputable? Yes, but it’s approximable, so any approximation scheme can be used in the same way as logical induction (another computable process with uncomputably slow convergence). Also, the universal prior is well studied and has many nice properties.
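For concreteness, here is the standard approximation from below, stated loosely (the details depend on the choice of universal monotone machine $U$, so treat this as a sketch): the prior gives a finite string $x$ the total weight of the minimal programs that make $U$ print something starting with $x$,

$$M(x) \;=\; \sum_{p \,:\, U(p) \text{ extends } x} 2^{-|p|},$$

and the computable approximation $M_t(x)$ sums the same weights over only those programs of length at most $t$ that succeed within $t$ steps, so $M_t(x)$ increases to $M(x)$ as $t \to \infty$.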

I don’t want to diminish the amazing work that the logical induction folks have done. In some ways (e.g. reflection) it works better than the universal prior. But it’s interesting to me that the idea of probabilistic reasoning about future theorems based on a small number of known theorems can be captured so well by a standard approach, similar to how UDT can be almost completely covered by a one-liner combination of standard game theory tools.