Asymptotic Logical Uncertainty: Solomonoff Induction Inspired Approach

This is post is part of the Asymptotic Logical Uncertainty series. This is about a failed attempt to satisfy the Benford test. I will not prove things about this approach. The only reason I am including this approach is because the fact that it does not work is surprising and insightful.

Let Geometric() be the function which outputs with probability .

Fix a prefix-free encoding of probabilistic Turing machines, and let denote the number of bits used to encode . Let be the function which outputs the Turing machine with probability proportional to .

Fix , a deterministic Turing Machine (for example ). Fix a time complexity function and another time complexity function . Consider the following algorithm.

 1: n=0
 2: N=1
 3: M=RTM(3)
 4: g=Geometric(1/2)
 5: loop
 6:     if 4^n*R(N)*log(R(N))*N<T(N) then
 7:         run E for one more time step
 8:         if E outputs another bit then
 9:             n=n+1
10:             repeat
11:                 M=RTM(3)
12:                 g=Geometric(1/2)
13:             until M^{gR}(i)=E(i) for all i<=n
14:     output M^{gR}(N)
15:     N=N+1

This code is inspired by Solomonoff Induction. We cannot directly apply Solomonoff induction, because the lack of a time bound. Solomonoff induction would pick out the program , but that will not allow us to predict quickly. We have to restrict the run times of the programs to ensure that they can compute their th bit in time T(N).

When trying to predict , we compute the first values of for some much smaller than . We then sample probabilistic Turing machines until we find one that quickly gets all of these first bits correct. We use that Turing machine to compute our guess at .

The purpose of the geometric random variable is to make the time we allow the sampled Turing machines to run more flexible. The sampled Turing machines can take any amount of time in but get an extra penalty for the size of the constant. The fact that we use instead of is to make the proof that it satisfies the weak Benford test work, would probably work too. Line 6 is only to make sure it runs in time .

passes the weak Benford test, but probably fails the Benford test.

First, let me explain why we thought this approach would work. If you sample a Turing machine that is very likely to get all of the bits correct, but gives suboptimal probabilities on the Benford test sentences, then you can modify that program to form a program that outputs 1 with probability for the Benford test sentences, and follows otherwise.

This modified program will be just as likely to get all the non-Benford sentences correct, and more likely to get the the Benford sentences correct. Further, this improvement on the Benford sentences is unbounded as you consider more and more sentences, and therfore eventually pays for all of the cost associated with replacing with .

The problem is that while is more likely to get the Benford sentences correct, we are not optimizing for programs that are likely to get the Benford sentences correct. Instead, we are optimizing for programs that are likely to get the Benford sentences correct conditioned on getting all the other questions correct.

At first, this may not seem like much of problem. The programs are entangling their results on some inputs with their results on other inputs. The problem comes from the fact that sometimes the program tries to entangle its results with a much much earlier sentence, and when the program was answering that earlier sentence it had much less time to think.

To give a concrete example of why this is a problem, consider the sentence: =”The first digit of is a 1 if and only if the first digit of is a 1.”

The probability we should assign to this sentence is . However, by the time we have to assign a probability to the th Benford sentence, we may have already calculated the first digit of and seen that it is a 1. In which case, to maximize the probability of getting all the bits correct, we have to have our answer to the th Benford sentence match our answer to .

The correct thing to do would be to update the probability assigned to when observing that the first digit of is a 1. However, that is not possible, so the next best thing is to change the answer to the th Benford sentence, which causes it to give the wrong answer.

I believe that the fact that this program does not work can be extended to an impossibility result saying that any program which loops through Turing machines, gives them a score based on how many answers they get correct, and samples according to their complexity and their score, must fail the Benford test.