Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1; otherwise it equals 0.
This is exactly Solomonoff induction, assuming that the programming language is prefix-free.
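In code, the proposed prior and likelihood are just the following (a minimal sketch; `run` is a placeholder for an interpreter of the two-character language, which the post doesn't pin down):

```python
def prior(program: str) -> float:
    """2^-length(program), for programs over a two-character alphabet."""
    return 2.0 ** -len(program)

def likelihood(string: str, program: str, run) -> float:
    """P(string | program): 1 if the program outputs the string, else 0."""
    return 1.0 if run(program) == string else 0.0
```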
I figure there must be some reason people don’t do this already, or else there’s a bunch of people doing it. I’d be real happy to find out about either.
Because it’s uncomputable: there are infinitely many programs to consider, and some of them never halt; for many of those, you can’t even prove that they don’t halt.
If you set a maximum program length and a maximum execution time, then inference becomes computable, albeit combinatorially hard: maximum a posteriori (MAP) inference reduces easily to MAX-SAT or ILP and is NP-complete as a decision problem (the optimization version is NP-hard), while exact posterior evaluation is #P-complete.
It’s known that many NP-hard problems are practically solvable for real-world instances. In this case, there are indeed people (primarily at Microsoft, I think) who managed to make it work, but only for short program lengths in limited application domains: Survey paper.
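To see what the length- and time-bounded version looks like, here is a hypothetical brute-force sketch (mine, not from the survey paper). It needs a concrete binary language, so it cheats slightly: bitstrings are decoded three bits at a time into the eight Brainfuck commands. The prior is 2^-length over bitstrings (not prefix-free, so it is renormalized at the end), and the likelihood is 1 for programs whose output begins with the evidence:

```python
from itertools import product

CMDS = "+-<>[].,"  # eight Brainfuck commands, three bits each

def decode(bits):
    """Decode a bitstring (length a multiple of 3) into a Brainfuck program."""
    return "".join(CMDS[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3))

def run(prog, max_steps):
    """Run prog with a step budget; return its output bits, or None if invalid."""
    stack, match = [], {}
    for i, c in enumerate(prog):          # precompute bracket matching
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None               # unbalanced program: zero likelihood
            j = stack.pop()
            match[i], match[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, steps = [0] * 64, 0, 0, [], 0
    while pc < len(prog) and steps < max_steps:
        c = prog[pc]
        if c in "+-":
            tape[ptr] ^= 1                # bit-valued cells keep outputs binary
        elif c == ">":
            ptr = (ptr + 1) % len(tape)
        elif c == "<":
            ptr = (ptr - 1) % len(tape)
        elif c == ".":
            out.append(str(tape[ptr]))
        elif c == ",":
            tape[ptr] = 0                 # no input stream: reads return zero
        elif c == "[" and tape[ptr] == 0:
            pc = match[pc]
        elif c == "]" and tape[ptr] != 0:
            pc = match[pc]
        pc += 1
        steps += 1
    return "".join(out)

def posterior(evidence, max_len=12, max_steps=200):
    """Posterior over all programs up to max_len bits, each run for max_steps."""
    weights = {}
    for length in range(3, max_len + 1, 3):
        for bits in ("".join(b) for b in product("01", repeat=length)):
            out = run(decode(bits), max_steps)
            if out is not None and out.startswith(evidence):
                weights[bits] = 2.0 ** -length   # prior 2^-length(program)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {p: w / total for p, w in weights.items()}

# Example: which short programs explain the observation "00"?
print(sorted(posterior("00").items(), key=lambda kv: -kv[1])[:5])
```

Even at these tiny bounds the enumeration is exponential in the length cap, which is the combinatorial hardness in miniature.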
Spitballing hacks around this:
Weigh hypotheses based on how many steps it takes for them to be computed on a dovetailing of all Turing machines. This would probably put too much weight on programs that are large but fast to compute.
Weigh hypotheses on how much space it takes to compute them. So dovetail all Turing machines of size up to n, limited to n bits of space, for at most 2^n steps. This has the nice property that the prior is, like the hypotheses, space-limited (using about twice as much space as the hypothesis).
Find some quantum algorithm that uses n^k qubits and polynomial time to magically evaluate all programs of length n in some semi-reasonable way. If such a beast exists (which I doubt), it has the nice property that it “considers” all reasonably sized hypotheses yet runs in polynomial space and time.
Given n bits of evidence about the universe, consider all programs of length up to k1*log(n), each run for at most n^k2 steps. This has the nice property that it runs in polynomial time and space (see the sketch after this list).
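As a toy illustration of that last budget rule, reusing the hypothetical `posterior()` enumerator sketched above (the constants k1 and k2 are arbitrary choices, not from the post):

```python
import math

def poly_budget_posterior(evidence, k1=6, k2=2):
    """Search programs of <= k1*log2(n) bits, each run for <= n^k2 steps."""
    n = max(len(evidence), 2)   # avoid a zero length budget on tiny evidence
    max_len = int(k1 * math.log2(n))
    max_steps = n ** k2
    return posterior(evidence, max_len=max_len, max_steps=max_steps)
```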