Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn’t require knowing Occam’s razor
Note: I don’t think this idea is original, but I couldn’t find a good post going over the implications.
I used to think that Solomonoff induction was a bit arbitrary for the following reason: it assigned a 100% probability to the universe being computable. I’m pretty sure the universe is computable (ignoring randomness), but nowhere near 100% sure. Who’s to say we won’t find a halting oracle floating in space for no reason? That seems like a pretty simple hypothesis. Why the focus on recursive languages? You have to make some choice of how descriptions work (you can’t assign positive probability to every infinite bit string), but that didn’t change the feelings of arbitrariness.
But then I realized this understanding of why to use Solomonoff induction is incorrect. We do not use it because of the physical church-turing thesis, we use it because of the original church-turing thesis:
L.C.M.s [logical computing machines: Turing’s expression for Turing machines] can do anything that could be described as ‘rule of thumb’ or ‘purely mechanical’. - Alan Turing
Because what matters is not whether the universe is computable, but whether our methods of reasoning are computable. Or in other words, whether the map is computable. Solomonoff’s induction is at least as “good” as any computable inference method (up to a constant), regardless of the complexity of the universe. So if you, as a human, are trying to come up with a systematic way to predict things (even uncomputable things), Solomonoff’s induction is better. Here is the precise statement:
Theorem: Let be some probability distributions on infinite sequences of bits such that inferring the next bit from a prefix is computable. The likelihood ratio from to Solomonoff induction’s prior is bounded above by some finite constant (despite the sequence containing infinitely many bits), and this constant is independent of the sequence of bits.
Proof sketch: (Note: this is already a well-known result.) There is a program that is a decoder for an entropy code over finite bit strings based on . Given any finite bit string , we can find a such that . The length of is approximately (where is the probability of a bit string starting with according to ). The probability of a string starting with under Solomonoff induction’s prior is greater than . So regardless of the content or length of |x|, the ratio is bounded by (which only depends on ).
A nice thing about this result is that it makes no assumptions about the sequence of bits. The theorem is not just saying Solomonoff induction is good v.s. D on most sequences, or even almost all of them. It is good for every particular sequence of bits, because there is an upper bound on the likelihood ratio that is independent of the sequence. There is no adversarial example.
(Notice how we never invoked Occam’s razor to argue that Solomonoff induction is superior. We can instead go the other way; Occam’s razor is good because it’s an informal version of an ideal inference procedure.)
Solomonoff induction v.s. a human in an uncomputable world
How does this shake out in an uncomputable universe? What if you’re convinced that there is some orb in the universe emitting the digits of Chaitin’s constant or something? We’ll let Alice be a reasoner using Solomonoff induction, and Bob be a human.
Bob: Ah yes, I have found it! An orb emitting the digits of Chaitin’s constant is floating around in our solar system!
Alice: How do you figure?
Bob: I calculated the first two digits, and they matched!
Alice: Surprising! But not that surprising (about six and half bits).
Bob: I was surprised too, but now I can do something better than you. I can predict the digits that orb will emit.
Alice: How do you plan to predict an uncomputable sequence, given that you’re a human?
Bob: Oh yeah…
Alice: In fact, if you’re correct it will eventually look like a uniformly random sequence to you since it is algorithmically random. So I’ll be able to keep up with you just fine.
Harsh! But maybe Bob’s hypothesis will let him use the orb for something?
Bob: A ha! My hypothesis has allowed me to predict theorems proven by mathematicians before they actually proved them.
Alice: How so?
Bob: I can use the digits emitted by the orb to do computations that predict theorems. And my observations are verified when mathematicians prove it the normal way.
Alice: Yeah, I noticed the orb can do that too.
Bob: Doesn’t that support the idea that the orb is emitting the digits of Chaitin’s constant?
Alice: You could understand some of my current hypotheses as making an assumption that the orb does this, and exploiting it to learn things about other parts of the environment. However, the code is not commented, so this is just a matter of interpretation. I don’t have a hypothesis that predicts in advance that the orb emits the digits of Chaitin’s constant. But technically you don’t either, since you yourself don’t know the digits.
If the environment has uncomputable features, Alice’s hypotheses act like oracle machines. She can’t predict those features, but she can use them to predict other features of the environment. Going back to our theorem, if Bob can use the oracle, so can Alice because she has Bob as one of her hypotheses.
Hypotheses as methods of reasoning instead of as possible worlds
We often think of Solomonoff induction’s hypotheses as being possible worlds. I will argue for a different intuition; the hypotheses are forms of reasoning.
Any systematic method to make predictions used by any human, in the past, present, or future, is one of the hypotheses. If the human somehow has uncomputable intuitions (unlikely, but possible), we can just treat that as part of the environment and Solomonoff induction will figure out how to exploit it.
The hypotheses are possible maps instead of possible territories!
Solomonoff induction is great because it’s hypotheses include all forms of systematic reasoning (by the church-turing thesis). Even if we believe the universe is uncomputable and we are right, Solomonoff induction will make predictions just as well as us. In particular, it will act like an oracle machine with respect to the uncomputable parts of the environment, but only after seeing them. But since they are uncomputable, we can’t systematically predict them in advance either. If the universe is uncomputable, the litany of Tarski becomes a physically impossible desire!
I do have one unanswered question though: so anyways, how likely is it that the universe has a halting oracle? 🤔