# New Philosophical Work on Solomonoff Induction

I don’t know to what extent MIRI’s current research engages with Solomonoff induction, but some of you may find recent work by Tom Sterkenburg to be of interest. Here’s the abstract of his paper Solomonoff Prediction and Occam’s Razor:

Algorithmic information theory gives an idealised notion of compressibility that is often presented as an objective measure of simplicity. It is suggested at times that Solomonoff prediction, or algorithmic information theory in a predictive setting, can deliver an argument to justify Occam’s razor. This article explicates the relevant argument and, by converting it into a Bayesian framework, reveals why it has no such justificatory force. The supposed simplicity concept is better perceived as a specific inductive assumption, the assumption of effectiveness. It is this assumption that is the characterising element of Solomonoff prediction and wherein its philosophical interest lies.

How is it that Solomonoff Induction, and by extension Occam’s Razor, is justified in the first place? Why is it that hypotheses with higher Kolmogorov complexity are less likely to be true than those with lower Kolmogorov complexity? If it is justified by that fact that it has “worked” in the past, does that not require Solomonoff induction to justify that has worked, in the sense that you need to verify that your memories are true, and thus requires circular reasoning?

See: You only need faith in two things and the comment on the binomial monkey prior (a theory which says that the ‘past’ does not predict the ‘future’).

You could argue that there exists a more fundamental assumption, hidden in the supposed rules of probability, about the validity of the evidence you’re updating on. Here I can only reply that we’re trying to explain the data regardless of whether or not it “is true,” and point to the fact that you’re clearly willing to act like this endeavor has value.

There are more hypotheses with a high complexity than with a low complexity, so it is mathematically necessary to assign lower probabilities to high complexity cases than to low complexity cases (broadly speaking and in general—obviously you can make particular exceptions) if you want your probabilities to sum to 1, because you are summing an infinite series, and to get it to come to a limit, the terms in the series must be generally decreasing.

But in the infinite series of possibilities summing to 1, why should the hypotheses with the highest probability be the ones with the lowest complexity, as opposed to having each consecutive hypothesis having an arbitrary complexity level?

Almost all hypotheses have high complexity. Therefore most high-complexity hypotheses must have low probability.

(To put it differently: let p(n) be the total probability of all hypotheses with complexity n, where I assume we’ve defined complexity in some way that makes it always a positive integer. Then the sum of the p(n) converges, which implies that the p(n) tend to 0. So for large n the total probability of

allhypotheses of complexity n must be small, never mind the probability of any particular one.)Note: all this tells you only about what happens in the limit. It’s all consistent with there being some particular high-complexity hypotheses with high probability.

But why should the probability for lower-complexity hypotheses be any lower?

It shouldn’t, it should be higher.

If you just meant ”… be any higher?” then the answer is that if the probabilities of the higher-complexity hypotheses tend to zero, then for any particular low-complexity hypothesis H all but finitely many of the higher-complexity hypotheses have lower probability. (That’s just part of what “tending to zero” means.)

gjm’s explanation is correct.

Very interesting, thanks for sharing.

This paper makes me think again how amazing it is that science made any progress at all, before the middle part of the 20th century. Science is completely based on induction, and nobody understood induction in any kind of rigorous way until about 1968, but still people managed to make scientific progress. Occam, Bacon, Hume, Popper and others were basically just hand-waving; thankfully this hand-waving was nearly enough correct that it enabled science, but it was still hand-waving.

I don’t think it’s fair to say that “nobody understood induction in any kind of rigorous way until about 1968.” The linked paper argues that Solomonoff prediction does not justify Occam’s razor, but rather that it gives us a specific inductive assumption. And such inductive assumptions had previously been rigorously studied by Carnap among others.

But even if we grant that assumption, I don’t see why we should find it surprising that science made progress without having a rigorous understanding of induction. In general, successfully engaging in some activity doesn’t require having a rigorous understanding of that activity, and making inductive inferences is something that comes very natural to human beings.

Moreover, it seems that algorithmic information theory has (at best) had extremely limited impact on actual scientific practice in the decades since the field was born. So even if it does constitute the first rigorous understanding of induction, the lesson seems to be that scientific progress does not require such an understanding.