Contra Anton 🏴‍☠️ on Kolmogorov complexity and recursive self improvement

Twitter user @atroyn claims that recursive self-improvement is impossible because of Kolmogorov complexity. Quoting most of[1] the argument here:

here is an argument against the possibility of recursive self improvement of any ‘intelligent’ computer program, based on kolmogorov complexity.

intelligence is the ability to make correct predictions about the state of the world given available information.

each program which makes predictions about the world has a kolmogorov complexity corresponding to the length of the shortest string which can express that program

for a given program p call this complexity k.

(unfortunately k(p) is in general uncomputable, the proof reduces to the halting problem, but that’s not important here)

more intelligence (in our definition) implies the ability to predict more of the world more accurately, i.e. to express more of the world’s complexity—this implies that a more intelligent program p2 necessarily has more complexity than a less intelligent p1

to see that this is necessarily so, note that if we could predict the world equally accurately as p1's prediction with a program p0 with k0 < k1, then we have a contradiction since k1 was supposed to be the minimal expression of intelligence at that level

in order to get recursive self improvement, you need a program p1 which is capable of emitting p2 which is better able to predict the world than p1 - i.e., we need p1 to emit p2 such that k2 > k1

but this is a contradiction.


The mistake here is the assumption that a program that models the world better necessarily has a higher Kolmogorov complexity. Originally, Kolmogorov complexity measured the complexity of bit strings. But we’re talking about predictors here, things that observe the world and spit out probability distributions over observed outcomes. In the context of predictors, Kolmogorov complexity measures the complexity of a function from observations to predictions.

In the case of ideal Bayesian reasoning, we can nail down such a function just by specifying a prior, e.g. the Solomonoff prior. (Plus an approximation scheme to keep things computable, I guess.) This doesn’t take a very large program to implement. But a non-ideal reasoner will screw up in many cases, and there’s information contained in the exact way it screws up for each set of observations. Such reasoners can have an almost arbitrarily high Kolmogorov complexity, and they’re all worse than the ideal Bayesian program.
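Here is a toy sketch of that point, under loudly stated assumptions. The world is a coin whose bias is drawn uniformly from [0, 1], so the ideal Bayesian predictor is Laplace's rule of succession (a stand-in for the Solomonoff prior, which is uncomputable). The flawed predictor is the same rule plus a table of arbitrary overrides. Compressed source length serves as a crude upper-bound proxy for Kolmogorov complexity, which can't be computed directly. All the names (`IDEAL_SRC`, `make_flawed_source`, `description_length`, `expected_log_loss`) are made up for this illustration.

```python
import random
import zlib
from itertools import product
from math import factorial, log

# Ideal Bayesian predictor for a coin with a uniform prior on its bias
# (Laplace's rule of succession). h is the history of flips so far.
IDEAL_SRC = "def predict(h):\n    return (sum(h) + 1) / (len(h) + 2)\n"

def make_flawed_source(max_len=5, seed=0):
    """Same rule, plus a table of arbitrary wrong answers for specific histories."""
    rng = random.Random(seed)
    quirks = {}
    for n in range(1, max_len + 1):
        for h in product((0, 1), repeat=n):
            # Pseudo-random override, kept away from 0 and 1 so the log loss is finite.
            quirks[h] = round(0.05 + 0.9 * rng.random(), 3)
    return (
        f"QUIRKS = {quirks!r}\n"
        "def predict(h):\n"
        "    base = (sum(h) + 1) / (len(h) + 2)\n"
        "    return QUIRKS.get(tuple(h), base)\n"
    )

def load(src):
    """Turn predictor source code into a callable."""
    namespace = {}
    exec(src, namespace)
    return namespace["predict"]

def description_length(src):
    """Compressed size in bytes: only an upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(src.encode("utf-8"), 9))

def seq_prob(h):
    """Probability of the exact flip sequence h when the bias is uniform on [0, 1]."""
    n, k = len(h), sum(h)
    return factorial(k) * factorial(n - k) / factorial(n + 1)

def expected_log_loss(predict, max_len=6):
    """Exact expected log loss summed over the first max_len flips."""
    total = 0.0
    for n in range(max_len):
        for h in product((0, 1), repeat=n):
            p_true = (sum(h) + 1) / (n + 2)  # true chance the next flip is 1
            q = predict(h)                   # predictor's stated chance
            total -= seq_prob(h) * (p_true * log(q) + (1 - p_true) * log(1 - q))
    return total

flawed_src = make_flawed_source()
ideal_size = description_length(IDEAL_SRC)
flawed_size = description_length(flawed_src)
ideal_loss = expected_log_loss(load(IDEAL_SRC))
flawed_loss = expected_log_loss(load(flawed_src))

print(f"ideal:  {ideal_size} compressed bytes, expected log loss {ideal_loss:.4f}")
print(f"flawed: {flawed_size} compressed bytes, expected log loss {flawed_loss:.4f}")
```

The flawed predictor needs a longer description and also predicts worse. One caveat: because the override table here comes from a seeded generator, its true Kolmogorov complexity is small. It only stands in for mistakes that really are arbitrary.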

In other words, the successor program has Kolmogorov complexity less than or equal to that of its predecessor (up to an additive constant), but so what? That doesn’t imply that it’s worse.
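For reference, the bound on the successor's complexity can be written out as follows. This is a sketch assuming p1 emits p2 without reading any input, in the thread's notation, where k1 = K(p1) and k2 = K(p2). The constant c and the observation string x are my notation, not the thread's.

```latex
% Any shortest description of p_1, plus a fixed-size wrapper that says
% "run this program and output whatever it prints", is a description of p_2:
K(p_2) \le K(p_1) + c

% If p_1 first reads observations x, the same wrapper bounds only the
% conditional complexity of the successor given x:
K(p_2 \mid x) \le K(p_1) + c
```

Here c depends only on the choice of reference machine, not on p1 or p2. Neither inequality says anything about how well p2 predicts.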

(Also, Kolmogorov complexity doesn’t care at all about how long a program takes to run, but in the real world running time is an important consideration, and a target for self-improvement.)

That concludes this post: without the assumption that higher Kolmogorov complexity is better, the whole argument falls apart.

[1] The rest of the thread briefly touches on the issue of how an AI could know that its successor would necessarily be an improvement. The discussion there is kind of doomed, since its goal is to show that the successor has Kolmogorov complexity lower than or equal to that of the original. That result is uninteresting, though we can see right away that it must be true, assuming that the original writes the successor before observing the world at all. But there’s an interesting version of the question, which asks about the set of axioms used by the systems to reason about the world, rather than the Kolmogorov complexity. See this paper by Yudkowsky and Herreshoff for details.