A quick sketch of how the Curry-Howard Isomorphism appears to connect Algorithmic Information Theory with ordinal logics

The following is sort of carried on from a recent comments thread, where I was basically saying I wasn’t gonna yack about what I’m thinking until I’d spent the time to fully formalize it. Well, Luke got interested in it, I spewed the entire sketch and intuition to him, and he asked me to put it up where others can participate. So the following is it.

Basically, Algorithmic Information Theory, as started by Solomonoff and Kolmogorov and then continued by Chaitin, contains a theorem called Chaitin’s Incompleteness Theorem, which says (in short, colloquial terms) “you can’t prove a 20kg theorem with 10kg of axioms”. Except it says this in fairly precise mathematical terms, all of which are grounded in the undecidability of the Halting Problem. To possess “more kilograms” of axioms is, mathematically, to be able to decide the halting behavior of “more kilograms” of Turing Machines, or equivalently to be able to compress strings down to smaller sizes.
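For reference, the precise statement behind the kilograms metaphor is standard Chaitin, nothing original here: for any consistent, computably axiomatizable theory $T$ containing basic arithmetic, there is a constant $L_T$, roughly the length of a program that enumerates $T$’s theorems, such that

$$\forall s:\quad T \nvdash\ K(s) > L_T,$$

even though $K(s) > L_T$ is in fact true for all but finitely many strings $s$ (here $K$ is Kolmogorov complexity). The “10kg” is the size of that enumerating program; the “20kg theorem” is any particular claim of high complexity.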

Now consider the Curry-Howard Isomorphism, which says that logical systems viewed as computing machines and logical systems viewed as mathematical logics are, in certain precise ways, the same thing: proofs correspond to programs, propositions to types. Now consider ordinal logic as started in Turing’s PhD thesis, which begins with ordinary first-order arithmetic and extends it with axioms saying “First-order arithmetic is consistent”, “First-order arithmetic extended with the previous axiom is consistent”, and so on, all the way up to the first infinite ordinal omega (and then, indeed, further through notations for the constructive transfinite ordinals).
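Concretely, the tower of theories looks like this, writing $\mathrm{Con}(T)$ for the arithmetized statement “$T$ is consistent”, and glossing over the subtleties about ordinal notations that Turing’s thesis is actually about:

$$T_0 = \mathrm{PA}, \qquad T_{\alpha+1} = T_\alpha + \mathrm{Con}(T_\alpha), \qquad T_\lambda = \bigcup_{\alpha<\lambda} T_\alpha \ \text{ for limit ordinals } \lambda.$$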

In a search problem with partial information, as you gain more information you’re closing in on a smaller and smaller portion of your search space. Thus, Turing’s ordinal logics don’t violate Goedel’s Second Incompleteness Theorem: they specify more axioms, and therefore specify a smaller “search space” of models, models which, up to any given finite level of the hierarchy, satisfy all the consistency statements so far and thus look, to that depth, like the standard model of first-order arithmetic. Goedel’s Completeness Theorem says that a sentence of a first-order theory is provable iff it is true in every model of that theory. The clearest, least mystical presentation of Goedel’s First Incompleteness Theorem is then: nonstandard models of first-order arithmetic exist in which the Goedel sentence is false. The corresponding statement of Goedel’s Second Incompleteness Theorem follows: nonstandard models of first-order arithmetic exist in which the arithmetized statement “first-order arithmetic is consistent” is false. To capture only the models in which that consistency statement holds, you need to add the axiom “First-order arithmetic is consistent”, and so on up the ordinal hierarchy.
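In symbols, the standard results being leaned on here (nothing original in this part):

$$T \vdash \varphi \ \iff\ \varphi \text{ holds in every model of } T \qquad \text{(Completeness)}$$

$$\mathrm{PA} \text{ consistent} \ \Rightarrow\ \mathrm{PA} \nvdash \mathrm{Con}(\mathrm{PA}) \ \Rightarrow\ \text{some } \mathcal{M} \models \mathrm{PA} + \neg\mathrm{Con}(\mathrm{PA}) \qquad \text{(Second Incompleteness + Completeness)}$$

Any such $\mathcal{M}$ is necessarily nonstandard, since $\mathrm{Con}(\mathrm{PA})$ is true in the standard model.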

Back to learning and AIT! Your artificial agent, let us say, starts with a program 10kg large. Through learning, it acquires, let us say, 10kg of empirical knowledge, giving it 20kg of “mass” in total. Depending on how precisely we can characterize the bound involved in Chaitin’s Incompleteness Theorem (he just said, “there exists a constant L which is a function of the 10kg”, more or less), we would then have an agent whose empirical knowledge enables it to reason about, say, a 12kg agent. It can’t reason about the 12kg agent plus the remaining 8kg of empirical knowledge, because that would be 20kg, and it’s only a 20kg agent even with its strongest empirical data; but it can formally prove universally-quantified theorems about how the 12kg agent will behave as an agent (i.e., its goal functions, the soundness of its reasoning under empirical data, etc.). So it can “trust” the 12kg agent, hand its 10kg of empirical data over, and shut itself down, then “come back online” as the new 12kg agent and learn from the remaining 8kg of data, thus being a smarter, self-improved agent.

The hope is that the 12kg agent, possessing a stronger mathematical theory, can generalize more quickly and more precisely from its sensory data than its predecessor could, thus accumulating empirical knowledge faster and speeding it through the process of compressing all the information its environment provides, approaching the reasoning power of something like a Solomonoff inductor (i.e., one with a Turing oracle handing it exact Kolmogorov complexities).
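Just to make the bookkeeping of that story concrete, here is a toy numerical sketch. Everything in it is invented for illustration: `chaitin_bound` is a made-up stand-in for Chaitin’s constant (the real theorem only guarantees that some bound exists), and the 0.2 factor, the masses, and the hand-off rule are purely hypothetical, not a real proof system or verification step.

```python
# Toy numerical sketch of the "mass budget" hand-off loop described above.
# Nothing here is a real prover or agent; it's just the arithmetic of the argument.

def chaitin_bound(axiom_mass: float, data_mass: float) -> float:
    """Hypothetical stand-in for Chaitin's constant L_T: the largest successor
    an agent with this much axiomatic + empirical "mass" can fully verify.
    The real theorem only says some constant exists; the 0.2 factor is invented."""
    return axiom_mass + 0.2 * data_mass

def self_improvement_trajectory(axiom_mass: float, data_mass: float, steps: int):
    """Verify a bigger successor, hand it the data, shut down, repeat.
    Returns the sequence of (axiom mass, unabsorbed empirical data) pairs."""
    trajectory = [(axiom_mass, data_mass)]
    for _ in range(steps):
        successor_mass = chaitin_bound(axiom_mass, data_mass)
        if successor_mass <= axiom_mass:
            break  # no verifiable improvement left under this toy bound
        # The successor has absorbed (successor_mass - axiom_mass) worth of the data
        # into its axioms; the rest remains as empirical knowledge to learn from.
        data_mass -= successor_mass - axiom_mass
        axiom_mass = successor_mass
        trajectory.append((axiom_mass, data_mass))
    return trajectory

if __name__ == "__main__":
    # Start as the 10kg agent with 10kg of empirical knowledge from the text.
    for axioms, data in self_improvement_trajectory(10.0, 10.0, steps=5):
        print(f"axioms: {axioms:5.2f} kg   unabsorbed data: {data:5.2f} kg")
```

Under this invented bound the masses converge geometrically toward 20kg of axioms and 0kg of unabsorbed data, which is one way of picturing the “speeding through compression” claim at the end of the paragraph.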

This is the sketch and the intuition. As a theory, it does one piece of very convenient work: it explains why we can’t solve the Halting Problem in general (we do not possess correct formal systems of infinite size with which to reason about halting), while also explaining why we appear to be able to solve it in so many of the cases we “care about” (namely: we are reasoning about programs small enough that our theories are strong enough to decide their halting behavior, and we keep discovering new formal axioms to describe our environment).
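One concrete, if toy, way to see the “programs small enough” point: if your mathematical knowledge includes the busy-beaver shift bound S(n), the maximum number of steps any halting n-state, 2-symbol Turing machine takes, then halting for machines of at most n states is decidable by brute simulation. The values below for n ≤ 4 are the classically known ones; pinning down further values takes increasingly strong mathematical facts, eventually outrunning any fixed formal theory. The simulator and machine encoding here are my own minimal sketch, not anything from the argument above.

```python
# Toy illustration: knowing the busy-beaver shift bound S(n) lets you decide
# halting for n-state, 2-symbol Turing machines by running them S(n) steps.

KNOWN_SHIFT_BOUND = {1: 1, 2: 6, 3: 21, 4: 107}  # classically known values of S(n)

def run_tm(transitions, max_steps):
    """Simulate a 2-symbol Turing machine given as
    {(state, symbol): (write, move, next_state)}; next_state None means halt.
    Returns the number of steps if it halts within max_steps, else None."""
    tape, head, state = {}, 0, "A"
    for step in range(1, max_steps + 1):
        write, move, state = transitions[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if state is None:
            return step
    return None

def decides_halting(transitions, n_states):
    """Decide halting for an n-state machine, *given* that we know S(n)."""
    bound = KNOWN_SHIFT_BOUND[n_states]
    return run_tm(transitions, bound) is not None

if __name__ == "__main__":
    # The 2-state busy beaver champion: halts after exactly 6 steps.
    bb2 = {
        ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
        ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, None),
    }
    print(decides_halting(bb2, 2))  # True
```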

So yeah. I really have to go now. Mathematical input and criticism are very welcome; the inevitable clarifying questions from people confused about what’s going on will be answered eventually.