Non-axiomatic math reasoning with naive Bayes

ETA: this post is WRONG. Read on only if you like to find mistakes in other people’s reasoning. Sorry.

So I’ve been thinking about how to teach machines to reason about integers without running into Gödelian limitations. A couple of days ago I suddenly got this painfully obvious idea that I’m sure others have already developed, but I don’t know how to search for prior work, so I’m posting it here.

Here’s a simple statement about the integers:

5 > 4

It’s easy enough to check by direct calculation if your CPU can do arithmetic. Here’s a more complex statement:

For any x, x+1 > x

How do you check this? Let’s denote the whole statement as S, and the individual statements “x + 1 > x” as S(x). The statements S(1), S(2), and so on are all directly checkable like the first one. For simplicity we can make the “naive Bayesian” assumption that they’re all independent, so P(S) = P(S(1))*P(S(2))*… (ETA: this is shady, though it doesn’t break the construction. See AlephNeil’s comment and my reply for a slightly better way.) After manually checking that S(1) holds, we update its probability to 1, so our estimate of P(S) increases. Then we try S(2), etc. After a while P(S) will become as close to 1 as we want, if S is true (provided the priors on the unchecked instances approach 1 fast enough that their infinite product doesn’t vanish; that’s part of what makes the assumption shady).
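
Here’s a minimal sketch of this check in Python. The truncation horizon and the per-instance prior are my own illustrative assumptions (the post’s version ranges over all x, where the product never terminates):

```python
def estimate_forall(instance, checked, horizon=10_000, prior=0.999):
    """Naive-Bayes estimate of P(S) for S = 'for all x in 1..horizon, instance(x)'.

    Instances 1..checked have been verified directly (each contributes a
    factor of 1.0, or 0.0 on a counterexample); the rest keep their prior.
    The horizon and the prior value are illustrative assumptions, not part
    of the original idea.
    """
    p = 1.0
    for x in range(1, horizon + 1):
        if x <= checked:
            if not instance(x):
                return 0.0  # one counterexample refutes the conjunction
            # a verified instance contributes a factor of exactly 1
        else:
            p *= prior  # unchecked instances keep their prior probability
    return p

# The estimate rises toward 1 as more instances are verified:
s = lambda x: x + 1 > x
print(estimate_forall(s, checked=0))      # 0.999**10000, tiny
print(estimate_forall(s, checked=5000))   # larger
print(estimate_forall(s, checked=10000))  # 1.0
```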

We can also use a different quantifier:

There exists x such that x*x = 9

You can check this one by naive Bayesian reasoning too, except “exists” corresponds to probabilistic “or” (P(S) = 1 - (1 - P(S(1)))*(1 - P(S(2)))*…), whereas “for any” corresponded to probabilistic “and”. So you check S(1), S(2), and so on, revising P(S) downward at each step, until you stumble on the right x and P(S) jumps to 1.
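
The existential case just flips the combination rule from “and” to “or”. Again a sketch, with a deliberately small per-instance prior (another assumption of mine, since most candidates aren’t witnesses):

```python
def estimate_exists(instance, checked, horizon=10_000, prior=0.001):
    """Naive-Bayes estimate of P(S) for S = 'exists x in 1..horizon, instance(x)'.

    'Exists' is a probabilistic OR: P(S) = 1 - prod(1 - P(S(x))).
    The horizon and the small prior are illustrative assumptions.
    """
    q = 1.0  # probability that no instance holds
    for x in range(1, horizon + 1):
        if x <= checked:
            if instance(x):
                return 1.0  # a witness confirms the disjunction outright
            # a checked-false instance contributes a factor of (1 - 0) = 1
        else:
            q *= 1.0 - prior
    return 1.0 - q

s = lambda x: x * x == 9
print(estimate_exists(s, checked=0))  # prior estimate, close to 1
print(estimate_exists(s, checked=2))  # slightly lower: two candidates ruled out
print(estimate_exists(s, checked=3))  # 1.0: x = 3 is a witness
```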

Here’s a still more complex statement:

For any x, there exists y such that x ≥ y*y and x < (y+1)*(y+1)

How do you check this one? If we again denote it as S and the individual parts as S(x), we already know how to approximately check S(x) for each value of x, because it’s a statement of the previous “level”! So we know how to check the whole S approximately, too.
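
In code, this recursion is just composition: the inner estimate becomes one factor of the outer product, even when it’s strictly between 0 and 1. A sketch reusing estimate_exists from above, with made-up parameters:

```python
def estimate_nested(checked, horizon=200, prior_forall=0.999, prior_exists=0.01):
    """Estimate P(S) for S = 'for all x, exists y: y*y <= x < (y+1)*(y+1)'.

    Each S(x) is itself only estimated, via estimate_exists above, and
    that estimate simply becomes a factor of the outer conjunction.
    All numeric parameters are illustrative assumptions.
    """
    p = 1.0
    for x in range(1, horizon + 1):
        if x <= checked:
            s_x = lambda y, x=x: y * y <= x < (y + 1) * (y + 1)
            p *= estimate_exists(s_x, checked=checked, horizon=horizon,
                                 prior=prior_exists)
        else:
            p *= prior_forall  # unchecked outer instances keep their prior
    return p

print(estimate_nested(checked=0))    # pure prior: 0.999**200
print(estimate_nested(checked=100))  # higher after corroborating 100 instances
```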

Hopefully you get the general idea by now: add as many quantifiers as you wish. The fancy math name for this method of organizing statements is the arithmetical hierarchy. The calculations become expensive, but in the limit of enough computing power the system will get closer and closer to correct reasoning about the integers. As time passes, it will come to believe pretty strongly in all the axioms of PA (or at least the correct ones :-)), the consistency of PA (if it’s consistent), P≠NP (if it’s true), etc. The Gödelian limitation doesn’t apply because we don’t get proofs, only ever-stronger corroborations. The arithmetical hierarchy doesn’t cover tricky second-order questions like the continuum hypothesis, but perhaps we can live without those?

The flip side is that you need a whole lot of computational resources; in fact, no computable function can bound how long it’ll take our system to converge to the “actual truth”. Given an arithmetical statement that’s guaranteed to have a counterexample, the number of CPU cycles required to find that counterexample grows uncomputably fast with the size of the statement, I believe. So it’s pretty far from a practical AI, and more like Solomonoff induction: something to be approximated.

Some optimizations immediately spring to mind. If you spend some time calculating and come to believe a bunch of quantified statements strongly enough, you can adopt them as heuristics (prospective axioms) while investigating other statements, search for proofs or contradictions from them using ordinary first-order logic, and so on. Such tricks seem to fit nicely into the Bayesian framework as long as you’re willing to introduce some “naivety” here and there.
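
For concreteness, here’s a toy version of just the adoption-and-caching step. The statement keys, the cache, and the threshold are all my inventions, and a real version would also need a first-order prover to draw out consequences of the adopted axioms, which this sketch omits:

```python
# Toy sketch of promoting corroborated statements to prospective axioms.
# The keys, the cache, and the 0.999 threshold are assumptions of mine;
# the proof-search part (deriving consequences of adopted axioms) would
# need an actual first-order prover and is omitted here.
adopted = {}  # statement key -> estimated P(S) at adoption time

def estimate_with_cache(key, estimator, threshold=0.999):
    """Short-circuit statements we already treat as axioms; otherwise run
    the expensive estimator and adopt the statement if it clears the bar."""
    if key in adopted:
        return 1.0
    p = estimator()  # e.g. lambda: estimate_forall(...)
    if p >= threshold:
        adopted[key] = p
    return p
```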

This sort of non-axiomatic reasoning looks like a good approximation of our math intuitions. The axioms of PA sound reasonable to me only because I can try some examples in my mind and everything works out. So I was pretty excited to find a simple computational process that, given enough time, settles on the axioms of PA and much more besides. What do you think?