A basis for pattern-matching in logical uncertainty

Previous logical uncertainty robot designs (e.g. here, here, here) have relied on proving theorems related to the statement under inspection (e.g. “the trillionth digit of pi is 5”; a useful theorem there would be “there are 10 possible mutually exclusive digits [1,2,3,4,5,6,7,8,9,0] the trillionth digit of pi could be”). This is nice. But it doesn’t do pattern-matching: if you tell a theorem-proving robot that a statement is true for the first thousand integers, it doesn’t even suspect that the statement might be true for the 1001st too. The statements are just independent black boxes.

In order to go further, our robot needs to peek inside the black box. But how, and why should it see patterns?

We tell our robot these facts: “3 is ‘odd’. 5 is ‘odd’. 7 is ‘odd’. 11 is ‘odd’.”

The robot asks “Could you just tell me what this concept ‘odd’ means explicitly? I don’t know English very well.”

“No,” we say. “You’ll have to guess.”

For robots that don’t make up information out of thin air, guesses about ‘odd’-ness will obey the maximum-entropy principle, which roughly means “give everything equal probability until you learn more.”

“Well, robot, what do you think is the probability that 9 is ‘odd’, given what we’ve told you?”

“50 percent. Beep boop.”

“But all those other things were odd, weren’t they?”

“Those other things were not the number 9. Imagine going number by number and labeling each number as ‘odd’ or ‘not odd’. There are equally many possible labelings with 9 ‘odd’ as with 9 ‘not odd’, regardless of how the other numbers are labeled. Since by the principle of maximum entropy I start out with all labelings equally likely, 9 has a 50% chance of being ‘odd’.”

“But isn’t it just sensible to think that the next number we give you will be ‘odd’ if all the others were? What of the appeal of simplicity?”

“What’s so great about simplicity? I’m trying not to make up information out of thin air here.”
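To see the robot’s counting argument in code, here is a minimal sketch. A five-number toy universe stands in for the integers (that restriction is my simplification; the evidence is from the dialogue above):

```python
from itertools import product

# Toy version of the robot's argument. Uniform (maximum-entropy) prior over
# every way of labeling these numbers 'odd' or 'not odd', then condition on
# the facts from the dialogue. Restricting to five numbers is my
# simplification; with all the integers the counting argument is the same.
numbers = [3, 5, 7, 9, 11]
evidence = {3: True, 5: True, 7: True, 11: True}  # "3 is 'odd'", etc.

# Every labeling assigns True ('odd') or False ('not odd') to each number.
labelings = [dict(zip(numbers, bits))
             for bits in product([True, False], repeat=len(numbers))]

# Keep only the labelings consistent with what we told the robot.
consistent = [lab for lab in labelings
              if all(lab[n] == v for n, v in evidence.items())]

# Under a uniform prior, P(9 is 'odd') is just the consistent fraction.
p_nine_odd = sum(lab[9] for lab in consistent) / len(consistent)
print(p_nine_odd)  # 0.5 -- the other numbers' labels don't move it at all
```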

What do we tell our robot to make it guess that 9 is probably odd? Well, we want it to think that simpler patterns of odd-ness are more likely. So how about we let the robot peek at our definition of ‘odd’, just long enough to see that there’s a test we can apply to any integer to check whether it’s ‘odd’, and to see how complicated that test is.

Even if our robot has no clue what the constituent symbols mean (or, more to the point, doesn’t have enough processing power to fully explore the ramifications in a more difficult and realistic scenario), knowing the mere number of characters defining ‘odd’ puts an upper bound on the Kolmogorov complexity of how the numbers are labeled. Heck, even just learning that we can write down the oddness-test at all is huge, since it rules out the infinitely many rules of infinite complexity.

Once the robot knows that the complexity of ‘odd’ is below a certain smallish value, low-complexity hypotheses like “all numbers are ‘odd’” and “odd numbers are ‘odd’” start to outweigh[1] higher-complexity hypotheses like “all numbers except {long list including 9} are ‘odd’.” These simple hypotheses contain just the sorts of patterns we sometimes want the robot to see, like ‘odd’-ness.
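As a hedged illustration of that outweighing, here is a sketch with a hand-picked hypothesis class and invented complexity values. The rules, their complexities, and the 2^(-complexity) prior are all my assumptions for illustration (footnote [1] discusses where such a prior might come from); this is not the actual space of 14-character predicates.

```python
# Hedged sketch: a small hypothesis class, made-up complexities (in symbols),
# a prior preferring low complexity exponentially, and a Bayesian update on
# the facts "3, 5, 7, 11 are 'odd'".
hypotheses = {
    "all numbers are 'odd'":  (lambda n: True,                  5),
    "odd numbers are 'odd'":  (lambda n: n % 2 == 1,            7),
    "primes are 'odd'":       (lambda n: n in {2, 3, 5, 7, 11}, 9),  # toy primality test
    "all except 9 are 'odd'": (lambda n: n != 9,                13),
}
evidence = [3, 5, 7, 11]  # the numbers we said were 'odd'

# Prior mass 2**(-complexity); hypotheses that mislabel the evidence get zero.
weights = {name: (2.0 ** -k if all(rule(n) for n in evidence) else 0.0)
           for name, (rule, k) in hypotheses.items()}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

# P(9 is 'odd') sums the posterior over hypotheses that label 9 'odd'.
p_nine_odd = sum(p for name, p in posterior.items() if hypotheses[name][0](9))
print(round(p_nine_odd, 3))  # ~0.95 with these made-up numbers
```

With these toy weights, nearly all of the posterior lands on the two simplest rules, both of which label 9 ‘odd’. The particular 0.95 means nothing; the point is the mechanism of simple, consistent hypotheses soaking up probability.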

We tell our robot these facts: “3 is odd. 5 is odd. 7 is odd. 11 is odd. A number is ‘odd’ if and only if a certain 14-character predicate is true of it.”

The robot asks “Could you just tell me what this concept ‘odd’ means explicitly?”

“No,” we say. “You’ll have to guess. What do you think is the probability that 9 is ‘odd’, given what we’ve told you?”

“65.1 percent.”

“Hah, got you! When we said ‘odd’ we secretly meant prime!”

I suspect that this kind of peeking handles most of the cases where we want something like pattern-matching (specifically, minimum-message-length prediction of patterns) in logical uncertainty. The obvious unhandled case, the choice of axioms or logical systems or that sort of thing, seems more like model uncertainty than logical uncertainty: the question is not really “is second-order logic true,” it’s “does second-order logic correspond to the world I want to model,” which is beyond the scope of this Discussion article.

[1] How do you turn the number of symbols we wrote it down with into a distribution over possible rules? It’s easiest to just put a maximum-entropy distribution over all valid rules with the correct number of symbols. Since many rules can be compressed, the set of rules with a given number of symbols is smeared over the lower possible Kolmogorov complexities, I think with an exponential preference towards lower complexity. But on the other hand, humans don’t hand out maximum-entropy samples of definitions; we’ll need probabilistic logic to do probabilistic logic. (Yo dawg, I heard you liked recursion, so we put this joke in itself so you can [this joke] while you [this joke].)
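For what it’s worth, here is a toy illustration of that smearing: fix a tiny rule language, enumerate every rule of a fixed length, and count how many distinct descriptions compute each labeling of a small domain. The grammar, atoms, and domain are all invented for this example; compressible (low-complexity) labelings are exactly the ones with many descriptions, so a maximum-entropy choice over descriptions already leans toward them.

```python
from itertools import product
from collections import Counter

# Enumerate every three-token rule "ATOM OP ATOM" in a tiny invented
# language and tally how many descriptions compute each labeling of 0..15.
atoms = {
    "odd":   lambda n: n % 2 == 1,
    "small": lambda n: n < 8,
    "true":  lambda n: True,
}
ops = {
    "and": lambda a, b: a and b,
    "or":  lambda a, b: a or b,
}
domain = range(16)

votes = Counter()
for x, op, y in product(atoms.values(), ops.values(), atoms.values()):
    labeling = "".join("1" if op(x(n), y(n)) else "0" for n in domain)
    votes[labeling] += 1

# Of the 18 descriptions, 6 compute "everything is 'odd'", 4 compute plain
# oddness, and so on: the simple labelings soak up the most descriptions.
for labeling, count in votes.most_common():
    print(count, labeling)
```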