A basis for pattern-matching in logical uncertainty

Previous logical uncertainty robot designs (e.g. here, here, here) have relied on proving theorems that relate to the statement under inspection (e.g. “the trillionth digit of pi is 5”), where a useful theorem would be something like “there are 10 possible mutually exclusive digits [1,2,3,4,5,6,7,8,9,0] the trillionth digit of pi could be.” This is nice. But it doesn’t do pattern-matching: if you tell a theorem-proving robot that a statement is true for the first thousand integers, it doesn’t even suspect that the statement might be true for the 1001st too. The statements are just independent black boxes.

In order to go further, our robot needs to peek inside the black box. But how, and why should it see patterns?

We tell our robot these facts: “3 is ‘odd’. 5 is ‘odd’. 7 is ‘odd’. 11 is ‘odd’.”

The robot asks “Could you just tell me what this concept ‘odd’ means explicitly? I don’t know English very well.”

“No,” we say. “You’ll have to guess.”

For robots that don’t make up information out of thin air, guesses about ‘odd’-ness will obey the maximum-entropy principle, which roughly means “give everything equal probability until you learn more.”

“Well, robot, what do you think is the probability that 9 is ‘odd’, given what we’ve told you?”

“50 percent. Beep boop.”

“But all those other things were odd, weren’t they?”

“Those other things were not the number 9. Imagine going number by number and labeling each number as ‘odd’ or ‘not odd’. There are equal numbers of possible labelings with 9 odd and 9 not odd, regardless of what those other numbers are. Since by the principle of maximum entropy I start out with all labelings equally likely, 9 has a 50% chance of being ‘odd’.”
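Concretely, here is a minimal sketch of the robot’s reasoning (the five-number toy universe is my own illustration, not something the robot was told): under a uniform prior over labelings, conditioning on the labels of 3, 5, 7, and 11 leaves the label of 9 at exactly 50%.

```python
# Minimal sketch: a maximum-entropy (uniform) prior over 'odd'/'not odd'
# labelings of a toy universe of numbers. The universe {3, 5, 7, 9, 11}
# is an illustrative assumption.
from itertools import product

numbers = [3, 5, 7, 9, 11]   # toy universe
known_odd = {3, 5, 7, 11}    # the facts we gave the robot

consistent = 0   # labelings that agree with the facts
nine_odd = 0     # ...of which 9 is labeled 'odd'
for labeling in product([True, False], repeat=len(numbers)):
    labels = dict(zip(numbers, labeling))
    if all(labels[n] for n in known_odd):
        consistent += 1
        nine_odd += labels[9]

print(nine_odd / consistent)   # 0.5 -- the labels are independent black boxes
```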

“But isn’t it just sensible to think that the next number we give you will be ‘odd’ if all the others were? What of the appeal of simplicity?”

“What’s so great about simplicity? I’m trying not to make up information out of thin air here.”

What do we tell our robot to make it guess that 9 is probably odd? Well, we want it to think that simpler patterns of odd-ness are more likely. So how about we let the robot peek at our definition of ‘odd’, just long enough to see that there is a test we can apply to every integer, and how complicated that test is.

Even if our robot has no clue what the constituent symbols are (or, more to the point, doesn’t have enough processing power to fully explore the ramifications in a more difficult and realistic scenario), knowing the mere number of characters defining ‘odd’ puts an upper bound on the Kolmogorov complexity of how the numbers are labeled. Heck, even just learning that we can write down the oddness-test at all is huge, since there are infinitely many infinite-complexity rules.

Once the robot knows that the complexity of ‘odd’ is below a certain smallish value, low-complexity hypotheses like “all numbers are ‘odd’” and “odd numbers are ‘odd’” start to outweigh[1] bigger hypotheses like “all numbers except {long list including 9} are ‘odd’.” These simple hypotheses contain just the sorts of patterns we sometimes want the robot to see, like ‘odd’-ness.
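To make “outweigh” concrete, here is a hedged sketch of a simplicity-weighted guess. The candidate hypotheses, their symbol counts, and the 2^(−length) prior are all invented for illustration; the point is only that, once a complexity bound is known, short rules that fit the facts dominate long ones.

```python
# Hedged sketch: weight candidate rules by simplicity (prior ~ 2**-length),
# keep the ones consistent with the facts, and read off P(9 is 'odd').
# The hypothesis list and its "symbol counts" are illustrative assumptions.

hypotheses = [
    # (name, description length in symbols, predicate)
    ("all numbers are 'odd'",            5, lambda n: True),
    ("odd numbers are 'odd'",            8, lambda n: n % 2 == 1),
    ("prime numbers are 'odd'",          9, lambda n: n > 1 and all(n % d for d in range(2, n))),
    ("all except {list including 9}",   40, lambda n: n != 9),  # stands in for the long-list rule
]

facts = [3, 5, 7, 11]   # numbers we declared 'odd'

# Posterior over hypotheses: simplicity prior, restricted to rules that fit the facts.
weights = {name: 2.0 ** (-length)
           for name, length, pred in hypotheses
           if all(pred(n) for n in facts)}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

p_nine_odd = sum(posterior[name]
                 for name, length, pred in hypotheses
                 if name in posterior and pred(9))
print(posterior)
print("P(9 is 'odd') =", p_nine_odd)
```

With these made-up lengths the robot lands at roughly 95% that 9 is ‘odd’; the exact figure is meaningless, but the shape of the calculation is the point.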

We tell our robot these facts: “3 is odd. 5 is odd. 7 is odd. 11 is odd. A number is ‘odd’ if a 14-character predicate is true.”

The robot asks “Could you just tell me what this concept ‘odd’ means explicitly?”

“No,” we say. “You’ll have to guess. What do you think is the probability that 9 is ‘odd’, given what we’ve told you?”

“65.1 percent.”

“Hah, got you! When we said ‘odd’ we secretly meant prime!”
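A hedged sketch of the kind of calculation the second robot might be running (the toy predicate language, its character counts, and the resulting number are all my own illustrative assumptions, and will not reproduce the 65.1% in the dialogue): enumerate short predicates within the stated character budget, keep the ones that agree with the facts, and take a simplicity-weighted vote on 9.

```python
# Hedged sketch: enumerate predicates within a character budget, keep those
# consistent with the facts, and take a simplicity-weighted vote on 9.
# The predicate pool and the length measure are illustrative assumptions.

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, n))

# Toy predicate language: short Python expressions over n.
pool = ["True", "n % 2 == 1", "is_prime(n)", "n != 9", "n > 2"]

facts = [3, 5, 7, 11]
budget = 14   # "a 14-character predicate is true"

def holds(source, n):
    return bool(eval(source, {"is_prime": is_prime, "n": n}))

weight_total, votes_for_9 = 0.0, 0.0
for source in pool:
    if len(source) <= budget and all(holds(source, n) for n in facts):
        w = 2.0 ** (-len(source))   # shorter predicates count for more
        weight_total += w
        votes_for_9 += w * holds(source, 9)

print("P(9 is 'odd') ≈", votes_for_9 / weight_total)
```

Presumably the reason the robot answered well under 100% is visible here: a short primality predicate also fits the facts and fits the budget, so a calibrated robot keeps some probability on it.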

I suspect that this kind of peeking handles most of the cases where we want something like pattern-matching (specifically, minimum-message-length prediction of patterns) in logical uncertainty. The obvious un-handled case (the choice of axioms, or logical systems, or that sort of thing) seems more like model uncertainty than logical uncertainty; that is, the question is not really “is second-order logic true,” it’s “does second-order logic correspond to the world I want to model,” which is beyond the scope of this Discussion article.

[1] How do you turn the number of symbols we wrote the definition down with into a distribution over possible rules? The easiest thing is to put a maximum-entropy (uniform) distribution over all valid rules with the correct number of symbols. Since many rules can be compressed, the set of rules with a given number of symbols is smeared over lower possible Kolmogorov complexities, I think with an exponential preference towards lower complexity. But on the other hand, humans don’t hand out maximum-entropy samples of definitions; we’ll need probabilistic logic to do probabilistic logic. (Yo dawg, I heard you liked recursion, so we put this joke in itself so you can [this joke] while you [this joke].)
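One back-of-envelope way to see where that exponential preference could come from (this counting argument is my own addition, and it leans on the assumption that a labeling of complexity K can be padded or re-encoded into an n-symbol definition in roughly |Σ|^(n−K) distinct ways):

\[
  P\bigl(\ell \mid \text{an $n$-symbol definition}\bigr)
  \;=\;
  \frac{\#\{\text{$n$-symbol definitions computing } \ell\}}
       {\#\{\text{valid $n$-symbol definitions}\}}
  \;\sim\;
  \frac{|\Sigma|^{\,n-K(\ell)}}{|\Sigma|^{\,n}}
  \;=\;
  |\Sigma|^{-K(\ell)},
\]

where \(\Sigma\) is the symbol alphabet, \(\ell\) is a labeling of the integers as ‘odd’ or ‘not odd’, and \(K(\ell)\) is its Kolmogorov complexity.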