Efficient Induction

(By which I mean, induction over efficient hypotheses.)

A standard prior is “uniform” over the class of computable functions (i.e., uniform over infinite strings which have a prefix which compiles). Why is this a natural choice of prior? Well, we’ve looked at the universe for a really long time, and it seems to be computable. The Church-Turing thesis says we have no reason to choose a bigger support for our prior.
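To make “uniform over computable functions” a bit more concrete, one standard formalization (my choice of notation, and only one way to set it up) is the universal semimeasure: feed a prefix-free universal machine U uniformly random bits and ask how likely it is that the output begins with the observations x seen so far:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|}
```

Each program p is weighted by 2^{-|p|}, which is the same “probability drops off exponentially with length” rule used in the schemes below.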

Why stop there? There is a natural generalization of the Church-Turing thesis, naturally called the extended Church-Turing thesis, which asserts that the universe is computable in polynomial time. In fact, we have a strong suspicion that physics should be local, which means more or less precisely that updates can be performed using a linear size logical circuit. Maybe we should restrict our prior further, looking only for small circuits.

(As we realized only slightly later, this extended hypothesis is almost certainly false, because in the real world probabilities are quantum. But if we replace “circuit” by “quantum circuit,” which is a much less arbitrary change than it seems at face value, then we are good. Are further changes forthcoming? I don’t know, but I suspect not.)

So we have two nice questions. First, what does a prior over efficiently computable hypotheses look like? Second, what sort of observation about the world could cause you to modify Solomonoff induction? For that matter, what sort of physical evidence ever convinced us that Solomonoff induction was a good idea in the first place, rather than a broader prior? I suspect that both of these questions have been tackled before, but Google has failed me, so now I will repeat some observations.

Defining priors over “polynomially large” objects is a little more subtle than in ordinary Solomonoff induction. In some sense we need to penalize a hypothesis both for its description complexity and for its computational complexity. Here is a first try:

A hypothesis consists of an initial state of some length N, and a logical circuit of size M which takes N bits to K+N bits. The universe evolves by repeatedly applying the logical circuit to compute both a “next observation” (the first K bits) and the new state of the universe (the last N bits). The probability of a hypothesis drops off exponentially with its length.
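As a minimal sketch of what this hypothesis class looks like (the function names and the toy circuit are my own illustration, not anything canonical):

```python
def run_hypothesis(initial_state, circuit, num_steps, K):
    """Unroll a circuit hypothesis: each step maps the N-bit state to
    K observation bits followed by the next N-bit state."""
    state = list(initial_state)           # the N-bit state of the universe
    observations = []
    for _ in range(num_steps):
        out = circuit(state)              # K + N bits
        observations.append(out[:K])      # first K bits: the observation
        state = out[K:]                   # last N bits: the new state
    return observations

def prior_weight(description_length_in_bits):
    """Prior probability falls off exponentially with description length."""
    return 2.0 ** (-description_length_in_bits)

# Toy "circuit": observe the first 2 bits, then rotate the state left by one.
toy_circuit = lambda s: s[:2] + s[1:] + s[:1]
print(run_hypothesis([0, 1, 1, 0], toy_circuit, num_steps=3, K=2))
```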

How reasonable is this? Why, not at all. The size of our hypothesis is the size of the universe, so it is going to take an awful lot of observations to surmount the improbability of living in a large universe. So what to do? Well, the reason this contradicts intuition is that we expect our physical theory (as well as the initial state of our system) to be uniform in some sense, so that it can hope to be described concisely even if the universe is large. Luckily for us, the notion of uniformity already exists for circuits, and in fact appears to be the correct notion. Instead of working with circuits directly, we specify a program which outputs that circuit (so if the laws of physics are uniform across space, the program just has to be told “tile this simple update rule at every point”). So now it goes like this:

A hypothesis consists of a program which outputs an initial state of some finite length N, and a program which outputs a logical circuit of size M which takes N bits to K+N bits. The observations are defined as before. The probability of a hypothesis drops off exponentially with its length.
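A rough sketch of why the program-that-outputs-a-circuit move helps, with a hypothetical “tile a local rule” generator standing in for the circuit-producing program (the names and the Rule 110-style update are purely illustrative):

```python
def tile_local_rule(local_rule, N, K):
    """Build the update "circuit" for an N-bit universe by tiling one fixed
    local rule at every site; the description length is just that of the
    local rule plus N and K, no matter how large N is."""
    def circuit(state):
        assert len(state) == N
        next_state = [local_rule(state[(i - 1) % N], state[i], state[(i + 1) % N])
                      for i in range(N)]
        return next_state[:K] + next_state   # K observation bits + new N-bit state
    return circuit

# A Rule 110-style local rule, chosen only as an example of a "simple update rule".
rule110 = lambda l, c, r: (c + r + c * r + l * c * r) % 2
big_universe_circuit = tile_local_rule(rule110, N=10**6, K=16)
```

The point is that the prior weight of such a hypothesis depends on the length of the short generating program, not on N or M.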

How reasonable is this? Well, it doesn’t exploit the conjectured computational efficiency of the universe at all. There are three measures of complexity, and we are only using one of them: the length of the hypothesis, the size of the universe, and the computational complexity of the update rule. At least we now have these quantities in hand, so we can hope to incorporate them intelligently. One solution is to place an explicit bound on the complexity of the update rule in terms of the size of the universe, but it is easy to see that this approach is doomed to fail. An alternative approach is to explicitly include terms dependent on all three complexity measures in the prior probability for each hypothesis.
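For concreteness, the second option might look something like this, with ℓ(h) the description length of the hypothesis, N(h) the size of the universe it posits, M(h) the size of its update circuit, and α, β knobs I am introducing rather than anything canonical:

```latex
P(h) \;\propto\; 2^{-\ell(h)} \cdot N(h)^{-\alpha} \cdot M(h)^{-\beta}
```

That is, charging ℓ(h) + α log₂ N(h) + β log₂ M(h) bits against each hypothesis.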

There are some aesthetically pleasing solutions which I find really attractive. For example, make the hypothesis a space-bounded Turing machine and also require it to specify the initial state of its R/W tape. More simply but less aesthetically, you could just penalize a hypothesis based on the logarithm of its running time (since this bounds the size of its output), or on log M. I think this scheme gives known physical theories very good description complexity. Overall, it strikes me as an interesting way of thinking about things.
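Written out, those simpler penalties look like the following, with T(h) the running time of the circuit-producing program (so M(h) ≤ T(h)); again, this is just my notation for what the paragraph above describes:

```latex
P(h) \;\propto\; \frac{2^{-\ell(h)}}{T(h)}
\qquad\text{or}\qquad
P(h) \;\propto\; \frac{2^{-\ell(h)}}{M(h)}
```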

I don’t really know how to answer the second question (what sort of observation would cause us to restrict our prior in this way). I don’t really know where the description complexity prior comes from either; it feels obvious to me that the universe is computable, just like it feels obvious that the universe is efficiently computable. I don’t trust these feelings of obviousness, since they are coming from my brain. I guess the other justification is that we might as well stick to computable hypotheses, because we can’t use stronger hypotheses to generate predictions (our conscious thoughts at least appearing to be computable). The same logic does have something to say about efficiency: we can’t use inefficient hypotheses to generate predictions, so we might as well toss them out. But this would lead us to just keep all sufficiently efficient hypotheses and throw out the rest, which doesn’t work very well (since the efficiency of a hypothesis depends upon the size of the universe it posits, this basically amounts to putting a cap on the size of the universe). I don’t know of a similar justification which tells us to penalize hypotheses based on their complexity. The only heuristic is that it has worked well for screening out physical theories so far. That’s a pretty good thing to have going for you, at least.

In sum, thinking about priors over more restricted sets of hypotheses can be interesting. If anyone knows of a source which approaches this problem more carefully, I would be interested to learn.