The Strong Occam’s Razor

This post is a summary of the different positions expressed in the comments to my previous post and elsewhere on LW. The central issue turned out to be assigning "probabilities" to individual theories within an equivalence class of theories that yield identical predictions. Presumably we must prefer shorter theories to their longer versions even when they are equivalent. For example, is "physics as we know it" more probable than "Odin created physics as we know it"? Is the Hamiltonian formulation of classical mechanics a priori more probable than the Lagrangian formulation? Is the definition of reals via Dedekind cuts "truer" than the definition via binary expansions? And are these all really the same question in disguise?

One attractive answer, given by shokwave, says that our intuitive concept of "complexity penalty" for theories is really an incomplete formalization of "conjunction penalty". Theories that require additional premises are less likely to be true, according to the eternal laws of probability. Adding premises like "Odin created everything" makes a theory less probable and also happens to make it longer; this is the entire reason why we intuitively agree with Occam's Razor in penalizing longer theories. Unfortunately, this answer seems to be based on a concept of "truth" granted from above—but what do differing degrees of truth actually mean, when two theories make exactly the same predictions?
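To spell the conjunction penalty out, the product rule gives the whole derivation in one line (this is just the standard identity, not anything specific to shokwave's comment):

```latex
% Adding a premise can never make a theory more probable:
P(\text{Odin} \wedge \text{physics})
  = P(\text{physics}) \cdot P(\text{Odin} \mid \text{physics})
  \le P(\text{physics})
% with equality only when P(Odin | physics) = 1, i.e. the longer
% theory is at best exactly as probable, never more so.
```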

Another intriguing answer came from JGWeissman. Apparently, as we learn new physics, we tend to discard inconvenient versions of old formalisms. So electromagnetic potentials turn out to be "more true" than electromagnetic fields because they carry over to quantum mechanics much better. I like this answer because it seems to be very well-informed! But what shall we do after we discover all of physics and still have multiple equivalent formalisms—do we have any reason to believe simplicity will still work as a deciding factor? And the question remains: which definition of real numbers is "correct" after all?

Eliezer, bless him, decided to take a more naive view. He merely pointed out that our intuitive concept of "truth" does seem to distinguish between "physics" and "God created physics", so if our current formalization of "truth" fails to tell them apart, the flaw lies with the formalism rather than with us. I have a lot of sympathy for this answer as well, but it looks rather like a mystery to be solved. I never expected to become entangled in a controversy over the notion of truth on LW, of all places!

The final and most intriguing answer came from saturn, who alluded to a position held by Eliezer and sharpened by Nesov. After thinking it over for a while, I generated a good contender for the most confused argument ever expressed on LW. Namely, I'm going to completely ignore the is-ought distinction and use morality to prove the "strong" version of Occam's Razor—that shorter theories are more "likely" than equivalent longer versions. You ready? Here goes:

Imagine you have the option to put a human being in a sealed box where they will be tortured for 50 years and then incinerated. No observational evidence will ever leave the box. (For added certainty, fling the box away at near lightspeed and let the expansion of the universe ensure that you can never reach it.) Now consider the following physical theory: as soon as you seal the box, our laws of physics will make a localized exception and the victim will spontaneously vanish from the box. This theory makes exactly the same observational predictions as your current best theory of physics, so it lies in the same equivalence class and you should give it the same credence. If you're still reluctant to push the button, it looks like you already are a believer in the "strong Occam's Razor", which says that simpler theories without local exceptions are "more true". QED.

It's not clear what, if anything, the above argument proves. It probably has no consequences in reality, because no matter how seductive it sounds, skipping over the is-ought distinction is not permitted. But it makes for a nice koan to meditate on weird matters like "probability as preference" (due to Nesov and Wei Dai) and other mysteries we haven't solved yet.

ETA: Hal Finney pointed out that the UDT approach—assuming that you live in many branches of the "Solomonoff multiverse" at once, weighted by simplicity, and reducing everything to decision problems in the obvious way—dissolves our mystery nicely and logically, at the cost of abandoning approximate concepts like "truth" and "degree of belief". It agrees with our intuition in advising you to avoid torturing people in closed boxes, and more generally in all questions about moral consequences of the "implied invisible". And it nicely skips over all the tangled issues of "actual" vs "potential" predictions, etc. I'm a little embarrassed at not having noticed the connection earlier. Now can we find any other good solutions, or is Wei's idea the only game in town?
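To make the reduction concrete, here is a toy sketch of what simplicity-weighted decision-making looks like; the bit counts and utilities are numbers I made up for illustration, not anything from Hal's comment or from UDT itself. The point is that the "strong Occam" preference falls out of the weighting, without ever asking which theory in the equivalence class is "true":

```python
# Toy sketch of simplicity-weighted decision-making (illustration only;
# the hypotheses, bit counts, and utilities are invented for the example).

# Each hypothesis: (description length in bits, utility of sealing the box).
# Both hypotheses make identical observational predictions.
hypotheses = [
    (100, -1_000_000),  # "physics as we know it": the victim is tortured
    (120, 0),           # "physics + a local exception": the victim vanishes
]

def weight(length_bits: int) -> float:
    """Solomonoff-style prior: a theory K bits long gets weight 2**-K."""
    return 2.0 ** -length_bits

total = sum(weight(k) for k, _ in hypotheses)
expected_utility = sum(weight(k) * u for k, u in hypotheses) / total
print(f"expected utility of sealing the box: {expected_utility:.0f}")
# The 20-bit-shorter theory outweighs the other by a factor of 2**20, so the
# result is about -1,000,000: the weighting alone advises against pressing
# the button, with no notion of "degree of belief" in either theory.
```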