# Open Problems Related to Solomonoff Induction

Solomonoff Induction seems clearly “on the right track”, but there are a number of problems with it that I’ve been puzzling over for several years and have not made much progress on. I think I’ve talked about all of them in various comments in the past, but never collected them in one place.

#### Apparent Unformalizability of “Actual” Induction

##### Argument via Tarski’s Indefinability of Truth

Informally, the theorem states that arithmetical truth cannot be defined in arithmetic. The theorem applies more generally to any sufficiently strong formal system, showing that truth in the standard model of the system cannot be defined within the system.

Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic, and therefore a device that can decide the truth value of arbitrary statements in this logic has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

##### Argument via Berry’s Paradox

Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth’s up-arrow notation). Since x has a short description, a universal distribution shouldn’t assign it such a low probability, but P does, so P can’t be a universal distribution.
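To make the construction concrete, here is a short Python sketch. The toy geometric distribution `P` is my stand-in for an arbitrary computable distribution; the real argument concerns arbitrary (possibly uncomputable) P, and uses 1/3^^^3 rather than a merely small threshold.

```python
# Toy illustration of the Berry-style argument: for a distribution P,
# "the least x with P(x) < eps" has a very short description (this
# program), yet P assigns that x a probability below eps by construction.

def P(x: int) -> float:
    """A stand-in computable distribution: P(x) = 2^-(x+1)."""
    return 2.0 ** -(x + 1)

def least_improbable(eps: float) -> int:
    """Return the least integer x with P(x) < eps."""
    x = 0
    while P(x) >= eps:
        x += 1
    return x

eps = 2.0 ** -30           # a small threshold (3^^^3 is far too large to use)
x = least_improbable(eps)
print(x, P(x))             # a short program names x, but P(x) < eps
```

For a computable P this search is itself computable; the bite of the argument is that a short *uncomputable* description of x would exist even when P is a universal distribution.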

#### Is Solomonoff Induction “good enough”?

Given the above, is Solomonoff Induction nevertheless “good enough” for practical purposes? In other words, would an AI programmed to approximate Solomonoff Induction do as well as any other possible agent we might build, even though it wouldn’t have what we’d consider correct beliefs?

#### Is complexity objective?

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?

#### Is Solomonoff an ideal or an approximation?

Is it the case that the universal prior (or some suitable generalization of it that somehow overcomes the above “unformalizability problems”) is the “true” prior and that Solomonoff Induction represents idealized reasoning, or does Solomonoff just “work well enough” (in some sense) at approximating any rational agent?

#### How can we apply Solomonoff when our inputs are not symbol strings?

Solomonoff Induction is defined over symbol strings (for example bit strings) but our perceptions are made of “qualia” instead of symbols. How is Solomonoff Induction supposed to work for us?

#### What does Solomonoff Induction actually say?

What does Solomonoff Induction actually say about, for example, whether we live in a creatorless universe that runs on physics? Or the Simulation Argument?

• Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth’s up-arrow notation). Since x has a short description, a universal distribution shouldn’t assign it such a low probability, but P does, so P can’t be a universal distribution.

Solomonoff Induction would not consider this a description of x as it cannot be used to compute x.

• I guess I failed to bridge the inferential gap here. You’re right that “Solomonoff Induction would not consider this a description of x as it cannot be used to compute x.” The point I’m trying to make is that a correct formalization of induction/Occam’s Razor ought (in order to satisfy our intuitions) to assign x some probability > 1/3^^^3 since it has a short (even if uncomputable) description, but a formalization based on this P would not, therefore it can’t be a correct formalization. But this is the case no matter what P we use (with a different x depending on P), hence the “unformalizability” problem.

• Could you explain further? Why “ought” it assign it such a probability? As stated, this seems more convincing as an argument that it “ought not” to assign a probability > 1/3^^^3, despite the short “description”.

• The idea is, however we define P, how can we be sure that there isn’t some kind of uncomputable physics that would allow someone to build a device that can find the lexicographically least object x such that P(x) < 1/3^^^3 and present it to us?

• Maybe we should just work off of the assumption that there’s no relevant uncomputable physics, because if there were, we should probably give up our endeavors anyways, unless we knew how to model an uncomputable reality within a computable AGI-program. As Schmidhuber ever so aptly wrote on his homepage:

The distribution should at least be computable in the limit. That is, there should exist a program that takes as an input any beginning of the universe history as well as a next possible event, and produces an output converging on the conditional probability of the event. If there were no such program we could not even formally specify our universe, let alone write reasonable scientific papers about it.

• unless we knew how to model an uncomputable reality within a computable AGI-program

You could start out by trying to understand how an AI might invent the concept of uncomputability, and how it might then proceed to the possibility of uncomputable physics. And one way to get started here is by thinking as a cognitive historian, and asking how humans came up with the concept of uncomputability.

• “Give up” is neither a well-defined strategy, nor one that was shown to be preferable to its alternatives.

• That gives you a probability inversely proportional to the integral of e^-(the description length) of each number from 0 to infinity. Complexity grows like log(n)-ish. It’s an improper prior.
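The divergence is easy to check numerically. A sketch, taking the weight of the integer n to fall off like e^-(description length) ≈ e^-log(n) = 1/n (the 1/n weight is my stand-in for “complexity grows like log(n)-ish”):

```python
# If each integer n gets weight ~ exp(-log n) = 1/n, the total mass
# diverges like the harmonic series, so the "prior" is improper.

def partial_mass(N: int) -> float:
    """Sum of weights 1/n for n = 1..N."""
    return sum(1.0 / n for n in range(1, N + 1))

for N in (10**2, 10**4, 10**6):
    print(N, partial_mass(N))   # grows ~ log(N) without bound
```

The partial sums keep growing by roughly ln(100) ≈ 4.6 each time N is multiplied by 100, so there is no constant that normalizes the weights to sum to 1.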

So I agree with Adele that this may be a modus tollens / modus ponens moment.

• If there’s some uncomputable physics that would allow someone to build such a device, we ought to redefine what we mean by computable to include whatever the device outputs. After all, said device falsifies the Church-Turing thesis, which forms the basis for our definition of “computable”.

• Are you essentially saying Solomonoff Induction could be inferior to some other meta-algorithm (or whatever you call it) in uncomputable universes?

• Yes. ETA: I’m also making the stronger point that any formalization of induction we can come up with would have a similar problem. (Or at least the kinds of formalizations covered by my arguments.)

• Technically not if P is sufficiently long and complex.

• Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic, and therefore a device that can decide the truth value of arbitrary statements in this logic has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

It seems to me that the paradox may lie within this problem setup, not within the agent doing the induction.

We first consider that, rather than this device being assigned zero probability, it should actually be inconceivable to the agent—there should not be a finitely describable thingy that the agent assigns zero probability of having a finitely describable property.

Why would an agent using a second-order analogue of Solomonoff induction have such conceptual problems? Well, considering how Tarski’s original undefinability theorems worked, perhaps what goes wrong is this: we want to believe that the device outputs the truth of statements about the universe. But we also want to believe this device is in the universe. So what happens if we ask the device, “Does the universe entail the sentence stating that outputs ‘No’ in response to a question which looks like ?”

Thus, such a device is inconceivable in the first place since it has no consistent model, and we are actually correct to assign zero probability to the alien’s assertion that the device produces correct answers to all questions about the universe, including questions about the device itself.

• we want to believe that the device outputs the truth of statements about the universe

I’m not sure why you say this, because the device is supposed to output the truth of statements about some second-order logic, not about the universe. The device is not describable by the second-order logic via Tarski, so if the device is in the universe, the universe must only be describable by some meta-logic, which implies the device is outputting truth of statements about something strictly simpler than the universe. The agent ought to be able to conceive of this...

• In that case I’m not sure what you mean by “second-order analogue of Solomonoff induction”. Solomonoff induction is about the universe and proceeds from sensory experiences. What does it mean to induct that something may produce the truth of sentences “about a second-order logic”?

• What does it mean to induct that something may produce the truth of sentences “about a second-order logic”?

I mean suppose you encounter a device that outputs “true” or “false” whenever you feed it a sentence in some second-order logic, and after doing a lot of tests, you think maybe the device outputs “true” if and only if the input sentence is actually true, regardless of what input it receives. This seems like a straightforward analogue of inducting that something may be a Halting Oracle, which I thought you agreed an agent should be able to do?

I’m not sure if this answers your question. If not, feel free to find me on Skype or Google Chat to talk more.

• By “true” do you mean that the physical universe semantically entails the sentence, or that the sentence is a semantic tautology of second-order logic? I’m assuming the latter, since the former case is what I was assuming when I gave my Tarski reply.

So… I’m pretty sure you can represent a set’s semantic entailment of a Henkin interpretation of a second-order sentence in a first-order set theory. I’m less sure that you can represent entailment of a second-order sentence inside a second-order set theory, but I’m having trouble seeing why you wouldn’t be able to do that. I also think that second-order set theories are supposed to be finitely axiomatizable, though I could be wrong about this. But then I’m not quite sure why we don’t get into Tarskian trouble.

Let ST be the finite axiomatization of a second-order set theory. Then in second-order logic, why doesn’t the sentence (ST->”All sets: Set entails S”) form a truth predicate for S? My guess is that there’s just no theorem which says, “X is true (entailed by the mathematical universal set / model we’re currently operating inside) iff ‘X’ is entailed by all sets (inside the current model)”.

If this is so, then it looks to me like for any given set of axioms corresponding to a second-order set theory, second-order logic can represent the idea of a device that outputs sentences which are semantically entailed by every individual set inside that theory. So the new answer would be that you are welcome to hypothesize that a device prints out truths of second-order logic, for any given background second-order set theory which provides a universe of models against which those sentences are judged universally semantically true.

In which case the indefinite extensibility gets packed into the choice of set theory that we think this device is judging second-order validity for, if I haven’t dropped the bucket somewhere along the way.

• Then in second-order logic, why doesn’t the sentence (ST->”All sets: Set entails S”) form a truth predicate for S?

Let S=”ST → false”. Then S is false in second-order logic (assuming ST is consistent), but (ST->”All sets: Set entails S”) is true, because ST’s model has to be bigger than any set (I think it’s usually taken to be the proper class of all sets), so every set entails “Not ST”.

So the new answer would be that you are welcome to hypothesize that a device prints out truths of second-order logic, for any given background second-order set theory which provides a universe of models against which those sentences are judged universally semantically true.

On input S=”ST → false”, your device prints out “true”, while my device prints out “false”. I still want to be able to hypothesize my device. :)

• There are totally models of ZFC containing sets that are models of ZFC. See “Grothendieck universe”. Is there a reason why it’d be different in second-order logic? I don’t think a second-order set theory would pin down a unique model; why would it? Unless you had some axiom stating that there were no more ordinals past a certain point, in which case you might be able to get a unique model. Unless I’m getting this all completely wrong, since I’m overrunning my expertise here.

So in retrospect I have to modify this for us to somehow suppose that the device is operating in a particular model of a second-order theory. And then my device prints out “true” (if it’s in one of the smallest models) or the device prints out “false” (if it’s in a larger model), unless the device is against the background of an ST with an upper bound imposing a unique model, in which case the device does print out “true” for ST → false, and from the outside, we think that this device is about a small collection of sets, so this result is not surprising.

Then the question is whether it makes sense to imagine that the device is about the “largest relevant” model of a set theory—i.e., for any other similar devices, you think no other device will ever refer to a larger model than the current one, nor will any set theory successfully force a model larger than the current one—I think that’s the point at which things get semantically interesting again.

• Is there a reason why it’d be different in second-order logic?

Second-order set theory is beyond my expertise too, but I’m going by this paper, which on page 8 says:

We have managed to give a formal semantics for the second-order language of set theory without expanding our ontology to include classes that are not sets. The obvious alternative is to invoke the existence of proper classes. One can then tinker with the definition of a standard model so as to allow for a model with the (proper) class of all sets as its domain and the class of all ordered pairs x, y (for x an element of y) as its interpretation function. The existence of such a model is in fact all it takes to render the truth of a sentence of the language of set theory an immediate consequence of its validity.

So I was taking the “obvious alternative” of the proper class of all sets to be the standard model for second-order set theory. I don’t quite understand the paper’s own proposed model, but I don’t think it’s a set either.

• I’m not sure I believe in proper classes, and in particular, I’m not sure there’s a proper class of all sets that could be the model of a second-order theory such that you could not describe any set larger than the model; and as for pinning down that model using axioms, I’m pretty sure you shouldn’t be able to do that. There are analogues of the Löwenheim-Skolem theorem for sufficiently large infinities in second-order logic, I seem to recall reading.

• 6 Jun 2012 17:56 UTC · 5 points

Is complexity objective?

This one bothers me as well. A Turing machine basis for complexity favours certain types of theories over others. For example, how many bits of Turing machine does it take to predict, say, Maxwell’s equations, which involve continuous fields?

(warning: original “research” following)

One temptation is to sweep this under the rug by saying that the complexity required to talk about continuous math is just a constant-sized interpreter that you put on the Turing machine, but this can’t get around the fact that some theories won’t need that.

(BTW, is continuous math even exactly Turing-computable?)

If we try to generalize and say “Turing machines suck, we’ll use some other language”, it becomes clear that all languages will have some bias. English has a bias for agenty mind-stuff, Turing machines have a bias for discrete cellular automata, some imaginable basis language will have a bias for simple continuous mathematics.

I think the basis that is closest to correct would be one based on a hypothetical real-number continuous-memory hypercomputer (for obvious reasons). This of course implies that I have learned this thing, which implies some deeper form of induction over forms of induction, that would make a language that had only one operator that was just “physics” take appropriate penalization. And then thinking about that idea reveals that inducting over languages and inducting over theories in languages is suspiciously similar.

That tempts me to say that they should not be separate, but then I still think the general utility of continuous math should not have to be proven for every theory that uses it, which implies some sharing of complexity somehow. But that seems like nonsense; SI already handles that, I think (by each theory being a completely defactored package deal). Now I think it would be good to find a new SI that can handle factoring (for algorithmic reasons), but maybe that is a step down the “how do we actually do induction” road, which is not what SI is about.

Thinking over this all again, it seems starting with any form of induction that sort of works will eventually come to the right answer. The hypothetical perfect form of induction that would be the best a learning machine could start with is just the actual answer. Again there’s that duality (is that the right word?) between theories and induction priors.

This looks to me a lot like some kind of iterative improvement algorithm, where you have to start somewhere, and where you start is chosen from the same domain as your answer (think Newton’s method, or gradient descent). It seems like even starting with some human language (like English), you would eventually learn (as we have done) that building a math-interpreter is the thing to do.

Ok, that is the end of my confusion dump.

My unproven, badly specified idea that has something to do with induction:

Ok, background: all Turing-complete languages can build interpreters for all others. This implies some traversable languagespace (which is a subset of programspace) where the distance between two points is the length of the interpreter to get from one language to the other. We will also want to be able to go backwards, so that we can go from any point in programspace to any other (by backing up to a good language, and then going forward to the new program). (Going backwards means building this point from some point in languagespace, rather than building some point in programspace from this point in languagespace.) Points in programspace are defined over behavior, so the same program will be reachable from multiple languages.

(the lisp-like area gets a lot of traffic through it. (“any sufficiently complex program contains … half of Common Lisp”))

Ok, so now we have a programspace with distances measured in bits. We could just put a probability distribution over the subset of programspace that makes predictions. Solomonoff induction does this, with initial improbability defined by distance from Turing-machine language.

I think there might be some way to use this programspace idea for induction other than just being another way to look at SI programs. I’m imagining probability or something flowing around in this space, updating our concept of simplicity, so that when we see that Newtonian mechanics, GR, and quantum mechanics are all closely reachable from some mathy point in languagespace, we then upgrade the probability of math-reachable theories, without even talking about what they predict. I’m also imagining that if we build an AI to do induction, we say “here are some interesting points in programspace (Newton, Maxwell, SR, GR, quantum) that have served us well” and let it figure out a distribution over programspace instead of “start from Turing-machine language”.

I have more ideas and I fear I am not communicating this very well.

I think this interpreter-traversable programspace is a good concept to investigate for trying to make SI more objective. Even if we just stick with something like SI, it seems like an interesting concept.

yeah...

• The thing that you’re describing is a lot like finding the stationary distribution of a Markov chain. Not all Markov chains have stationary distributions and I can’t tell whether this one would, though if it does have one it would be unique. It’s an interesting idea.
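For what it’s worth, the stationary-distribution idea is easy to illustrate on a small chain. A sketch with a made-up 3-“language” transition matrix (the numbers are purely illustrative):

```python
# Power iteration to find the stationary distribution of a small Markov
# chain over three hypothetical "languages". Rows sum to 1; entry
# T[i][j] is the probability of stepping from language i to language j.
T = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(p, T):
    """One step of the chain: new p[j] = sum_i p[i] * T[i][j]."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

p = [1.0, 0.0, 0.0]        # start anywhere
for _ in range(1000):
    p = step(p, T)

print(p)                    # stationary: step(p, T) is (numerically) p again
```

Since this toy chain is irreducible and aperiodic, the iteration converges to the unique stationary distribution regardless of the starting point, which matches the uniqueness claim above.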

I should also note that it is not necessary to do anything like this to perform induction over multiple theories. Instead, we can just require a program that outputs known theories in addition to the new theory. We can ask them to be output in any form, such as predictions or source code, since converting source code to predictions is O(1) in program size. Then, if the different theories have more in common, the program computing them all will be shorter than one that must separately specify elements of very different theories. A slightly different implementation of the same general principle is Schmidhuber’s Optimal Ordered Problem Solver.

• Some continuous math is computable, some is not. There are uncomputable real numbers. See “computable number”.

• 6 Jun 2012 2:57 UTC · 3 points

This is the second mention of second-order logical Solomonoff Induction, but I can’t imagine what such a thing would look like.

• It’s Luke and Eliezer’s term, but I guess the idea is similar to the one I had. You take some second-order theory, and let each string in the formal language that constitutes a description of X contribute n^-l to the probability of X, where n is the size of the alphabet, and l is the length of the string. Use the resulting distribution in place of the universal prior in Solomonoff Induction.

• Let’s say the theory is ZFC. Does the string “0 if the continuum hypothesis is true, otherwise 1” contribute to the probability of 0, or the probability of 1?

• (As you most likely know, in the standard interpretation of second-order ZFC the continuum hypothesis has a definite truth value, but it can’t be proved or disproved in any of the deductive systems that have been proposed for second-order ZFC.) My suggestion is that the string “0 if the continuum hypothesis is true, otherwise 1” contribute to the probability of 0 if the continuum hypothesis is true, and the probability of 1 if the continuum hypothesis is false. The problem that we don’t know how to deal with a probability distribution like this seems similar to the problem of not knowing how to use the universal prior in the original Solomonoff Induction; both seem to be instances of logical/mathematical uncertainty, and there might be a general solution to that problem.

The reason I think this is plausible is that, for example, P=NP may not be provable or disprovable in any formal deductive system that has been proposed, and yet any agent would still have to make decisions that are affected by its truth value, such as whether to search for a polynomial solution to 3-SAT. If there is some sort of principled way to make those decisions, perhaps the same methods can be used to make decisions involving the continuum hypothesis?

• All universal Turing machines can simulate each other with logarithmic slowdown. Saying that this parameter makes complexity “subjective” is like saying the time complexity of Quicksort is “subjective” because the algorithm doesn’t specify which programming language to implement it in.

• For aliens with a halting oracle:

Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don’t. Then we run the programs. Sure enough, none of the ones they say run forever halt, and some of the ones they say halt do halt at some point. Suppose we repeat this process a few times with different programs.

Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens which ones halt, give the non-halting ones a 0 probability of halting at every step, and give the other ones some small nonzero probability of halting on every step. Of course this strategy only works if the aliens actually have a halting oracle, so its predictions should be linearly combined with a fallback strategy.

I think that Solomonoff induction will find this strategy, because the hypothesis that the aliens have a true halting oracle is formalizable. Here’s how: we learn a function from the aliens’ answers to a distribution over when the programs halt. We can use the strategy of predicting that the ones the aliens say don’t halt don’t halt, and using some fallback mechanism for predicting the ones that do halt. This strategy is computable, so Solomonoff induction will find it.

For Berry’s paradox:

This is a problem with every formalizable probability distribution. You can always define a sequence that says: see what bit the predictor predicts with a higher probability, then output the opposite. Luckily Solomonoff induction does the best it can by having the estimates converge to 50%. I don’t see what a useful solution to this problem would even look like; it seems best to just treat the uncomputable sequence as subjectively random.
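The diagonalization is easy to simulate, with a Laplace-rule predictor standing in for Solomonoff induction: the adversarial sequence always emits whichever bit the predictor considers less likely, and the predictor’s estimates are pinned near 50%.

```python
# Diagonalizing against a predictor: the sequence outputs whatever bit
# the predictor currently finds less likely. With a Laplace-rule
# predictor, P(next bit = 1) = (ones + 1) / (n + 2), the probability
# estimates hover around 0.5 forever.

def run(n_steps: int) -> list[float]:
    ones, total, estimates = 0, 0, []
    for _ in range(n_steps):
        p_one = (ones + 1) / (total + 2)   # predictor's P(next bit = 1)
        estimates.append(p_one)
        bit = 1 if p_one < 0.5 else 0      # adversary picks the unlikely bit
        ones += bit
        total += 1
    return estimates

estimates = run(1000)
print(estimates[-1])   # stuck near 0.5
```

The adversary keeps the counts of 0s and 1s balanced, so the estimates converge to 50% exactly as described, and the predictor can do no better than treating the sequence as random.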

• If I understand Solomonoff Induction correctly, for all n and p the sum of the probabilities of all the hypotheses of length n equals the sum of the probabilities of hypotheses of length p. If this is the case, what normalization constant could you possibly use to make all the probabilities sum to one? It seems there are none.

• Use a prefix-free encoding for the hypotheses. There are not 2^n hypotheses of length n: some of the length-n bitstrings are incomplete and you’d need to add more bits in order to get a hypothesis; others are actually a length <n hypothesis plus some gibberish on the end.

Then the sum of the probabilities of all programs of all lengths combined is 1.0. After excluding the programs that don’t halt, the normalization constant is Chaitin’s Omega.

• Unfortunately Chaitin’s Omega is incomputable, but even if it weren’t, I don’t see how it would work as a normalizing constant. Chaitin’s Omega is a real number, there is an infinite number of hypotheses, and (IIRC) there is no real number r such that r multiplied by infinity equals one, so I don’t see how Chaitin’s Omega could possibly work as a normalizing constant.

• If we assign a value of 2^-n to each hypothesis (that is, each halting program) of length n, then the total value summed over all hypotheses is Chaitin’s Omega.

Thus, if we assign a value of 2^-n/Omega to each hypothesis, the total is 1, and this gives us a valid probability distribution.

• That would assign a zero probability to hypotheses that take more than n bits to specify, would it not? That sounds far from optimal.

• Sorry, I mean that we do this for each value of n. So, given a hypothesis H, it is assigned a probability of 2^-length(H)/Omega.

• I still don’t see how that would make all the hypotheses sum to 1. Wouldn’t that only make the probabilities of all the hypotheses of length n sum to 1, and thus make the sum over all hypotheses > 1? For example, consider all the hypotheses of length 1. Assuming Omega = 1 for simplicity, there are 2 hypotheses, each with a probability of 2^-1/1 = 0.5, making them sum to 1. There are 4 hypotheses of length 2, each with a probability of 2^-2/1 = 0.25, making them sum to 1. Thus, the sum of the probabilities of all hypotheses of length <= 2 is 2 > 1.

Is Omega doing something I don’t understand? Would all hypotheses be required to be some set length?

• The hypotheses are encoded in a prefix-free way: no hypothesis is a prefix of another hypothesis.

In particular, this eliminates your example: if both strings of length 1 (“0” and “1“) are valid hypotheses, then no string of longer length can be a valid hypothesis (then we can take Omega = 1 and assign a probability of 1/2 to each). Or we could have “0”, “10”, and “110” be valid hypotheses; then we can take Omega = 7/8 and assign probabilities of 4/7, 2/7 and 1/7 to these.

In general, the prefix-free condition ensures that the sum over all hypotheses converges, by Kraft’s inequality, which is the real heavy lifter here; the constant is merely there to make the sum over all hypotheses converge to 1 instead.

You might imagine that hypotheses are required to be grammatically correct English sentences (I guess encoded in ASCII or something). In that case, no hypothesis is a prefix of another, because each hypothesis ends with a period.
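The worked numbers above are easy to verify; a quick check using exact fractions (the prefix-free set {"0", "10", "110"} is the one from the comment):

```python
from fractions import Fraction

# Verify the worked example: hypotheses {"0", "10", "110"} are
# prefix-free; Omega = sum of 2^-length = 7/8; normalizing by Omega
# gives probabilities 4/7, 2/7 and 1/7.
hypotheses = ["0", "10", "110"]

# Prefix-free check: no hypothesis is a proper prefix of another.
assert not any(a != b and b.startswith(a) for a in hypotheses for b in hypotheses)

omega = sum(Fraction(1, 2 ** len(h)) for h in hypotheses)
probs = [Fraction(1, 2 ** len(h)) / omega for h in hypotheses]
print(omega, probs)
```
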

• Right; I forgot that it used a prefix-free encoding. Apologies if the answer to this is painfully obvious, but does having a prefix-free encoding entail that there is a finite number of possible hypotheses?

• It doesn’t: for example, the set of strings 0, 10, 110, 1110, 11110, 111110, … is infinite and prefix-free. So is the set of C-style (null-terminated) strings.
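For this particular infinite set the weights even sum to exactly 1, so no further normalization is needed; a sketch of the partial sums:

```python
# The infinite prefix-free set {0, 10, 110, 1110, ...} has total weight
# sum over k >= 1 of 2^-k = 1: the k-th string is ("1" * (k-1)) + "0",
# of length k, so each extra "1" halves the weight.

def partial_sum(n_strings: int) -> float:
    """Sum of 2^-length over the first n_strings strings of the set."""
    return sum(2.0 ** -k for k in range(1, n_strings + 1))

for n in (5, 10, 50):
    print(n, partial_sum(n))   # approaches 1 from below
```
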

• I see. Does the method of normalization you gave work even when there is an infinite number of hypotheses?

• It does. The infinite sum will always converge, so by normalizing we can make it converge to 1.

Take gjm’s approach to explaining the normalization, in which the initial weight of 2^-length assigned to a hypothesis is the probability that you obtain that hypothesis by choosing bits at random. Then the normalization is just a conditional probability: you divide by the probability that, when choosing bits at random, you do eventually end up hitting a hypothesis.

• Re­mark: these are spe­cial cases of the schema “all strings that con­tain ex­actly one in­stance of pat­tern X in pos­si­ble po­si­tions Y”, where in the first case X is “0“ and Y is “any­where” and in the sec­ond X is “00000000” and Y is “any po­si­tion that’s a mul­ti­ple of 8 bits”.

(Of course there are plenty of other prefix-free sets of bitstrings. For instance: interpret blocks of 8 bits as characters as in Kindly’s second example, and say that a valid program is anything that begins with a “(”, has properly matched parentheses, and ends with the “)” matching the first “(”. This doesn’t have the convenient property that every bitstring begins with a valid program; for examples that do, take the exponential Golomb coding or the Elias omega coding of natural numbers.)
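For reference, the exponential Golomb coding mentioned here is simple to write down. A sketch of the order-0 version: encode n by writing n+1 in binary and prefixing it with one zero per bit after the leading 1; every infinite bit string then begins with exactly one codeword:

```python
def exp_golomb_encode(n):
    # Write n+1 in binary, then prepend (number of bits - 1) zeros.
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b

def exp_golomb_decode(bits):
    # Count the k leading zeros, then read the k+1 bits starting at the first 1.
    k = 0
    while bits[k] == "0":
        k += 1
    value = int(bits[k:2 * k + 1], 2) - 1
    return value, bits[2 * k + 1:]   # decoded number and the remaining bits

# 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
```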

• I think you may be miss­ing two things.

First: the hy­pothe­ses/​pro­grams (re­call: we are rep­re­sent­ing hy­pothe­ses by com­puter pro­grams in some suit­able lan­guage) need to be writ­ten in a “pre­fix-free” rep­re­sen­ta­tion, mean­ing one in which e.g. if 100111 is a valid pro­gram then noth­ing else be­gin­ning 100111 can be. So you don’t get two pro­grams of length 1, four of length 2, etc. If you do this, and if you also ar­range your cod­ing so that ev­ery in­finite string of bits starts with some le­gal pro­gram, then the sum over all le­gal pro­grams of 2^-length is ex­actly 1. (So Pr(pro­gram) = 2^-length(pro­gram) defines a prob­a­bil­ity dis­tri­bu­tion over pro­grams. It’s the same prob­a­bil­ity dis­tri­bu­tion as you get from the fol­low­ing pro­ce­dure: keep gen­er­at­ing ran­dom bits un­til the bits you’ve gen­er­ated form a le­gal pro­gram.)

Se­cond: you can’t just take Ω=1 “for sim­plic­ity”; Ω is defined to be what you get by adding up 2^-length for all pro­grams that halt. (Re­mem­ber that hy­pothe­ses are be­ing rep­re­sented as com­puter pro­grams here.) Equiv­a­lently, if you imag­ine pick­ing a ran­dom well-formed pro­gram by choos­ing ran­dom bits, Ω is the prob­a­bil­ity that the re­sult­ing pro­gram halts. So it’s cer­tainly <1.

[EDITED to fix an ugly but hope­fully not con­fus­ing typo.]

• Con­sider an ar­bi­trary prob­a­bil­ity dis­tri­bu­tion P, and the small­est in­te­ger (or the lex­i­co­graph­i­cally least ob­ject) x such that P(x) < 1/​3^^^3 (in Knuth’s up-ar­row no­ta­tion). Since x has a short de­scrip­tion, a uni­ver­sal dis­tri­bu­tion shouldn’t as­sign it such a low prob­a­bil­ity, but P does, so P can’t be a uni­ver­sal dis­tri­bu­tion.

The de­scrip­tion of x has to in­clude the de­scrip­tion of P, and that has to be com­putable if a uni­ver­sal dis­tri­bu­tion is go­ing to as­sign pos­i­tive prob­a­bil­ity to x.

If P has a short com­putable de­scrip­tion, then yes, you can con­clude that P is not a uni­ver­sal dis­tri­bu­tion. Univer­sal dis­tri­bu­tions are not com­putable.

If the short­est com­putable de­scrip­tion of P is long, then you can’t con­clude from this ar­gu­ment that P is not a uni­ver­sal dis­tri­bu­tion, but I sus­pect that it still can’t be a uni­ver­sal dis­tri­bu­tion, since P is com­putable.

If there is no com­putable de­scrip­tion of P, then we don’t know that there is a com­putable de­scrip­tion of x, so you have no con­tra­dic­tion to start with.

• Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logic has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

Ac­tu­ally, what Tarski seems to show is that for any lan­guage for de­scribing any set of uni­verses, there just is no lan­guage rep­re­sentable in­side those uni­verses for rep­re­sent­ing ar­bi­trary state­ments, with truth val­ues, about “ev­ery­thing” in­clud­ing the lan­guage and the state­ments in it. If you try to in­vent such a lan­guage, it will end up in­con­sis­tent—not at the point where it tries to cor­rectly as­sign truth, but at the point where it can rep­re­sent truth, due to analogues of “This state­ment is false.” It isn’t need­ful to as­sign 0 or 1, in par­tic­u­lar, to this state­ment; the mo­ment you rep­re­sent it, you can prove an in­con­sis­tency. But is this re­ally proper to blame on Solomonoff in­duc­tion?

• But is this re­ally proper to blame on Solomonoff in­duc­tion?

I would like to have a method of in­duc­tion that for any for­mal lan­guage, as­signs a non-zero prior to the ex­is­tence of a de­vice that can enu­mer­ate or de­cide all true sen­tences in that lan­guage, or al­ter­na­tively an ex­pla­na­tion based on rea­son­able prin­ci­ples for which such de­vices should have zero prob­a­bil­ities. Right now we do not have ei­ther, and your re­search pro­gram for im­prov­ing SI (i.e., to base it on sec­ond-or­der logic) will not give us ei­ther even if it’s suc­cess­ful. So while I’m not sure it makes sense to say I “blame” Solomonoff in­duc­tion (what could that mean?), you could say that I’m not satis­fied with ei­ther the sta­tus quo or any im­prove­ments to it that we can cur­rently fore­see.

• Give me a set of for­mal lan­guages over which you can say the phrase “for any for­mal lan­guage”, and the truth pred­i­cate for the union of the set won’t be in any lan­guage in the set. I’m still try­ing to un­der­stand how to deal with this in­side AI, but I’m not sure that blam­ing it on sec­ond-or­der log­i­cal in­duc­tion is putting the blame in the right place.

• Again, I’m not sure what you mean by “blame” here. If you’re say­ing that Tarski’s re­sult rep­re­sents a prob­lem that af­fects more than just at­tempts to gen­er­al­ize Solomonoff in­duc­tion, then I agree.

BTW, while I have your at­ten­tion, what’s your eval­u­a­tion of Paul Chris­ti­ano’s FAI de­sign idea, which sort of tries to punt as many philo­soph­i­cal prob­lems as pos­si­ble (in­clud­ing this one)? I no­ticed that you didn’t com­ment in that dis­cus­sion.

• [ ]
[deleted]
• A kilo­bit of im­prob­a­bil­ity re­quires only a kilo­bit of data to offset it, which isn’t very crack­pot at all. Prov­ing min­i­mum length is im­pos­si­ble, but prov­ing an up­per bound on length is very easy, and that proves a lower bound on prob­a­bil­ity.

• [ ]
[deleted]
• I take your point re: length vs speed. The the­o­rem that I think jus­tifies call­ing Kol­mogorov Com­plex­ity ob­jec­tive is this:

“If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c (which depends only on the languages L1 and L2) such that |K1(s) − K2(s)| ≤ c for all strings s.”

(To see why this is true, note that you can write a com­piler for L2 in L1 and vice versa)
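A toy way to see the mechanism (hypothetical encodings, not real UTMs): if a fixed interpreter for L2 can be written in L1, then prepending it turns any L2 description into an L1 description, so the complexity gap is bounded by the interpreter’s length:

```python
# Hypothetical illustration: INTERPRETER stands for a fixed L1 program that runs L2 code.
# The string below is made up; only its fixed length matters.
INTERPRETER = "RUN-L2:"  # its length plays the role of the constant c

def l1_description_from(l2_description):
    # Any L2 description of a string s yields an L1 description of the same s,
    # so K1(s) <= K2(s) + len(INTERPRETER) for every s.
    return INTERPRETER + l2_description

# The overhead is the same constant no matter how long the description is:
overheads = {len(l1_description_from("x" * n)) - n for n in range(1, 50)}
```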

I don’t see why code mod­el­ling sym­met­ric laws should be longer than code mod­el­ling asym­met­ric laws (I’d ex­pect the re­verse; more sym­me­tries means more ways to com­press.) Nor why 3 spa­tial di­men­sions (or 10 if you ask string the­o­rists) is the min­i­mum num­ber of spa­tial di­men­sions com­pat­i­ble with in­tel­li­gent life.

The whole point of Solomonoff in­duc­tion is that the pri­ors of a the­ory are not ar­bi­trary. They are de­ter­mined by the com­plex­ity of the the­ory, then you use Bayes rule on all the­o­ries to do in­duc­tion.

• How can we ap­ply Solomonoff when our in­puts are not sym­bol strings?

In­puts can always be rep­re­sented by bit se­quences. Qualia are an ir­rele­vance. Q.E.D.

• Doesn’t the “ir­rele­vance” de­pend on an as­sump­tion about how our minds work? If we have some ana­log com­pu­ta­tion go­ing on, the bit se­quences might be a mere ap­prox­i­ma­tion to our ac­tual in­duc­tive rea­son­ing. Of course, once we put our con­clu­sions into lan­guage, those con­clu­sions can be digi­tized—but that might hap­pen at a rel­a­tively late stage of the rea­son­ing game.

• Analog computation is another irrelevance—analog systems can always be approximated arbitrarily closely by discrete systems. Witness the ongoing digital revolution.

• Sure, if you are de­sign­ing a sys­tem from the ground up. I’ve never needed more than 12 bits of pre­ci­sion for any real en­g­ineer­ing pur­pose, and I wouldn’t pay a sin­gle dol­lar for more. But Wei_Dai’s ques­tion was about us—or at least reads liter­ally that way. Maybe I took it too liter­ally.

• There’s no credible evidence that we aren’t digital. E.g. see digital physics. If nobody can tell the difference, the issue is probably in the realm of vacuous philosophy.

• Here’s an idea for a modification of Solomonoff induction that no longer has a subjectively-chosen machine to encode hypotheses in. One could instead simply consider how many bits it would take to encode a solution on all universal Turing machines, and make each hypothesis’s prior equal to the average of its prior on each machine. This makes somewhat intuitive sense, as one doesn’t know which “machine” the universe is “programmed” on, so it’s best to just assume a random machine. Unless I’m mistaken, there’s an infinite number of universal Turing machines, but I think algorithms could still approximate the induction.

• Is there any jus­tifi­ca­tion that Solomonoff In­duc­tion is ac­cu­rate, other than in­tu­ition?

• of course not

• “Of course” im­plies that the an­swer is ob­vi­ous. Why is it ob­vi­ous?

• Dunno what User­name was think­ing, but here’s the an­swer I had in mind: “Why is it ob­vi­ous? Be­cause the Prob­lem of In­duc­tion has not yet been solved.”

• Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logic has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

I’m not sure I un­der­stand you cor­rectly, but there are two im­me­di­ate prob­lems with this:

• If the goal is to figure out how use­ful Solomonoff in­duc­tion is, then “a gen­er­al­ized ver­sion of Solomonoff In­duc­tion based on some sec­ond-or­der logic” is not rele­vant. We don’t need ran­dom gen­er­al­iza­tions of Solomonoff in­duc­tion to work in or­der to de­cide whether Solomonoff in­duc­tion works. I think this is re­pairable, see be­low.

• Whether the alien has a de­vice that does such-and-such is not a prop­erty of the world, so Solomonoff in­duc­tion does not as­sign a prob­a­bil­ity to it. At any given time, all you have ob­served is the be­hav­ior of the de­vice for some finite past, and per­haps what the in­side of the de­vice looks like, if you get to see. Any finite amount of past ob­ser­va­tions will be as­signed pos­i­tive prob­a­bil­ity by the uni­ver­sal prior so there is never a mo­ment when you en­counter a con­tra­dic­tion.

If I un­der­stand your is­sue right, you can ex­plore the same is­sue us­ing stock Solomonoff in­duc­tion: What hap­pens if an alien shows up with a de­vice that pro­duces some un­com­putable re­sult? The prior prob­a­bil­ity of the pre­sent situ­a­tion will be­come pro­gres­sively smaller as you make more ob­ser­va­tions and asymp­tot­i­cally ap­proach zero. If we as­sume quan­tum me­chan­ics re­ally is non­de­ter­minis­tic, that will be the nor­mal case any­way, so noth­ing spe­cial is hap­pen­ing here.

• What does Solomonoff In­duc­tion ac­tu­ally say?

I believe this one was closed ages ago by Alan Turing, and practically demonstrated for approximations by the investigation into the busy beaver function, for example. We won’t be able to learn BB(10) from God almighty. Ever.

• I’m not sure what you’re try­ing to say here, but if you con­sider this a rel­a­tive weak­ness of Solomonoff In­duc­tion, then I think you’re look­ing at it the wrong way. We will know it as well as we pos­si­bly could given the ev­i­dence available. Hu­mans are sub­ject to the con­straints that Solomonoff In­duc­tion is sub­ject to, and more.

• Solomonoff In­duc­tion is sup­posed to be a for­mal­iza­tion of Oc­cam’s Ra­zor, and it’s con­fus­ing that the for­mal­iza­tion has a free pa­ram­e­ter in the form of a uni­ver­sal Tur­ing ma­chine that is used to define the no­tion of com­plex­ity.

My un­der­stand­ing is that an “ideal” refer­ence UTM would be a uni­ver­sal tur­ing ma­chine with no bias to­wards any ar­bi­trary string, but rigor­ously defin­ing such a ma­chine is an open prob­lem. Based on our ob­ser­va­tion of UTMs, the more ar­bi­trary sim­plifi­ca­tions a Tur­ing Ma­chine makes, the longer its com­piler will have to be on other UTMs. This is called the Short Com­piler As­sump­tion. Since we can’t agree on a sin­gle ideal refer­ence UTM, we in­stead ap­prox­i­mate it by limit­ing our­selves to a class of “nat­u­ral” UTMs which are mu­tu­ally in­ter­pretable within a con­stant. The smaller the con­stant, the less ar­bi­trary sim­plifi­ca­tion the UTMs in the class will tend to make.

This seems to mesh with the se­quences post on Oc­cam’s Ra­zor:

What if you don’t like Tur­ing ma­chines? Then there’s only a con­stant com­plex­ity penalty to de­sign your own Univer­sal Tur­ing Ma­chine that in­ter­prets what­ever code you give it in what­ever pro­gram­ming lan­guage you like.

What I’m con­fused about is this con­stant penalty. Is it just “some con­stant” or is it know­able and small? Is there a spe­cific UTM for which we can always write a short com­piler on any other UTM?

I’m getting out of my league here, but I’d guess that there is an upper bound on how complex you can make certain instructions across all UTMs because UTMs must (a) be finite, and (b) at the lowest level implement a minimal set of instructions, including a functionally complete set of logical connectives. So for example, say I take as my “nonbiased” UTM a UTM that, aside from the elementary operations of the machine on its tape, jump instructions, etc., has only a minimal number of instructions implementing a minimally complete operator set with fewer than two connectives: {NAND} or {NOR}. My understanding is that anything that’s a Universal Turing Machine is going to have to itself have a small number of instructions that implement the basic machine instructions and a complete set of connectives somewhere in its instruction set, and converting between {NAND} or {NOR} and any other complete set of connectives can be done with a trivially short encoding.
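The claim that {NAND} alone is functionally complete is easy to verify mechanically. A minimal sketch:

```python
def nand(a, b):
    return not (a and b)

# Build the usual connectives out of NAND alone:
def NOT(a):
    return nand(a, a)

def AND(a, b):
    return nand(nand(a, b), nand(a, b))

def OR(a, b):
    return nand(nand(a, a), nand(b, b))

# Any other connective can be reached through these, so a machine whose
# instruction set contains only NAND (plus tape/jump primitives) loses no power.
```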

• Since we can’t agree on a sin­gle ideal refer­ence UTM, we in­stead ap­prox­i­mate it by limit­ing our­selves to a class of “nat­u­ral” UTMs which are mu­tu­ally in­ter­pretable within a con­stant.

Even if you do that, you’re left with an in­finite num­ber of cliques such that within any given clique the lan­guages can write short in­ter­preters for each other. Pick­ing one of the cliques is just as ar­bi­trary as pick­ing a sin­gle lan­guage to be­gin with. i.e. for any given class of what-we-in­tu­itively-think-of-as-com­pli­cated pro­grams X, you can de­sign a lan­guage that can con­cisely rep­re­sent mem­bers of X and can con­cisely rep­re­sent in­ter­preters with this spe­cial case.

What I’m con­fused about is this con­stant penalty. Is it just “some con­stant” or is it know­able and small?

It’s a func­tion of the lan­guage you’re writ­ing an in­ter­preter of and the lan­guage you’re writ­ing it in. “Con­stant” in that it doesn’t de­pend on the pro­grams you’re go­ing to run in the new lan­guage. i.e. for any given pair of lan­guages there’s a finite amount of dis­agree­ment be­tween those two ver­sions of the Solomonoff prior; but for any given num­ber there are pairs of lan­guages that dis­agree by more than that.

Is there a spe­cific UTM for which we can always write a short com­piler on any other UTM?

No. There’s no law against hav­ing a gi­gabyte-long op­code for NAND, and us­ing all the shorter op­codes for things-we-in­tu­itively-think-of-as-com­pli­cated.

• So is there then a prag­matic as­sump­tion that can be made? Maybe we as­sume that if I pick a tur­ing ma­chine blindly, with­out speci­fi­cally de­sign­ing it for a par­tic­u­lar out­put string, it’s un­likely to be strongly bi­ased to­wards that string.

• What prob­a­bil­ity dis­tri­bu­tion over tur­ing ma­chines do you blindly pick it from? That’s an­other in­stance of the same prob­lem.

Prag­mat­i­cally, if I non-blindly pick some rep­re­sen­ta­tion of tur­ing ma­chines that looks sim­ple to me (e.g. the one Tur­ing used), I don’t re­ally doubt that it’s within a few thou­sand bits of the “right” ver­sion of solomonoff, what­ever that means.

• Prag­mat­i­cally, if I non-blindly pick some rep­re­sen­ta­tion of tur­ing ma­chines that looks sim­ple to me (e.g. the one Tur­ing used), I don’t re­ally doubt that it’s within a few thou­sand bits of the “right” ver­sion of solomonoff, what­ever that means.

Why not?

• Ap­par­ent Un­for­mal­iz­abil­ity of “Ac­tual” Induction

Re: Tarski’s In­defin­abil­ity of Truth—I don’t get this one. Solomonoff In­duc­tion es­ti­mates prob­a­bil­ities of bit­string se­quences—but it’s un­com­putable—and the best we can do is finite ap­prox­i­ma­tions—which of course have limi­ta­tions and will be able to be im­proved upon—just as Tarski’s “In­defin­abil­ity of Truth” says.

Re: Ar­gu­ment via Berry’s Para­dox—but the de­scrip­tion of x in­cludes P—which could be com­plex. So the de­scrip­tion of x is not short—if you are not given P.

• Is com­plex­ity ob­jec­tive?

Kinda. Some no­tions of com­plex­ity are more use­ful than oth­ers. How­ever, con­di­tion your ma­chine on some real world sense data for a while, and any rea­son­able prior soon gets swamped by data—so in prac­tice, this is not a big deal.

• If it was a mat­ter of gath­er­ing enough data we wouldn’t have to talk about Solomonoff in­duc­tion at all.

• I’m not sure what you have in mind by bas­ing Solomonoff in­duc­tion on a sec­ond-or­der logic. Can you give a refer­ence?

Here is what I’d sug­gest as a baseline. Use John Tromp’s Bi­nary Lambda Calcu­lus with some vari­a­tion of Levin Search to ex­plore time/​com­plex­ity bounded pro­grams that out­put some ini­tial se­quence of data.

• Solomonoff In­duc­tion is sup­posed to be a for­mal­iza­tion of Oc­cam’s Ra­zor, and it’s con­fus­ing that the for­mal­iza­tion has a free pa­ram­e­ter in the form of a uni­ver­sal Tur­ing ma­chine that is used to define the no­tion of com­plex­ity. What’s the sig­nifi­cance of the fact that we can’t seem to define a pa­ram­e­ter­less con­cept of com­plex­ity? That com­plex­ity is sub­jec­tive?

Per­haps we could for­mal­ize some ar­gu­ment that for an ar­bi­trar­ily poor defi­ni­tion of com­plex­ity, the er­ror be­comes neg­ligible for the av­er­age suffi­ciently com­plex uni­verse?

• Solomonoff In­duc­tion is defined over sym­bol strings (for ex­am­ple bit strings) but our per­cep­tions are made of “qualia” in­stead of sym­bols. How is Solomonoff In­duc­tion sup­posed to work for us?

Our per­cep­tions have some neu­rolog­i­cal rep­re­sen­ta­tion.

• The in­ter­est­ing bit is that if you could ac­tu­ally use Solomonoff in­duc­tion as prior, I’m pretty sure you’d be an ir­reparable aether be­liever and ‘rel­a­tivity scep­tic’, with con­fi­dence that has so many nines noth­ing would ever con­vince you. That’s be­cause the ab­solute-speed ir­re­versible aether uni­verses like 3d ver­sions of Con­way’s game of life are so much more com­pu­ta­tion­ally com­pact than highly sym­met­ric uni­verse with many space-time sym­me­tries (to the point of time and space in­ter­mix­ing in the equa­tions) and the fully re­versible fun­da­men­tal laws of physics. Say­ing this to you as ap­plied math­e­mat­i­cian with ex­pe­rience ac­tu­ally im­ple­ment­ing laws of physics in soft­ware.

The pri­ors are only a free pa­ram­e­ter if all you care for is not get­ting Dutch-booked on your bets about physics but do not give a damn to have any use­ful the­o­ries.

• That’s be­cause the ab­solute-speed ir­re­versible aether uni­verses like 3d ver­sions of Con­way’s game of life are so much more com­pu­ta­tion­ally com­pact than highly sym­met­ric uni­verse with many space-time sym­me­tries (to the point of time and space in­ter­mix­ing in the equa­tions) and the fully re­versible fun­da­men­tal laws of physics.

Can you ex­plain why this is the case? Also, when you say “aether” uni­verses are more com­pu­ta­tion­ally com­pact than “rel­a­tivity” uni­verses, is this be­fore or af­ter tak­ing into ac­count our ob­ser­va­tions (i.e., are you re­strict­ing your at­ten­tion to uni­verses that fit our ob­ser­va­tions, or not)?

Say­ing this to you as ap­plied math­e­mat­i­cian with ex­pe­rience ac­tu­ally im­ple­ment­ing laws of physics in soft­ware.

Is it pos­si­ble that what you said is true only if we want the laws of physics to run fast on cur­rent com­put­ers? I’m afraid that most soft­ware peo­ple, in­clud­ing my­self, prob­a­bly have bad in­tu­itions about Solomonoff In­duc­tion be­cause we only have ex­pe­rience with a very small sub­set of pos­si­ble com­pu­ta­tions, namely those that can be run eco­nom­i­cally on mod­ern com­put­ers. Per­haps the laws of physics can be im­ple­mented much more com­pactly if we ig­nored effi­ciency?

• Can you ex­plain why this is the case?

As an intuition pump, consider reversible vs irreversible cellular automata. If you pick one at random, the vast majority will not be reversible. Ditto for the symmetries. (Keep in mind that in Solomonoff probability we feed an infinite random tape to the machine. It is not Occam’s razor. Elegant, simple deterministic things can be vastly outnumbered by inelegant ones, even if each elegant one is individually more probable. edit: that is to say, you can be more likely to pick something asymmetric even if any particular asymmetric universe is less likely than the symmetric one.)

Also, when you say “aether” uni­verses are more com­pu­ta­tion­ally com­pact than “rel­a­tivity” uni­verses, is this be­fore or af­ter tak­ing into ac­count our ob­ser­va­tions (i.e., are you re­strict­ing your at­ten­tion to uni­verses that fit our ob­ser­va­tions, or not)?

There can always be a vast con­spir­acy ex­plain­ing the ob­ser­va­tions… ideally if you could simu­late whole uni­verse (or mul­ti­verse) from big bang to to­day and pick out the data match­ing ob­ser­va­tions or the con­spired ly­ing, then maybe it’d work, but the whole ex­er­cise of do­ing physics is that you are em­bed­ded within uni­verse you are study­ing. edit: and that trick won’t work if the code eats a lot of ran­dom tape.

Is it pos­si­ble that what you said is true only if we want the laws of physics to run fast on cur­rent com­put­ers?

I don’t think re­lax­ing the fast re­quire­ment re­ally helps that much. Con­sider pro­gram­ming Con­way’s game of life in Tur­ing ma­chine. Or vice versa. Or the in­ter­preter for gen­eral TM on the min­i­mal TM. It gets way worse if you want full ro­ta­tional sym­me­try on dis­crete sys­tem.

Of course, maybe one of the small busy beavers is a su­per­in­tel­li­gence that likes to play with var­i­ous rules like that. Then I’d be wrong. Can not rule even this pos­si­bil­ity out. Kol­mogorov/​Solomonoff name drop is awe­some spice for cook­ing proven-untestable propo­si­tions.

One could ar­gue that sec­ond or­der logic could work bet­ter, but this is get­ting way deep into land of untestable propo­si­tions that are even proven untestable, and the ap­pro­pri­ate re­sponse would be high ex­pec­ta­tions asian father pic­ture with “why not third or­der logic?”.

edit: also you hit nail on the head on an is­sue here: i can not be sure that there is no very short way to en­code some­thing. You can ask me if I am sure that busy beaver 6 is not any­thing, and I am not sure! I am not sure it is not the god almighty. The propo­si­tion that there is a sim­ple way is a state­ment of faith that can not be dis­proved any more than ex­is­tence of god. Also, I feel that there has to be scal­ing for the com­pu­ta­tional effi­ciency in the prior. The more effi­cient struc­tures can run more minds in­side. Or con­versely, the less effi­cient struc­tures take more cod­ing to lo­cate minds in­side of them.

• Discrete uni­verses in­tro­duce an un­ob­served pa­ram­e­ter (a fun­da­men­tal length) as well as a preferred frame. Aban­don­ing com­plex num­bers for a countable group in QM would also be very difficult. There is a com­plex­ity the­ory for real-val­ued ob­jects and if Solomonoff in­duc­tion could be adapted to that con­text, con­tinuum the­o­ries ought to be fa­vored.

• Yep. Well, in any case, the point is that the Solomonoff probability is not in any way a ‘universal prior’: you have to pick the machine, and then you can’t actually compute anything useful because it’s uncomputable, and you’ll never get anything non-trivial all the way down to minimum length.

If you go for changing the machine, you could use the laws of physics as you know them as the machine (then you’ll be an irreparable sceptic of any new physics). In the context of laws of physics it’s just the emulation of one universal computation (the laws of physics) with another (whatever stuff you pick). We are only discussing this because some highly self-important people heard of Solomonoff induction and Kolmogorov complexity, modelled those with something ill-defined and fuzzy (lacking not only sufficient understanding of those things but also of the concepts needed to understand them, and so on several layers deep, as is usually the case when you choose something really complicated without having spent a lot of time studying towards it), and then used it as a universal smart-sounding word (e.g. here or here or here). Name-dropping and nothing more.

• Consider Penrose’s “Angular momentum: An approach to combinatorial space-time” (math.ucr.edu/home/baez/penrose/Penrose-AngularMomentum.pdf)

Hence, we have a way of get­ting hold of the con­cept of Eu­clidean an­gle, start­ing from a purely com­bi­na­to­rial scheme

...

The cen­tral idea is that the sys­tem defines the ge­om­e­try. If you like, you can use the con­ven­tional de­scrip­tion to fit the thing into the ‘or­di­nary space-time’ to be­gin with, but then the ge­om­e­try you get out is not nec­es­sar­ily the one you put into it

It seems that Euclidean space, at least, can be derived as a limiting case from simple combinatorial principles. It is not at all clear that general relativity does not have Kolmogorov complexity comparable to the cellular automata of your “aether universes”.

• Kol­mogorov com­plex­ity of GR it­self (text of GR or some­thing) is ir­rele­vant. Kol­mogorov com­plex­ity of uni­verse that has the sym­me­tries of GR and rest of physics, is. Com­bi­na­to­rial prin­ci­ples are nice but it boils down to rep­re­sent­ing state of the uni­verse with cells on tape of lin­ear tur­ing ma­chine.

• What does Solomonoff In­duc­tion ac­tu­ally say about, for ex­am­ple, whether we live in a cre­ator­less uni­verse that runs on physics?

That depends entirely on your definition of creator. Traditional creators such as the Christian god could potentially have enough explanatory power once properly defined, yet would end up horrendously complex (encoding an entire human-like mind into the primitive natural law) unless you further reduce the construction of the creator to a natural process.

Or the Si­mu­la­tion Ar­gu­ment?

Solomonoff In­duc­tion is em­piri­cal: un­less the laws of our uni­verse are bro­ken by the simu­la­tors, Solomonoff In­duc­tion says noth­ing about whether we are in a simu­la­tion. If the simu­la­tors have not in­fluenced our uni­verse but will in the fu­ture, it de­pends en­tirely on whether the simu­la­tor uni­verse is sim­pler than ours. If the simu­la­tors have already in­fluenced our uni­verse it’s more com­pli­cated.

• Peo­ple are perfectly fine with fuzzy ap­prox­i­mate ex­pla­na­tions of phe­nom­ena, like Maxwell’s equa­tions &c. “God­didit” is not that differ­ent. Try­ing to get a full causal ex­pla­na­tion would mean find­ing bits of Omega. In the end, de­ci­sion the­ory is fun­da­men­tal, and episte­molog­i­cal ab­strac­tions like SI are cool but ul­ti­mately ir­rele­vant. This whole “en­cod­ing a hu­man-like mind” thing doesn’t work like you think it does—you can in­ter­pret SI that way and see some cool im­pli­ca­tions, just re­mem­ber it’s a use­less toy model. …Just sayin’.

• “God­didit” is not that differ­ent.

Physics the­o­ries im­port low-com­plex­ity math­e­mat­i­cal mod­els. “God­didit” im­ports com­pli­cated hu­man no­tions of agency. Ap­prox­i­mate ex­pla­na­tions are fine if we can rea­son that their im­plicit com­plex­ity is low rel­a­tive to their ex­plana­tory power (a rel­a­tively eas­ily satis­fied met­ric, af­ter which com­pe­ti­tion be­tween the­o­ries be­comes the key fac­tor).

In Solomonoff In­duc­tion, the­o­ries that don’t ex­plain data must con­tain that data raw.
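A rough analogy (using off-the-shelf compression as a stand-in for description length, which only upper-bounds Kolmogorov complexity): patterned data admits a short description, while noise must essentially be stored raw:

```python
import random
import zlib

patterned = ("01" * 500).encode()   # 1000 bytes with an obvious regularity

random.seed(0)
noise = bytes(random.getrandbits(8) for _ in range(1000))  # 1000 pseudo-random bytes

short = len(zlib.compress(patterned, 9))  # tiny: the regularity is "explained"
long_ = len(zlib.compress(noise, 9))      # about as large as the input: stored raw
```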

• Physics the­o­ries im­port low-com­plex­ity math­e­mat­i­cal mod­els. “God­didit” im­ports com­pli­cated hu­man no­tions of agency.

Frankly, I think this idea is at­trac­tive but ul­ti­mately an er­ror. It is in­deed true that to an an­a­lyt­i­cal mind with an in­ter­est in physics, math­e­mat­ics feels a lot less com­plex, in some sense, than in­tu­itive no­tions of agency. But no mat­ter how much physics or psy­chol­ogy you know, you don’t have in­tro­spec­tive ac­cess to the uni­ver­sal prior—maybe the prior priv­ileges math over psy­chol­ogy, or maybe it doesn’t. All we have is our ev­i­dence, of­ten in the form of con­clu­sions drawn from in­tu­itive analy­ses of what hy­pothe­ses have or haven’t tended to bear in­tel­lec­tual or in­stru­men­tal fruit in the past—id est, we’re awfully close to talkin’ ’bout prag­mat­ics and de­ci­sion the­ory here. And yes, math­e­mat­i­cal ex­pla­na­tions have been sur­pris­ingly effec­tive. But if you look at hu­man his­tory, hy­pothe­ses that make use of “com­pli­cated hu­man no­tions of agency” have also been pretty damn effec­tive. It’s not ob­vi­ous what no­tion of com­plex­ity would mas­sively priv­ilege the former over the lat­ter, and at any rate, we have no way of know­ing, be­cause you can’t find the uni­ver­sal prior in your back­yard.

• It is in­deed true that to an an­a­lyt­i­cal mind with an in­ter­est in physics, math­e­mat­ics feels a lot less com­plex,

We have ob­jec­tive ver­ifi­ca­tion of the low com­plex­ity of for­mal­ized math­e­mat­i­cal the­o­ries be­cause we can look at the length of their for­mal de­scrip­tion in say, first-or­der logic.

But no mat­ter how much physics or psy­chol­ogy you know, you don’t have in­tro­spec­tive ac­cess to the uni­ver­sal prior—maybe the prior priv­ileges math over psy­chol­ogy, or maybe it doesn’t.

Are you re­ally sug­gest­ing some model of com­pu­ta­tion based on hu­man ideas might work bet­ter than say, lambda calcu­lus for com­put­ing Kol­mogorov com­plex­ity for Solomonoff In­duc­tion? I’m not sure how to ar­gue with that but I would ap­pre­ci­ate it if you would state it ex­plic­itly.

• We have objective verification of the low complexity of formalized mathematical theories because we can look at the length of their formal description in, say, first-order logic.

Right, and that’ll be important if we ever run into aliens that for some reason can’t wrap their brains around English, but instead can figure out our category theory notation and so on. Or if we’re trying to build an FAI, or collaborate with the aforementioned aliens to build FAI.

I’m not sure how to argue with that, but I would appreciate it if you would state it explicitly.

Apologies; inferential distance. And there are a few meta-level points that I think are important to communicate. But there’s inferential distance on the meta level too.

• Also keep in mind that algorithmic information/probability theory is actually quite hard to interpret correctly—the basic, intuitive way to read meaning into the math is not quite the way to do it. cousin_it has a post or two correcting some intuitive errors of interpretation.

• Alas, none of those are the relevant ones, I think. I’m actually rather busy visiting home, so I can only justify certain comments to myself, but I hope someone provides the links.

For what it’s worth, I’m a little skeptical of lukeprog’s understanding of SI—no offense to him meant; it’s just that I happen to believe he made a rather big error when interpreting the math. On the other hand, cousin_it seems to be really on the ball here. Those are just my impressions; I’m a pretend philosopher, not a compsci dude. At any rate, I think it’d be just dandy for cousin_it to check Luke’s posts and share his impressions or critiques.

• Here’s one I was thinking of:

(If I recall, Nesov’s comment clearly demonstrates the important point.)

• That post seems to mix together the concept of a prior with the concept of experience.

• we can adopt the general rule that mentioning K-complexity in a discussion of physics is always a sign of confusion :-)

Mentioning it anywhere except algorithmic information theory is a sign of confusion. This includes theology and parapsychology. Use just Bayes or, if you want to be all fancy, updateless-like decision theories. I love algorithmic probability to death, but it’s just not something you should use casually. Too many pitfalls.

• Use just Bayes

Bayes requires a prior.

• No one should ever need to discuss “priors”. Focus on the likelihood ratio.

• ...but that’s like comparing apples and cheese!
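The disagreement can be made concrete in odds form: the likelihood ratio is what the evidence contributes, but it still multiplies a prior. A toy sketch with made-up numbers (not anyone’s actual credences):

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: O(H|E) = O(H) * P(E|H)/P(E|~H)."""
    return prior_odds * likelihood_ratio

# The same evidence (likelihood ratio 10) lands different priors
# at very different posteriors:
print(posterior_odds(1.0, 10.0))    # prior odds 1:1    -> posterior odds 10:1
print(posterior_odds(0.001, 10.0))  # prior odds 1:1000 -> posterior odds 1:100
```

So “focus on the likelihood ratio” tells you how far to update, but not where you end up without some starting odds.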

• People are perfectly fine with fuzzy approximate explanations of phenomena, like Maxwell’s equations &c.

Approximate explanations have some predictive power.

“Goddidit” is not that different.

What experience do you expect if “Goddidit”, as opposed to if “Goddidntdoit”?

(Skeletons of angels versus skeletons of dinosaurs? People with supernatural powers versus people working with superstition? Benevolent universe versus indifferent universe?)

• If in your heart you believe you already know, or if in your heart you do not wish to know, then your questioning will be purposeless and your skills without direction.

—Twelve Virtues of Rationality

It’s just, I’m having an amazing time back home, and my time is limited. I don’t know your goals, but you might want to try harder to signal that you’re really curious and not just asking questions that you think are rhetorical. When you reference common knowledge ’round these parts, like Eliezer’s posts, you should expect that the other person is already aware of that knowledge, and that they have real, substantive reasons to think that what they said is not entirely refuted by the contents of said common knowledge.

Of course, asking rhetorical questions is a perfectly decent way to make an argument. It’s just that arguments in that sense aren’t quite what’s called for in situations like these, I think. But that might just be a difference in our epistemic styles, especially if you’re Slavic. (Gasp, racism! ;P )

• When you reference common knowledge ’round these parts, like Eliezer’s posts, you should expect that the other person is already aware of that knowledge

Good point.

Also a good point about time being limited, so...

If you someday feel like writing a LW article about similarities between “Goddidit” and Maxwell’s equations, or something like that, I will read it.

• 24 Aug 2012 8:05 UTC
−2 points

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?
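For reference, the free parameter appears directly in the standard definitions (this is textbook algorithmic information theory, not anything specific to this thread). Relative to a universal prefix machine $U$, the universal prior of a string $x$, and the invariance theorem relating two universal machines $U$ and $V$, read:

```latex
M_U(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|},
\qquad
K_U(x) \;\le\; K_V(x) + c_{UV}
```

where the sum ranges over programs $p$ whose output begins with $x$, and $c_{UV}$ is roughly the length of a program that makes $U$ emulate $V$. Complexity is machine-independent only up to that additive constant—which is exactly the free parameter being asked about.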

In the Kolmogorov-Chaitin minimum description length approach, the subject must pick a Turing machine whose operations describe the basic operations believed to represent ‘simplicity’ by the subject.

Kant postulated that all knowledge relies on synthetic, a priori laws of nature, like causality and substance. The problem, then, is how this is possible. Kant’s solution was to reason that the subject must supply laws that make experience of objects possible, and that these laws are the synthetic, a priori laws of nature that we know apply to all objects before we experience them.

Kant’s a priori categories could be equivalent to the ‘basic operations’ of the aforementioned Turing machine. In the language of computer science, this would be equivalent to specifying an upper ontology defining all the fundamental ontological primitives in the domain ‘reality’.
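The machine-relativity of ‘simplicity’ described above can be illustrated with a toy sketch. This is not real Kolmogorov complexity (which is uncomputable); the two “machines” below are just two hypothetical description languages, each inducing its own ranking of which strings are simple:

```python
def raw_length(s):
    """Machine A: describe the string literally, one symbol at a time."""
    return len(s)

def run_length(s):
    """Machine B: describe the string as (symbol, run-count) pairs,
    charging one symbol plus the digits of the count per run."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return sum(1 + len(str(n)) for _, n in runs)

uniform = "0" * 64
alternating = "01" * 32

# Machine A is indifferent between the two strings; Machine B finds
# the uniform string far simpler and the alternating one far costlier.
print(raw_length(uniform), run_length(uniform))          # 64 3
print(raw_length(alternating), run_length(alternating))  # 64 128
```

Which string counts as ‘simple’ depends on the chosen basic operations—the formal analogue of the point about Kant’s categories.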