# Occam’s Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann

[Epistemic Status: My inside view feels confident, but I’ve only discussed this with one other person so far, so I won’t be surprised if it turns out to be confused.]

Armstrong and Mindermann (A&M) argue “that even with a reasonable simplicity prior/Occam’s razor on the set of decompositions, we cannot distinguish between the true decomposition and others that lead to high regret. To address this, we need simple ‘normative’ assumptions, which cannot be deduced exclusively from observations.”

I explain why I think their argument is faulty, concluding that maybe Occam’s Razor is sufficient to do the job after all.

In what follows I assume the reader is familiar with the paper already, or at least with the concepts within it.

## Brief summary of A&M’s argument:

(This is merely a brief sketch of A&M’s argument; I’ll engage with it in more detail below. For the full story, read their paper.)

Take a human policy pi = P(R) that we are trying to represent in the planner-reward formalism. R is the human’s reward function, which encodes their desires/preferences/values/goals. P() is the human’s planner function, which encodes how they take their experiences as input and try to choose outputs that achieve their reward. Pi, then, encodes the overall behavior of the human in question.

Step 1: In any reasonable language, for any plausible policy, you can construct “degenerate” planner-reward pairs that are almost as simple as the simplest possible way to generate the policy, yet yield high regret (i.e. have a reward component which is very different from the “true”/“intended” one).

• Example: The planner deontologically follows the policy, despite a Buddha-like empty utility function.

• Example: The planner greedily maximizes the reward function “obedience-to-the-policy.”

• Example: Double-negated version of example 2.

It’s easy to see that these examples, being constructed from the policy, are at most slightly more complex than the simplest possible way to generate the policy, since they could make use of that way.
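A minimal toy sketch of Step 1’s constructions (my illustration, not code from the paper; the observation set, the action set, and the `policy` function are all made up): each degenerate pair is built as a thin wrapper around the policy itself, so its description is only slightly longer than the policy’s.

```python
# Toy illustration: "degenerate" planner-reward pairs that all reproduce
# the same policy. A policy maps observations to actions; a planner maps
# (reward function, observation) to an action.

def policy(obs):
    """Some fixed human policy (stands in for its simplest generator)."""
    return "run" if obs == "tiger" else "walk"

# Pair 1: deontological planner that just follows the policy,
# with an empty ("Buddha-like") reward.
empty_reward = lambda obs, act: 0
follow_policy_planner = lambda reward, obs: policy(obs)

# Pair 2: greedy planner maximizing the reward "obedience-to-the-policy".
obedience_reward = lambda obs, act: 1 if act == policy(obs) else 0
greedy_planner = lambda reward, obs: max(
    ["run", "walk"], key=lambda a: reward(obs, a))

# Pair 3: double negation of pair 2 (anti-greedy planner, negated reward).
neg_reward = lambda obs, act: -obedience_reward(obs, act)
anti_greedy_planner = lambda reward, obs: min(
    ["run", "walk"], key=lambda a: reward(obs, a))

pairs = [
    (follow_policy_planner, empty_reward),
    (greedy_planner, obedience_reward),
    (anti_greedy_planner, neg_reward),
]

# Every pair composes back into exactly the original policy.
for planner, reward in pairs:
    for obs in ["tiger", "kitten"]:
        assert planner(reward, obs) == policy(obs)
```

Each pair calls `policy` directly, which is the toy analogue of the formal point: the degenerate pairs cost only a constant overhead on top of the simplest generator of the policy.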

Step 2: The “intended” planner-reward pair—the one that humans would judge to be a reasonable decomposition of the human policy in question—is likely to be significantly more complex than the simplest possible planner-reward pair.

• Argument: It’s really complicated.

• Argument: The pair contains more information than the policy, so it should be more complicated.

• Argument: Philosophers and economists have been trying for years and haven’t succeeded yet.

Conclusion: If we use Occam’s Razor alone to find planner-reward pairs that fit a particular human’s behavior, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are building an AI to maximize the reward.

## Methinks the argument proves too much:

My first point is that A&M’s argument probably works just as well for other uses of Occam’s Razor. In particular, it works just as well for the canonical use: finding the Laws and Initial Conditions that describe our universe!

Take a sequence of events we are trying to predict/represent with the lawlike-universe formalism, which posits C (the initial conditions) and then L(), the dynamical laws, a function that takes initial conditions and extrapolates everything else from them. L(C) = E, the sequence of events/conditions/world-states we are trying to predict/represent.

Step 1: In any reasonable language, for any plausible sequence of events, we can construct “degenerate” initial condition + laws pairs that are almost as simple as the simplest pair.

• Example: The initial conditions are an empty void, but the laws say “And then the sequence of events that happens is E.”

• Example: The initial conditions are simply E, and L() doesn’t do anything.

It’s easy to see that these examples, being constructed from E, are at most slightly more complex than the simplest possible pair, since they could use the simplest pair to generate E.
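The same trick in the physics framing, as a toy sketch (my illustration; `E` is just a placeholder list standing in for the world’s event history):

```python
# Two "degenerate" (initial conditions, laws) pairs that both output E.

E = [0, 1, 1, 2, 3, 5, 8]  # stand-in for the sequence of world events

# Pair 1: an empty void, plus laws that hard-code "and then E happens".
void = None
hardcoded_laws = lambda conditions: list(E)

# Pair 2: the initial conditions are simply E, and the laws do nothing.
identity_laws = lambda conditions: conditions

assert hardcoded_laws(void) == E
assert identity_laws(E) == E
```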

Step 2: The “intended” initial condition + law pair is likely to be significantly more complex than the simplest pair.

• Argument: It’s really complicated.

• Argument: The pair contains more information than the sequence of events, so it should be more complicated.

• Argument: Physicists have been trying for years and haven’t succeeded yet.

Conclusion: If we use Occam’s Razor alone to find law-condition pairs that fit all the world’s events, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are e.g. building an AI to do science for us and answer counterfactual questions like “If we had posted the nuclear launch codes on the Internet, would any nukes have been launched?”

This conclusion may actually be true, but it’s a pretty controversial claim, and I predict most philosophers of science wouldn’t be impressed by this argument for it—even the ones who agree with the conclusion.

## Objecting to the three arguments for Step 2

Consider the following hypothesis, which is basically equivalent to the claim A&M are trying to disprove:

Occam Sufficiency Hypothesis: The “intended” pair happens to be the simplest way to generate the policy.

Notice that everything in Step 1 is consistent with this hypothesis. The degenerate pairs are constructed from the policy, so they are more complicated than the simplest way to generate it; so if that way is via the intended pair, they are more complicated (albeit only slightly) than the intended pair.

Next, notice that the three arguments in support of Step 2 don’t really hurt this hypothesis:

Re: first argument: The intended pair can be both very complex and the simplest way to generate the policy; no contradiction there. Indeed that’s not even surprising: since the policy is generated by a massive messy neural net in an extremely diverse environment, we should expect it to be complex. What matters for our purposes is not how complex the intended pair is, but rather how complex it is relative to the simplest possible way to generate the policy. A&M need to argue that the simplest possible way to generate the policy is simpler than the intended pair; arguing that the intended pair is complex is at best only half the argument.

Compare to the case of physics: Sure, the laws of physics are complex. They probably take at least a page of code to write up. And that’s aspirational; we haven’t even got to that point yet. But that doesn’t mean Occam’s Razor is insufficient to find the laws of physics.

Re: second argument: The inference from “this pair contains more information than the policy” to “this pair is more complex than the policy” is fallacious. Of course the intended pair contains more information than the policy! All ways of generating the policy contain more information than it. This is because there are many ways (e.g. planner-reward pairs) to get any given policy, and thus specifying any particular way gives you strictly more information than simply specifying the policy.

Compare to the case of physics: Even once we’ve been given the complete history of the world (or a complete history of some arbitrarily large set of experiment-events), there will still be additional things left to specify about what the laws and initial conditions truly are. Do the laws contain a double negation in them, for example? Do they have some weird clause that creates infinite energy, but only when a certain extremely rare interaction occurs that never in fact occurs? What language are the laws written in, anyway? And what about the initial conditions? Lots of things are left to specify that aren’t determined by the complete history of the world. Yet this does not mean that the Laws + Initial Conditions are more complex than the complete history of the world, and it certainly doesn’t mean we’ll be led astray if we believe in the Laws + Conditions pair that is simplest.

Re: third argument: Yes, people have been trying to find planner-reward pairs to explain human behavior for many years, and yes, no one has managed to build a simple algorithm to do it yet. Instead we rely on all sorts of implicit and intuitive heuristics, and we still don’t succeed fully. But all of this can be said about physics too. It’s not as if physicists are literally following the Occam’s Razor algorithm—iterating through all possible Law + Condition pairs in order from simplest to most complex and checking each one to see if it outputs a universe consistent with all our observations. And moreover, physicists haven’t succeeded fully either. Nevertheless, many of us are still confident that Occam’s Razor is in principle sufficient: if we were to follow the algorithm exactly, with enough data and compute, we would eventually settle on a Law + Condition pair that accurately describes reality, and it would be the true pair. Again, maybe we are wrong about that, but the arguments A&M have given so far aren’t convincing.

## Conclusion

Perhaps Occam’s Razor is insufficient after all. (Indeed I suspect as much, for reasons I’ll sketch in the appendix.) But as far as I can tell, A&M’s arguments are at best very weak evidence against the sufficiency of Occam’s Razor for inferring human preferences, and moreover they work pretty much just as well against the canonical use of Occam’s Razor too.

This is a bold claim, so I won’t be surprised if it turns out I was confused. I look forward to hearing people’s feedback. Thanks in advance! And thanks especially to Armstrong and Mindermann if they take the time to reply.

## Appendix: So, is Occam’s Razor sufficient or not?

--A priori, we should expect something more like a speed prior to be appropriate for identifying the mechanisms of a finite mind, rather than a pure complexity prior.

--Sure enough, we can think of scenarios in which e.g. a deterministic universe with somewhat simple laws develops consequentialists who run massive simulations, including of our universe, and then write down Daniel’s policy in flaming letters somewhere, such that the algorithm “Run this deterministic universe until you find big flaming letters, then read out that policy” becomes a very simple way to generate Daniel’s policy. (This is basically just the “Universal Prior is Malign” idea applied in a new way.)

--So yeah, a pure complexity prior is probably not good. But maybe a speed prior would work, or something like it. Or maybe not. I don’t know.

--One case that seems useful to me: Suppose we are considering two explanations of someone’s behavior: (A) They desire the well-being of the poor, but [insert epicycles here to explain why they aren’t donating much, are donating conspicuously, are donating ineffectively], and (B) They desire their peers (and themselves) to believe that they desire the well-being of the poor. Thanks to the epicycles in (A), both theories fit the data equally well. But theory (B) is much simpler. Do we conclude that this person really does desire the well-being of the poor, or not? If we think that even though (A) is more complex it is also more accurate, then yeah, it seems like Occam’s Razor is insufficient to infer human preferences. But if we instead think “Yeah, this person just really doesn’t care, and the proof is how much simpler (B) is than (A),” then it seems we really are using something like Occam’s Razor to infer human preferences. Of course, this is just one case, so the only way it could prove anything is as a counterexample. To me it doesn’t seem like a counterexample to Occam’s sufficiency, but I could perhaps be convinced to change my mind about that.

--Also, I’m pretty sure that once we have better theories of the brain and mind, we’ll have new concepts and theoretical posits to explain human behavior (e.g. something something Karl Friston something something free energy?). Thus, the simplest generator of a given human’s behavior will probably not divide automatically into a planner and a reward; it’ll probably have many components, and there will be debates about which components the AI should be faithful to (dub these components the reward) and which components the AI should seek to surpass (dub these components the planner). These debates may be intractable, turning on subjective and/or philosophical considerations. So this is another sense in which I think yeah, definitely Occam’s Razor isn’t sufficient—for we will also need to have a philosophical debate about what rationality is.

• Some objections:

• The thing that you can’t do is decompose behavior into planner and reward. If you just want to predict behavior, you can totally do that. Similarly, you can predict future events with physics.

• You do need to do the decomposition to run counterfactuals. And indeed I buy the claim that if you literally try to find some input C and some dynamics L such that L(C) is the world trajectory, selecting only by Kolmogorov complexity and accuracy at predicting data, you probably won’t be able to use the resulting (L, C) to run counterfactuals. Even ignoring the malign universal prior argument.

• If it turns out you can run counterfactuals with the resulting (L, C), I would strongly expect that to be because physics “actually” works by some simple L that is “invariant” to the input state. In contrast, I would be astonished if humans “actually” have some reward in their head that they are trying to maximize, and that is what drives behavior.

I don’t feel much better about the speed prior than the regular Solomonoff prior.

• Thanks! I’m not sure I follow you. Here’s what I think you are saying:

--Occam’s Razor will be sufficient for predicting human behavior, of course; it just isn’t sufficient for finding the intended planner-reward pair. Because (A) the simplest way to predict human behavior has nothing to do with planners and rewards, and so (B) the simplest planner-reward pair will be degenerate or weird, as A&M argue.

--You agree that this argument also works for Laws + Initial Conditions; Occam’s Razor is generally insufficient, not just insufficient for inferring the preferences of irrational agents!

--You think the argument is more likely to work for inferring preferences than for Laws + Initial Conditions, though.

If this is what you are saying, then I agree with the second and third points but disagree with the first—or at least, I don’t see any argument for it in A&M’s paper. It may still be true, but further argument is needed. In particular, their arguments for (A) are pretty weak, methinks—that’s what my section “Objecting to the three arguments for Step 2” is about.

Edit to clarify: By “I agree with the second point” I mean I agree that if the argument works at all, it probably works for Laws + Initial Conditions as well. I don’t think the argument works, though. But I do think that Occam’s Razor is probably insufficient.

• That’s an accurate summary of what I’m saying.

at least, I don’t see any argument for it in A&M’s paper. It may still be true, but further argument is needed.

If you are picking randomly out of a set of N possibilities, the chance that you pick the “correct” one is 1/N. It seems like in any decomposition (whether planner/reward or initial conditions/dynamics), there will be N decompositions, with N >> 1, where I’d say “yeah, that probably has similar complexity as the correct one”. The chance that the correct one is also the simplest one out of all of these seems basically like 1/N, which is ~0.
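A quick simulation of this counting intuition (my sketch, not from the thread; it assumes the N candidates’ complexities are drawn i.i.d., which is exactly the modeling choice questioned below):

```python
import random

def prob_correct_is_simplest(n_candidates, trials=20000, seed=0):
    """Estimate the chance that a designated 'correct' candidate is also
    the simplest, when all N candidates have i.i.d. random complexities."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        complexities = [rng.random() for _ in range(n_candidates)]
        if complexities[0] == min(complexities):  # candidate 0 = "correct"
            hits += 1
    return hits / trials

# By symmetry the true probability is exactly 1/N, shrinking toward 0 as N grows.
assert abs(prob_correct_is_simplest(10) - 0.1) < 0.02
assert prob_correct_is_simplest(100) < 0.05
```

The force of the reply that follows is that an Occam-driven search is not a uniform draw: if simplicity and correctness are correlated, this model doesn’t apply.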

You could make an argument that we aren’t actually choosing randomly, and correctness is basically identical to simplicity. I feel the pull of this argument in the limit of infinite data for laws of physics (but not for finite data), but it just seems flatly false for the reward/planner decomposition.

• I feel like there’s a big difference between “similar complexity” and “the same complexity.” Like, if we have theory T and then we have theory T* which adds some simple unobtrusive twist to it, we get another theory which is of similar complexity… yet realistically an Occam’s-Razor-driven search process is not going to settle on T*, because you only get T* by modifying T. And if I’m wrong about this, then it seems like Occam’s Razor is broken in general; in any domain there are going to be ways to turn T’s into T*’s. But Occam’s Razor is not broken in general (I feel).

Maybe this is the argument you anticipate above with “...we aren’t actually choosing randomly.” Occam’s Razor isn’t random. Again, I might agree with you that intuitively Occam’s Razor seems more useful in physics than in preference-learning. But intuitions are not arguments, and anyhow they aren’t arguments that appeared in the text of A&M’s paper.

• I thought about this more and re-read the A&M paper, and I now have a different line of thinking compared to my previous comments.

I still think A&M’s No Free Lunch theorem goes through, but now I think A&M are proving the wrong theorem. A&M try to find the simplest (planner, reward) decomposition that is compatible with the human policy, but it seems like we instead additionally want compatibility with all the evidence we have observed, including sensory data of humans saying things like “if I was more rational, I would be exercising right now instead of watching TV” and “no really, my reward function is not empty”. The important point is that such sensory data gives us information not just about the human policy, but also about the decomposition. Forcing compatibility with this sensory data seems to rule out degenerate pairs. This makes me feel like Occam’s Razor would work for inferring preferences up to a certain point (i.e. as long as the situations are all “in-distribution”).

If we are trying to find the (planner, reward) decomposition of non-human minds: I think if we were randomly handed a mind from all of mind design space, then A&M’s No Free Lunch theorem would apply, because the simplest explanation really is that the mind has a degenerate decomposition. But if we were randomly handed an alien mind from our universe, then we would be able to use all the facts we have learned about our universe, including how the aliens likely evolved, any statements they seem to be making about what they value, and so on.

Does this line of thinking also apply to the case of science? I think not, because we wouldn’t be able to use our observations to get information about the decomposition. Unlike the case of values, the natural world isn’t making statements like “actually, the laws are empty and all the complexity is in the initial conditions”. I still don’t think the No Free Lunch theorem works for science either, because of my previous comments.

• compatibility with all the evidence we have observed

That is the whole point of my research agenda: https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into

The problem is that the non-subjective evidence does not map onto facts about the decomposition. A human claims X; well, that’s a speech act; are they telling the truth or not, and how do we know? Same for sensory data, which is mainly data about the brain correlated with facts about the outside world; to interpret that, we need to solve human symbol grounding.

All these ideas are in the research agenda (especially section 2). Just as you need something to bridge the is-ought gap, you need some assumptions to make evidence in the world (e.g. speech acts) correspond to preference-relevant facts.

This video may also illustrate the issues: https://www.youtube.com/watch?v=1M9CvESSeVc&t=1s

• Hmm, I like that. I wonder what A&M would say in response. And I agree this is an important and relevant difference between the case of preferences and the case of science.

I still don’t think A&M show that the simplest explanation is a degenerate decomposition. They show that if it is, then Occam’s Razor won’t be sufficient, and moreover that there are some degenerate decompositions pretty close to maximally simple. But they don’t do much to rule out the possibility that the simplest explanation is the intended one.

• Hey there!

Thanks for this critique; I have, obviously, a few comments ^_^

In no particular order:

• First of all, the FHI channel has a video going over the main points of the argument (and of the research agenda); it may help to understand where I’m coming from: https://www.youtube.com/watch?v=1M9CvESSeVc

• A useful point from that: given human theory of mind, the decomposition of human behaviour into preferences and rationality is simple; without that theory of mind, it is complex. Since it’s hard for us to turn off our theory of mind, the decomposition will always feel simple to us. However, the human theory of mind suffers from Moravec’s paradox: though the theory of mind seems simple to us, it is very hard to specify, especially in code.

• You’re entirely correct to decompose the argument into Step 1 and Step 2, and to point out that Step 1 has much stronger formal support than Step 2.

• I’m not too worried about the degenerate pairs specifically; you can rule them all out with two bits of information. But once you’ve done that, there will be other almost-as-degenerate pairs that fit with the new information. To rule them out, you need to add more information… but by the time you’ve added all of that, you’ve essentially defined the “proper” pair by hand.

• On speed priors: the standard argument applies for a speed prior, too (see Appendix A of our paper). It applies perfectly for the indifferent planner/zero reward, and applies, given an extra assumption, for the other two degenerate solutions.

• Onto the physics analogy! First of all, I’m a bit puzzled by your claim that physicists don’t know how to do this division. Now, we don’t have a full theory of physics; however, all the physical theories I know of have a very clear and known division between laws and initial conditions. So physicists do seem to know how to do this. And when we say that “it’s very complex”, this doesn’t seem to mean the division into laws and initial conditions is complex, just that the initial conditions are complex (and maybe that the laws are not yet known).

• The indifference planner contains almost exactly the same amount of information as the policy. The “proper” pair, on the other hand, contains information such as whether the anchoring bias is a bias (it is), compared with whether paying more for better-tasting chocolates is a bias (it isn’t). Basically, none of the degenerate pairs contain any bias information at all; so everything to do with human biases is extra information that comes along with the “proper” pair.

• Even ignoring all that, the fact that (p, R) is of comparable complexity to (-p, -R) shows that Occam’s razor cannot distinguish the proper pair from its negative.

• And thanks for the reply!

FWIW, I like the research agenda. I just don’t like the argument in the paper. :)

--Yes, without theory of mind the decomposition is complex. But is it more complex than the simplest way to construct the policy? Maybe, maybe not. For all you said in the paper, it could still be that the simplest way to construct the policy is via the intended pair, complex though it may be. (In my words: the Occam Sufficiency Hypothesis might still be true.)

--If the Occam Sufficiency Hypothesis is true, then not only do we not have to worry about the degenerate pairs, we don’t have to worry about anything more complex than them either.

--I agree that your argument, if it works, applies to the speed prior too. I just don’t think it works; I think Step 2 in particular might break for the speed prior, because the Speed!Occam Sufficiency Hypothesis might be true.

--If I ever said physicists don’t know how to distinguish between laws and initial conditions, I didn’t mean it. (Did I?) What I thought I said was that physicists haven’t yet found a law+IC pair that can account for the data we’ve observed. Also that they are in fact using lots of other heuristics and assumptions in their methodology; they aren’t just iterating through law+IC pairs and comparing the results to our data. So, in that regard the situation with physics is parallel to the situation with preferences/rationality.

--My point is that they are irrelevant to what is more complex than what. In particular, just because A has more information than B doesn’t mean A is more complex than B. Example: The true Laws + Initial Conditions pair contains more information than E, the set of all events in the world. Why? Because from E you cannot conclude anything about counterfactuals, but from the true Laws+IC pair you can. Yet you can deduce E from the true Laws+IC pair. (Assume determinism for simplicity.) But it’s not true that the true Laws+IC pair is more complex than E; the complexity of E is the length of the shortest way to generate it, and (let’s assume) the true Laws+IC pair is the shortest way to generate E. So both have the same complexity.
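Put in Kolmogorov-complexity notation, this point looks like the following (my sketch, under the determinism assumption above):

```latex
% E is deducible from the true pair, so describing the pair suffices for E:
K(E) \le K(\mathrm{Laws} + \mathrm{IC}) + O(1).
% If, as assumed, the true pair is the shortest generator of E:
K(\mathrm{Laws} + \mathrm{IC}) = K(E) + O(1).
% So the pair can carry strictly more information (e.g. counterfactuals)
% while being no more complex, up to a constant.
```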

I realize I may be confused here about how complexity or information works; please correct me if so!

But anyhow, if I’m right about this, then I am skeptical of conclusions drawn from information to complexity… I’d like to see the argument made more explicit and broken down more, at least.

For example, the “proper” pair contains all this information about what’s a bias and what isn’t, because our definition of bias references the planner/reward distinction. But isn’t that unfair? Example: We can write 99999999999999999999999 or we can write “23 digits of 9’s.” The latter is shorter, but it contains more information if we cheat and say it tells us things like “how to spell the word that refers to the parts of a written number.”

Anyhow, don’t the degenerate pairs also contain information about biases—for example, according to the policy-planner + empty-reward pair, nothing is a bias, because nothing would systematically lead to more reward than what is already being done?

--If it were true that Occam’s Razor can’t distinguish between (P, R) and (-P, -R), then… isn’t that a pretty general argument against Occam’s Razor, not just in this domain but in other domains too?


• Hey there!

Responding to a few points. But first, I want to make the point that treating an agent as a (p, R) pair is basically an intentional stance. We choose to treat the agent that way, either for ease of predicting its actions (Dennett’s approach) or for extracting its preferences, to satisfy them (my approach). The decomposition is not a natural fact about the world.

--If I ever said physicists don’t know how to distinguish between laws and initial conditions, I didn’t mean it. (Did I?) What I thought I said was that physicists haven’t yet found a law+IC pair that can account for the data we’ve observed. Also that they are in fact using lots of other heuristics and assumptions in their methodology; they aren’t just iterating through law+IC pairs and comparing the results to our data. So, in that regard the situation with physics is parallel to the situation with preferences/rationality.

No, the situation is very different. Physicists are trying to model and predict what is happening in the world (and in counterfactual worlds). This is equivalent to trying to figure out the human policy (which can be predicted from observations, as long as you include counterfactual ones). The decomposition of the policy into preferences and rationality is a separate step, very unlike what physicists are doing (quick way to check this: if physicists were unboundedly rational with infinite data, they could solve their problem; whereas we couldn’t; we’d still have to make decisions).

(If you want to talk about situations where we know some things but not all about the human policy, then the treatment is more complex, but ultimately the same arguments apply.)

--My point is that they are irrelevant to what is more complex than what. In particular, just because A has more information than B doesn’t mean A is more complex than B. Example: The true Laws + Initial Conditions pair contains more information than E, the set of all events in the world. Why? Because from E you cannot conclude anything about counterfactuals, but from the true Laws+IC pair you can. Yet you can deduce E from the true Laws+IC pair. (Assume determinism for simplicity.) But it’s not true that the true Laws+IC pair is more complex than E; the complexity of E is the length of the shortest way to generate it, and (let’s assume) the true Laws+IC pair is the shortest way to generate E. So both have the same complexity.

Well, it depends. Suppose there are multiple TL (true laws) + IC pairs that could generate E. In that case, TL+IC has more complexity than E, since you need to choose among the possible options. But if there is only one feasible TL+IC that generates E, then you can work backwards from E to get that TL+IC, and now you have all the counterfactual info, from E, as well.

For example, the “proper” pair contains all this information about what’s a bias and what isn’t, because our definition of bias references the planner/reward distinction. But isn’t that unfair? Example: We can write 99999999999999999999999 or we can write “23 digits of 9’s.” The latter is shorter, but it contains more information if we cheat and say it tells us things like “how to spell the word that refers to the parts of a written number.”

That argument shows that if you look into the algorithm, you can get other differences. But I’m not looking into the algorithm; I’m just using the decomposition into (p, R), and playing around with the p and R pieces, without looking inside.

Any­how don’t the de­gen­er­ate pairs also con­tain in­for­ma­tion about bi­ases—for ex­am­ple, ac­cord­ing to the policy-plan­ner+empty-re­ward pair, noth­ing is a bias, be­cause noth­ing would sys­tem­at­i­cally lead to more re­ward than what is already be­ing done?

Among the de­gen­er­ate pairs, the one with the in­differ­ent plan­ner has a bias of zero, the greedy plan­ner has a bias of zero, and the anti-greedy plan­ner has a bias of −1 at ev­ery timestep. So they do define bias func­tions, but par­tic­u­larly sim­ple ones. Noth­ing like the com­plex­ity of the bi­ases gen­er­ated by the “proper” pair.

The rele­vance of in­for­ma­tion for com­plex­ity is this: given rea­son­able as­sump­tions, the hu­man policy is sim­pler than all pairs, and the three de­gen­er­ate pairs are al­most as sim­ple as the policy. How­ever, the “proper” pair can gen­er­ate a com­pli­cated ob­ject, the bias func­tion (which has a non-triv­ial value in al­most ev­ery pos­si­ble state). So the proper pair con­tains at least enough in­for­ma­tion to spec­ify a) the hu­man policy, and b) the bias func­tion. The kol­mogorov com­plex­ity of the proper pair is thus at least that of the sim­plest al­gorithm that can gen­er­ate both those ob­jects.

So one of two things is happening: either the human policy can generate the bias function directly, in some simple way[1], or the proper pair is more complicated than the policy. The first is not impossible, but notice that it has to be “simple”. So the fact that we have not yet found a way to generate the bias function from the policy is an argument that it can’t be done. Certainly there are no elementary mathematical manipulations of the policy that produce anything suitable.

--If it were true that Occam’s Razor can’t distinguish between P,R and -P,-R, then… isn’t that a pretty general argument against Occam’s Razor, not just in this domain but in other domains too?

No, because Occam’s razor works in other domains. This is a strong illustration that this domain is actually different.

1. Let A be the simplest algorithm that generates the human policy, and B the simplest that generates the human policy and the bias function. If there are n different algorithms that generate the human policy and are of length |B| or shorter, then we need to add log2(n) bits of information to the human policy to generate B, and hence, the bias function. So if B is close in complexity to A, we don’t need to add much. ↩︎
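The footnote’s counting argument can be put in numbers (the value of n here is a made-up illustration, not anything from the thread): singling out one algorithm among n candidates costs about log2(n) bits.

```python
import math

# Hypothetical count: suppose n distinct algorithms of length <= |B|
# all generate the human policy. Pointing at B among them takes about
# log2(n) extra bits on top of "some algorithm that generates the policy".
n = 1024                   # assumed number of candidate algorithms
extra_bits = math.log2(n)  # bits needed to single out B among the n
print(extra_bits)          # -> 10.0
```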

• Thanks again! I still disagree, surprise surprise.

I’m inclined to agree with you that the (p,R) decomposition is not a natural fact about the world, though I’m not sure. Anyhow, I don’t think it matters for our purposes.

No, the situation is very different. Physicists are trying to model and predict what is happening in the world (and in counterfactual worlds). This is equivalent to trying to figure out the human policy (which can be predicted from observations, as long as you include counterfactual ones). The decomposition of the policy into preferences and rationality is a separate step, very unlike what physicists are doing (quick way to check this: if physicists were unboundedly rational with infinite data, they could solve their problem; whereas we couldn’t, we’d still have to make decisions).
(If you want to talk about situations where we know some things but not all about the human policy, then the treatment is more complex, but ultimately the same arguments apply.)

Physicists are trying to do many things. Yes, one thing they are trying to do is predict what is happening in the world. But another thing they are trying to do is figure out stuff about counterfactuals, and for that they need to have a Laws+IC decomposition to work with. So they take their data and they look for a simple Laws+IC decomposition that fits it. They would still do this even if they already knew the results of all the experiments ever, and had no more need to predict things. (Extending the symmetry, humans also typically use the intentional stance on incomplete data about a target human’s policy, for the purpose of predicting the rest of the policy. But this isn’t what you concern yourself with; you assume for the sake of argument that we already have the whole policy and point out that we’d still want to use the intentional stance to get a decomposition so that we could make judgments about rationality. I say yes, true, now apply the same reasoning to physics: assume for the sake of argument that we already know everything that will happen, all the events, and notice that we’d still want to have a Laws+IC decomposition, perhaps to figure out counterfactuals.)

Well, it depends. Suppose there are multiple TL (true laws) + IC pairs that could generate E. In that case, TL+IC has more complexity than E, since you need to choose among the possible options. But if there is only one feasible TL+IC that generates E, then you can work backwards from E to get that TL+IC, and now you have all the counterfactual info, from E, as well.

I was assuming there were multiple Law+IC pairs that would generate E… well actually no, the example degenerate pairs I gave prove that there are, no need to assume it!

That argument shows that if you look into the algorithm, you can get other differences. But I’m not looking into the algorithm; I’m just using the decomposition into (p, R), and playing around with the p and R pieces, without looking inside.

I don’t see the difference between what you are doing and what I did. You started with a policy and said “But what about bias-facts? The policy by itself doesn’t tell us these facts. So let’s look at the various decompositions of the policy into p,R pairs; they tell us the bias facts.” I start with a number and say “But what about how-to-spell-the-word-that-refers-to-the-parts-of-a-written-number facts? The number doesn’t tell us that. Let’s look at the various decompositions of the number into strings of symbols that represent it; they tell us those facts.”

Among the degenerate pairs, the one with the indifferent planner has a bias of zero, the greedy planner has a bias of zero, and the anti-greedy planner has a bias of −1 at every timestep. So they do define bias functions, but particularly simple ones. Nothing like the complexity of the biases generated by the “proper” pair.

Thanks for the clarification—that’s what I suspected. So then every p,R pair compatible with the policy contains more information than the policy. Thus even the simplest p,R pair compatible with the policy contains more information than the policy. By analogous reasoning, every algorithm for constructing the policy contains more information than the policy. So even the simplest algorithm for constructing the policy contains more information than the policy. So (by your reasoning) even the simplest algorithm for constructing the policy is more complex than the policy. But this isn’t so; the simplest algorithm for constructing the policy is length L and so has complexity L, and the policy has complexity L too… That’s my argument at least. Again, maybe I’m misunderstanding how complexity works. But now that I’ve laid it out step-by-step, which step do you disagree with?

The relevance of information for complexity is this: given reasonable assumptions, the human policy is simpler than all pairs, …

Wait, what? This is what I was objecting to in the original post. The “Occam Sufficiency Hypothesis” is that the human policy is not simpler than all pairs; in particular, its complexity is precisely that of the intended pair, because the intended pair is the simplest way to construct the policy.

What are the reasonable assumptions that lead to the OSH being false?

My objection to your paper, in a nutshell, was that you didn’t discuss this part—you didn’t give any reason to think OSH was false. The three reasons you gave in Step 2 were reasons to think the intended pair is complex, not reasons to think it is more complex than the policy. Or so I argued.

--If it were true that Occam’s Razor can’t distinguish between P,R and -P,-R, then… isn’t that a pretty general argument against Occam’s Razor, not just in this domain but in other domains too?
No, because Occam’s razor works in other domains. This is a strong illustration that this domain is actually different.

My argument is that if you are right, Occam’s Razor would be generally useless, but it’s not, so you are wrong. In more detail: if Occam’s Razor can’t distinguish between P,R and -P,-R, then (by analogy) in an arbitrary domain it won’t be able to distinguish between theory X and theory b(X), where b() is some simple bizarro function that negates or inverts the parts of X in such a way that the changes cancel out.

• I’m not sure the physics analogy is getting us very far—I feel there is a very natural way of decomposing physics into laws+initial conditions, while there is no such natural way of doing so for preferences and rationality. But if we have different intuitions on that, then discussing the analogy isn’t going to help us converge!

So then every p,R pair compatible with the policy contains more information than the policy. Thus even the simplest p,R pair compatible with the policy contains more information than the policy.

Agreed (though the extra information may be tiny—a few extra symbols).

By analogous reasoning, every algorithm for constructing the policy contains more information than the policy.

That does not follow; the simplest algorithm for building a policy does not go via decomposing into two pieces and then recombining them. We are comparing algorithms that produce a planner-reward pair (two outputs) with algorithms that produce a policy (one output). (But your whole argument shows you may be slightly misunderstanding complexity in this context.)

Now, though all pairs are slightly more complex than the policy itself, the bias argument shows that the “proper” pair is considerably more complex. To use an analogy: suppose file1 and file2 are both maximally zipped files. When you unzip file1, you produce image1 (and maybe a small, blank, image2). When you unzip file2, you also produce the same image1, and a large, complex, image2′. Then, as long as image1 and image2′ are at least slightly independent, file2 has to be larger than file1. The more complex image2′ is, and the more independent it is from image1, the larger file2 has to be.
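The zip analogy can be checked with a real compressor (this is my own toy demonstration, using random byte strings as stand-ins for the images):

```python
import random
import zlib

random.seed(0)
size = 10_000
image1 = bytes(random.randrange(256) for _ in range(size))          # complex image
image2_blank = bytes(size)                                          # small, blank image2
image2_complex = bytes(random.randrange(256) for _ in range(size))  # independent complex image2'

file1 = zlib.compress(image1 + image2_blank, 9)    # "zips" image1 plus a blank image2
file2 = zlib.compress(image1 + image2_complex, 9)  # "zips" image1 plus independent image2'

# file2 must be larger: it has to pay for image2' on top of image1.
print(len(file1), len(file2))
```

Because image2′ is independent of image1, no compressor can avoid encoding it separately, which is exactly the point of the analogy.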

Does that make sense?

• I agree that the decomposition of physics into laws+IC is much simpler than the decomposition of a human policy into p,R. (Is that what you mean by “more natural?”) But this is not relevant to my argument, I think.

I feel that our conversation has now branched into too many branches, some of which have been abandoned. In the interest of re-focusing the conversation, I’m going to answer the questions you asked and then ask a few new ones of my own.

To your questions: For me to understand your argument better I’d like to know more about what the pieces represent. Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts? Then what is the “unzip” function? Pairs don’t unzip to anything. You can apply the function “apply the first element of the pair to the second” or you can apply the function “do that, and then apply the MAXIMIZE function to the second element of the pair and compute the difference.” Or there are infinitely many other things you can do with the pair. But the pair itself doesn’t tell you what to do with it, unlike a zipped file, which is like an algorithm—it tells you “run me.”

I have two questions. 1. My central claim—which I still uphold as not-ruled-out-by-your-arguments (though of course I don’t actually believe it)—is the Occam Sufficiency Hypothesis: “The ‘intended’ pair is the simplest way to generate the policy.” So, basically, what OSH says is that within each degenerate pair is a term, pi (the policy), and when you crack open that term and see what it is made of, you see P(R), the intended planner applied to the intended reward function! Thus, a simplicity-based search will stumble across <p,R> before it stumbles across any of the degenerate pairs, because it needs p and R to construct the degenerate pairs. What part of this do you object to?

2. Earlier you said “given reasonable assumptions, the human policy is simpler than all pairs.” What are those assumptions?

Once again, thanks for taking the time to engage with me on this! Sorry it took me so long to reply; I got busy with family stuff.

• Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts?

Yes.

Then what is the “unzip” function?

The “shortest algorithm generating BLAH” is the maximally compressed way of expressing BLAH—the “zipped” version of BLAH.

Ignoring unzip, which isn’t very relevant, we know that the degenerate pairs are just above the policy in complexity.

So zip(degenerate pair) ≈ zip(policy), while zip(reasonable pair) > zip(policy+complex bias facts) (and zip(policy+complex bias facts) > zip(policy)).

Does that help?

• It helps me to understand your argument more clearly. I still disagree with it, though. I object to this:

zip(reasonable pair) > zip(policy+complex bias facts)

I claim this begs the question against OSH. If OSH is true, then zip(reasonable pair) ≈ zip(policy).

• Indeed. It might be possible to construct that complex bias function, from the policy, in a simple way. But that claim needs to be supported, and the fact that it hasn’t been found so far (I repeat that it has to be simple) is evidence against it.

• however, all the physical theories I know of have a very clear and known division between laws and initial conditions.

Physics doesn’t work on Occam’s razor alone. You need an IC/law division to be able to figure out counterfactuals, but equally you can implement counterfactuals in the form of experiments, and use them to figure out the IC/law split.

• How does that alternate method work? Implementing counterfactuals in the form of an experiment?

• That would just be performing an experiment. All experiments answer a “what if” question.

• I think that’s a bit controversial. Experiments tell us what happens in one timeline, the actual one… just like everything else we see and do. They don’t tell us what would have happened if such-and-such had occurred, because such-and-such didn’t in fact occur.

• After the experiment has been performed, the counterfactual is now actual, but it was a counterfactual beforehand. Even if you take the view that everything is determined, experiments are still exploring logical counterfactuals. On the other hand, if you assume holism, then you can’t explore counterfactuals with experiments because you can’t construct a complete state of the universe.

• I’m pretty sure that’s not how counterfactuals are normally thought to work. “Counterfactual” means contrary-to-the-facts. Something that is true is not contrary to the facts.

Argument: If you are right, then why is this only true for experiments? Isn’t it equally true for anything that happens—before it happens, it’s just a counterfactual, and then after it happens, it’s actual?

• I’m not confident I’ve understood this post, but it seems to me that the difference between the values case and the empirical case is that in the values case, we want to do better than humans at achieving human values (this is the “ambitious” in “ambitious value learning”), whereas in the empirical case, we are fine with just predicting what the universe does (we aren’t trying to predict the universe even better than the universe itself). In the formalism, in π = P(R) we are after R (rather than π), but in E = L(C) we are after E (rather than L or C), so in the latter case it doesn’t matter if we get a degenerate pair (because it will still predict the future events well). Similarly, in the values case, if all we wanted was to imitate humans, then it seems like getting a degenerate pair would be fine (it would act just as human as the “intended” pair).

If we use Occam’s Razor alone to find law-condition pairs that fit all the world’s events, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are e.g. building an AI to do science for us and answer counterfactual questions like “If we had posted the nuclear launch codes on the Internet, would any nukes have been launched?”

I don’t understand how this conclusion follows (unless it’s about the malign prior, which seems not relevant here). Could you give more details on why answering counterfactual questions like this would be dangerous?

• Thanks! OK, so I agree that normally in doing science we are fine with just predicting what will happen; there’s no need to decompose into Laws and Conditions. Whereas with value learning we are trying to do more than just predict behavior; we are trying to decompose into Planner and Reward so we can maximize Reward.

However, the science case can be made analogous in two ways. First, as Eigil says below, realistically we don’t have access to ALL behavior or ALL events, so we will have to accept that the predictor which predicted well so far might not predict well in the future. Thus if Occam’s Razor settles on weird degenerate predictors, it might also settle on one that predicts well up until time T but then predicts poorly after that.

Second (this is the way I went, with counterfactuals), science isn’t all about prediction. Part of science is about answering counterfactual questions like “what would have happened if...” And typically the way to answer these questions is by decomposing into Laws + Conditions, then doing a surgical intervention on the conditions, and then applying the same Laws to the new conditions.

So, for example, if we use Occam’s Razor to find Laws+Conditions for our universe, and somehow it settles on the degenerate pair “Conditions := null, Laws := sequence of events E happens,” then all our counterfactual queries will give bogus answers—for example, “what would have happened if we had posted the nuclear launch codes on the Internet?” Answer: “Varying the Conditions but holding the Laws fixed… it looks like E would have happened. So yeah, posting launch codes on the Internet would have been fine; wouldn’t have changed anything.”
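The surgical-intervention picture, and how the degenerate pair breaks it, can be sketched like this (the laws, conditions, and events here are all invented for illustration):

```python
# Proper decomposition: the laws actually consume the conditions, so a
# surgical change to the conditions changes the predicted history.
def proper_laws(conditions):
    # hypothetical law: the event at time t is conditions * t
    return [conditions * t for t in range(5)]

E = proper_laws(2)  # the actual history of events

# Degenerate decomposition: "Conditions := null, Laws := E just happens."
def degenerate_laws(_conditions):
    return E  # ignores the conditions entirely

# Counterfactual query: what if the condition had been 3 instead of 2?
print(proper_laws(3))      # a genuinely different history
print(degenerate_laws(3))  # still E: "nothing would have changed"
```

Under the degenerate pair, every intervention on the conditions returns the actual history E, which is exactly the bogus "posting launch codes would have changed nothing" answer described above.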

• Thanks for the explanation; I think I understand this better now.

My response to your second point: I wasn’t sure how the sequence prediction approach to induction (like Solomonoff induction) deals with counterfactuals, so I looked it up, and it looks like we can convert the counterfactual question into a sequence prediction question by appending the counterfactual to all the data we have seen so far. So in the nuclear launch codes example, we would feed the sequence predictor a video of the launch codes being posted to the internet, and then ask it to predict what sequence it expects to see next. (See the top of page 9 of this PDF and also example 5.2.2 in Li and Vitanyi for more details and further examples.) This doesn’t require a decomposition into laws and conditions; rather it seems to require that the events E be a function that can take in bits and print out more bits (or a probability distribution over bits). But this doesn’t seem like a problem, since in the values case the policy π is also a function. (Maybe my real point is that I don’t understand why you are assuming E has to be a sequence of events?) [ETA: actually, maybe E can be just a sequence of events, but if we’re talking about complexity, there would be some program that generates E, so I am suggesting we use that program instead of L and C for counterfactual reasoning.]
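The "append the counterfactual" move might be sketched like this (the predictor and the event names are toy stand-ins of my own, not anything from Li and Vitanyi):

```python
# Toy stand-in for a sequence predictor: no Laws/Conditions split, just a
# function from an observed prefix to a guess about the next event.
def predictor(history):
    # hypothetical learned rule: expect the most recent event to persist
    return history[-1]

observed = ["calm", "calm", "calm"]     # everything seen so far
counterfactual = "launch_codes_posted"  # the event that didn't actually happen

# Counterfactual query = prediction conditioned on history + counterfactual:
print(predictor(observed + [counterfactual]))
```

The counterfactual is answered by ordinary prediction on an extended sequence, with no explicit decomposition into laws and conditions.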

My response to your first point: I am far from an expert here, but my guess is that an Occam’s Razor advocate would bite the bullet and say this is fine, since either (1) the degenerate predictors will have high complexity and so will be dominated by simpler predictors, or (2) we are just as likely to be living in a “degenerate” world as we are to be living in the kind of “predictable” world that we think we are living in.

• Thanks! OK, so I agree that normally in doing science we are fine with just predicting what will happen, there’s no need to decompose into Laws and Conditions.

Where we can predict, we do so by feeding a set of conditions into laws.

Second (this is the way I went, with counterfactuals), science isn’t all about prediction. Part of science is about answering counterfactual questions like “what would have happened if...” And typically the way to answer these questions is by decomposing into Laws + Conditions and then doing a surgical intervention on the conditions and then applying the same Laws to the new conditions.

Methodologically, counterfactuals and predictions are almost the same thing. In the case of a prediction, you feed an actual condition into your laws; in the case of a counterfactual, you feed in a non-actual one.

• A simple remark: we don’t have access to all of E, only up until the current time. So we have to make sure that we don’t get a degenerate pair which diverges wildly from the actual universe at some point in the future.

Maybe this is similar to the fact that we don’t want AIs to diverge from human values once we go off-distribution? But you’re definitely right that there’s a difference: we do want AIs to diverge from human behaviour (even in common situations).

• This is neat. It makes me realize that thinking in terms of simplicity and complexity priors was serving somewhat as a semantic stop sign for me, whereas speed prior vs slow prior doesn’t.

• When we decompose the sequence of events E into laws L and initial conditions C, the laws don’t just calculate E from C. Rather, L is a function from events->events, and the sequence E contains many input-output pairs of L.

By contrast, when we decompose a policy into a planner P and a reward R, P is a function from rewards->policy. With the setup of the problem as-is, we have data on many input-output pairs (behaviour), so we can infer π with high accuracy. But we only get to see one policy, and we never get to explicitly see rewards. In such a case, indeed we will get the empty reward and the planner that just outputs π. To correctly infer R and P, we would have to see our P applied to some other rewards, and the policies resulting from that.

• Take the limit as we observe more and more behavior—it takes a million bits to specify E, for example, or a billion. Then the utility maximizer and utility minimizer are both much, much simpler (can be specified in fewer bits) than the Buddha-like zero-utility agent (assuming E is in fact consistent with a simple utility function). Likewise, in that same limit, the true laws of physics plus initial conditions are much, much simpler than saying “L=0 and E just happens”. Right? Sorry if I’m misunderstanding, I haven’t read A&M.

• The trick is that you can use the simplest method for constructing E in your statement “L=0 and E just happens.” So e.g. if you have some simple Laws l and Conditions c such that l(c) = E, your statement can be “L=0 and l(c) just happens.”

• I think the physics analogy here is really cool—the idea of drawing a parallel between the pair “what a person wants and how they behave to get those things” and the pair “how the universe is set up and how it behaves as a result” is an interesting one.

However, I think many physicists have arguably already settled on a degenerate model of physics: the idea behind the Copenhagen Interpretation is essentially that, given an initial condition, some event (partially defined by those conditions) will just randomly happen. It’s not exactly one of the degenerate examples you give (because a lot of rules can be extracted from the initial conditions about how those random things happen) but, at the end of the day, lots of people already accept that the initial-conditions to laws-of-physics pairing is best described by saying “sometimes some things happen and sometimes other things happen.”

• I think many physicists have already settled on a degenerate model of physics: [..] the Copenhagen Interpretation [..] It’s not exactly one of the degenerate examples you give

I don’t see what’s degenerate about it at all.

Lots of people already accept that the initial-conditions to laws-of-physics pairing is best described by saying “sometimes some things happen and sometimes other things happen.”

Every interpretation yields the same results. There’s no known way of rejigging the initial-conditions-to-laws-of-evolution balance that does better.

• Every interpretation yields the same results. There’s no known way of rejigging the initial-conditions-to-laws-of-evolution balance that does better.

Exactly. The fact that multiple conceptually distinct rule-sets yield the same results is what makes it degenerate. In the same way that a single policy can be described exactly by multiple degenerate reward functions of similar complexity, the evolution of the universe can be described exactly by multiple sets of physical laws of similar complexity. Sure, the randomness is the best we can do in terms of prediction, but the underlying way that randomness is produced is degenerate:

1. The next state of the universe is evolved from the current state by a combination of details about the current state and a random fluctuation that just happened

2. The next state of the universe is evolved from the current state by a combination of details about the current state and a set of events determined by the laws of the universe which only appear random to us

3. The next state of the universe is evolved from the current state by a combination of details about the current state and a set of observationally random events that were chosen to occur in sequence before the beginning of the universe

and so on...

I personally like quantum mechanics, though. I’m just picking on it because, while many formulations of deterministic laws exist, people can always make the argument that their “different” interpretations are just different mathematical reformulations of a single concept. In contrast, it’s easy to pick conceptually different ways in which observationally random events are produced.

In science, the distinctions between 1, 2 and 3 don’t matter, since they all predict the same things. But similar distinctions in terms of reward functions matter greatly because they, intuitively, imply different “subjective” experiences. The upshot is that I don’t buy the article’s claim that “physics being degenerate” is a controversial idea.

• The fact that multiple conceptually distinct rule-sets yield the same results is what makes it degenerate.

What does the singular “it” refer to? You could claim that QM is degenerate because multiple formulations lead to the same result, but you seemed to have a specific beef with Copenhagen.

But similar distinctions in terms of reward functions matter greatly because they, intuitively, imply different “subjective” experiences.

Much more than that. There is a lot of moral concern about whether someone is doing something bad as a result of trying to do something good incompetently, or doing something bad intentionally.

• What does the singular “it” refer to? You could claim that QM is degenerate because multiple formulations lead to the same result, but you seemed to have a specific beef with Copenhagen.

I picked Copenhagen because it involves collapsing a wave-function to a random state for a specific universe (i.e., the universe evolves in a way that is partially random). If you’re a many-worlds theorist, you could plausibly claim that, since the probability distribution describes how frequently different kinds of worlds happen with respect to each other, the universe doesn’t evolve randomly at all—what we perceive as randomness describes a deterministic distribution of all possible worlds.

To me, it looks easy to rebut this argument—you just point out that there is still randomness in your subjective perspective of the world. But then someone else might question that, because your “subjective perspective” becomes a matter of anthropics, and then the whole conversation gets into some confusing weeds that would dramatically lengthen the amount of time I need to think about things. So I picked Copenhagen specifically as a short-cut.

So yeah, I was picking on Copenhagen because it’s easier to establish in the context of the point I was trying to make (quantum mechanics is degenerate). But I wasn’t picking on it because other interpretations of QM are less problematic than Copenhagen.

Also, to clarify:

specific beef with Copenhagen

I don’t have a beef with Copenhagen or with QM. I just think it’s a degenerate world model and, with the definition I’m using, degenerate world models of the kind that QM is aren’t a bad thing.

Much more than that. There is a lot of moral concern about whether someone is doing something bad as a result of trying to do something good incompetently, or doing something bad intentionally.

Even more dramatically than that, we can reverse this to get another important implication! If you’re trying to figure out what’s good for a person based on the consequences they seem to be seeking out, you can’t tell whether that person actually wants the consequences of their behavior (i.e. the consequences are subjectively good) or whether they want something else but are going about it in an irrational and ineffective way (i.e. the consequences are subjectively indeterminate). This is really bad for AI alignment.

As a sidenote: one might try to solve this problem by just applying Occam’s Razor (doesn’t it seem more likely and more simple that someone is acting in ways reflective of their preferences rather than incompetence?). But it seems unlikely to me that this actually works, because:

-The paper this article is trying to rebut indicates that Occam’s Razor will miss people’s actual preferences, because most preferences are unlikely to be the most simple explanation

-This article tries to rebut that by pointing out that the paper’s argument proves too much, by implying that physics models are degenerate

-I think that physics models are pretty obviously degenerate, and I’m okay with us having degenerate models of physics. I’m not okay in general with degenerate models of what people prefer

• If you want to explain why the multiple interpretations of QM are degenerate, the minimum number of examples you need is 2, not 1.

• It’s easy to see that these examples, being constructed from E, are at most slightly more complex than the simplest possible pair, since they could use the simplest pair to generate E.

Not actually clear. If I had a really long list of factorials (of length n), then perhaps it could be “compressed” in terms of f of 1 through n plus a description of f. However, it’s not clear how large n would have to be for that description to be shorter. Thus:

Example: The initial conditions are simply E, and L() doesn’t do anything.

is actually simpler, until E is big enough.
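The crossover this comment gestures at can be checked concretely (the factorial example is from the comment; the particular description string is just an illustrative choice of mine):

```python
import math

# "f of 1 through n + a description of f": a fixed-length generative description
generator_description = "[math.factorial(k) for k in range(1, n + 1)]"

def literal_length(n):
    # length of just writing the list of the first n factorials out verbatim
    return len(repr([math.factorial(k) for k in range(1, n + 1)]))

# For small n the literal list is shorter; past some threshold the
# fixed-length generator description wins.
for n in (3, 10, 30):
    print(n, literal_length(n), len(generator_description))
```

The generator’s description length is constant while the literal listing grows with n, so the "degenerate" literal description really is simpler only until E is big enough.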

• I don’t follow?