Naturalized induction – a challenge for evidential and causal decision theory

As some of you may know, I disagree with many of the criticisms leveled against evidential decision theory (EDT). Most notably, I believe that Smoking lesion-type problems don’t refute EDT. I also don’t think that EDT’s non-updatelessness leaves a lot of room for disagreement, given that EDT recommends immediate self-modification to updatelessness. However, I do believe there are some issues with run-of-the-mill EDT. One of them is naturalized induction. It is in fact not only a problem for EDT but also for causal decision theory (CDT) and most other decision theories that have been proposed in- and outside of academia. It does not affect logical decision theories, however.

The role of naturalized induction in decision theory

Recall that EDT prescribes taking the action that maximizes expected utility, i.e.

$$a^* = \arg\max_{a \in A} \sum_{w \in W} U(w) \, P(w \mid a, obs),$$

where $A$ is the set of available actions, $U$ is the agent’s utility function, $W$ is a set of possible world models, and $obs$ represents the agent’s past observations (which may include information the agent has collected about itself). CDT works in a – for the purpose of this article – similar way, except that instead of conditioning on $a$ in the usual way, it calculates some causal counterfactual, such as Pearl’s do-calculus: $P(w \mid \operatorname{do}(a), obs)$. The problem of naturalized induction is that of assigning posterior probabilities to world models $P(w \mid a, obs)$ (or $P(w \mid \operatorname{do}(a), obs)$ or whatever) when the agent is naturalized, i.e., embedded into its environment.

Consider the following example. Let’s say there are 5 world models $w_1, \ldots, w_5$, each of which has equal prior probability. These world models may be cellular automata. Now, the agent makes the observation $obs$. It turns out that worlds $w_1$ and $w_2$ don’t contain any agents at all, and $w_3$ contains no agent making the observation $obs$. The other two world models, on the other hand, are consistent with $obs$. Thus, $P(w_i \mid obs) = 0$ for $i \in \{1, 2, 3\}$ and $P(w_i \mid obs) = 1/2$ for $i \in \{4, 5\}$. Let’s assume that the agent has only two actions $a_1$ and $a_2$, that in world model $w_4$ the only agent making observation $obs$ takes action $a_1$, and that in $w_5$ the only agent making observation $obs$ takes action $a_2$. Then $P(w_4 \mid a_1, obs) = 1$ and $P(w_5 \mid a_2, obs) = 1$. Thus, if, for example, $U(w_4) > U(w_5)$, an EDT agent would take action $a_1$ to ensure that world model $w_4$ is actual.
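To make the arithmetic concrete, here is a small Python sketch of the EDT calculation in this example. The utility values for $w_4$ and $w_5$ are illustrative numbers I picked; the argument only requires $U(w_4) > U(w_5)$.

```python
# Five-world EDT toy example. Worlds w1-w3 contain no agent making
# the observation; in w4 the observing agent takes a1, in w5 it takes a2.
worlds = ["w1", "w2", "w3", "w4", "w5"]
prior = {w: 1 / 5 for w in worlds}
consistent_with_obs = {"w4", "w5"}
action_taken_in = {"w4": "a1", "w5": "a2"}
utility = {"w1": 0, "w2": 0, "w3": 0, "w4": 10, "w5": 3}  # only U(w4) > U(w5) matters

def posterior(w):
    # Naturalized induction step: condition on the observation by
    # zeroing out worlds that contain no agent making it.
    if w not in consistent_with_obs:
        return 0.0
    z = sum(prior[v] for v in consistent_with_obs)
    return prior[w] / z

def expected_utility(action):
    # P(w | a, obs): conditioning on one's own action further rules out
    # worlds in which the observing agent acts differently.
    eu, z = 0.0, 0.0
    for w in worlds:
        p = posterior(w)
        if p > 0 and action_taken_in[w] != action:
            p = 0.0
        eu += p * utility[w]
        z += p
    return eu / z if z > 0 else 0.0

best = max(["a1", "a2"], key=expected_utility)
print(best)  # a1, since U(w4) > U(w5)
```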

The main problem of naturalized induction

This example makes it sound as though it’s clear what posterior probabilities we should assign. But in general, it’s not that easy. For one, there is the issue of anthropics: if one world model $w_1$ contains more agents observing $obs$ than another world model $w_2$, does that mean $P(w_1 \mid obs) > P(w_2 \mid obs)$? Whether CDT and EDT can reason correctly about anthropics is an interesting question in itself (cf. Bostrom 2002; Armstrong 2011; Conitzer 2015), but in this post I’ll discuss a different problem in naturalized induction: identifying instantiations of the agent in a world model.

It seems that the core of the reasoning in the above example was that some worlds contain an agent observing $obs$ and others don’t. So, besides anthropics, the central problem of naturalized induction appears to be identifying agents making particular observations in a physicalist world model. While this can often be done uncontroversially – a world containing only rocks contains no agents – it seems difficult to specify how it works in general. The core of the problem is a type mismatch between the “mental stuff” (e.g., numbers or strings) and the “physics stuff” (atoms, etc.) of the world model. Rob Bensinger calls this the problem of “building phenomenological bridges” (BPB) (also see his Bridge Collapse: Reductionism as Engineering Problem).

Sensitivity to phenomenological bridges

Sometimes, the decisions made by CDT and EDT are very sensitive to whether a phenomenological bridge is built or not. Consider the following problem:

One Button Per Agent. There are two similar agents with the same utility function. Each lives in her own room. Both rooms contain a button. If agent 1 pushes her button, it creates 1 utilon. If agent 2 pushes her button, it creates −50 utilons. You know that agent 1 is an instantiation of you. Should you press your button?

Note that this is essentially Newcomb’s problem with potential anthropic uncertainty (see the second paragraph here) – pressing the button is like two-boxing, which causally gives you $1k if you are the real agent but costs you $1M if you are the simulation.

If agent 2 is sufficiently similar to you to count as an instantiation of you, then you shouldn’t press the button. If, on the other hand, you believe that agent 2 does not qualify as something that might be you, then it comes down to what decision theory you use: CDT would press the button, whereas EDT wouldn’t (assuming that the two agents are strongly correlated).
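The divergence can be sketched numerically. The payoffs below are from the problem statement; the split into a purely causal value and a correlation-counting evidential value is my own illustrative simplification, not a full formalization of either theory, and it assumes you are certain you are agent 1.

```python
# One Button Per Agent payoffs, per the problem statement.
U_PRESS = {"agent1": 1, "agent2": -50}

def cdt_value(press, i_am):
    # CDT counts only the causal effect of *my* button press.
    return U_PRESS[i_am] if press else 0

def edt_value(press, correlated):
    # EDT: if the agents are strongly correlated, my pressing is
    # evidence that both press, so both payoffs are counted.
    if not press:
        return 0
    return U_PRESS["agent1"] + U_PRESS["agent2"] if correlated else U_PRESS["agent1"]

# Certain you are agent 1 and using CDT: pressing looks good.
print(cdt_value(True, "agent1"))  # 1
# EDT with strong correlation: pressing looks bad.
print(edt_value(True, True))      # -49
```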

It is easy to specify a problem where EDT, too, is sensitive to the phenomenological bridges it builds:

One Button Per World. There are two possible worlds. Each contains an agent living in a room with a button. The two agents are similar and have the same utility function. The button in world 1 creates 1 utilon; the button in world 2 creates −50 utilons. You know that the agent in world 1 is an instantiation of you. Should you press the button?

If you believe that the agent in world 2 is an instantiation of you, both EDT and CDT recommend not pressing the button. However, if you believe that the agent in world 2 is not an instantiation of you, then naturalized induction concludes that world 2 isn’t actual, and so pressing the button is safe.
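A minimal sketch of how the bridge hypothesis flips the recommendation, assuming equal priors over the two worlds (the utilities are from the problem statement):

```python
# One Button Per World: the decision depends on whether the agent in
# world 2 counts as an instantiation of you.
PRIOR = {"world1": 0.5, "world2": 0.5}
PRESS_UTILITY = {"world1": 1, "world2": -50}

def press_eu(world2_contains_me):
    # Naturalized induction: if the agent in world 2 is *not* me,
    # observing my own existence rules world 2 out entirely.
    consistent = {"world1"} | ({"world2"} if world2_contains_me else set())
    z = sum(PRIOR[w] for w in consistent)
    return sum(PRIOR[w] / z * PRESS_UTILITY[w] for w in consistent)

print(press_eu(True))   # -24.5 -> don't press
print(press_eu(False))  # 1.0   -> pressing is safe
```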

Building phenomenological bridges is hard and perhaps confused

So, to solve the problem of naturalized induction and apply EDT/CDT-like decision theories, we need to solve BPB. The behavior of an agent is quite sensitive to how we solve it, so we had better get it right.

Unfortunately, I am skeptical that BPB can be solved. Most importantly, I suspect that statements about whether a particular physical process implements a particular algorithm can’t be objectively true or false. There seems to be no way of testing any such relations.

Probably we should think more about whether BPB really is doomed. There is even some philosophical literature that seems worth looking into (again, see this Brian Tomasik post; cf. some of Hofstadter’s writings and the literatures surrounding “Mary the color scientist”, the computational theory of mind, computation in cellular automata, etc.). But at this point, BPB looks confusing/confused enough to warrant looking into alternatives.

Assigning probabilities pragmatically?

One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes in the right direction, and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that merge sort and insertion sort have the same output for any given input, even though they have very different “subjective experiences”. (Another problem is that the dependence between two random variables cannot be expressed as a single number, so it is unclear how to translate the entire joint probability distribution of the two into a single number determining the likelihood of the algorithm being implemented by the physical process. That said, if implementing an algorithm is conceived of as binary – either true or false – one could just require perfect correlation.)
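The sorting point can be checked directly: the two algorithms are perfectly correlated in their input-output behavior, yet pass through very different internal states. The step counters below are a crude illustrative proxy for what each process “goes through” internally.

```python
# Merge sort and insertion sort: identical outputs, different internals.

def insertion_sort(xs):
    # Sorts by repeated adjacent swaps; steps roughly counts the work done.
    xs, steps = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
            steps += 1
        steps += 1
    return xs, steps

def merge_sort(xs):
    # Sorts by recursive splitting and merging; steps counts comparisons.
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, sl = merge_sort(xs[:mid])
    right, sr = merge_sort(xs[mid:])
    merged, steps = [], sl + sr
    while left and right:
        steps += 1
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    merged += left + right
    return merged, steps

data = [5, 2, 4, 6, 1, 3]
out_i, steps_i = insertion_sort(data)
out_m, steps_m = merge_sort(data)
print(out_i == out_m)    # True: outputs perfectly correlated
print(steps_i, steps_m)  # but the internal step counts differ
```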

Getting rid of the problem of building phenomenological bridges

If we adopt an EDT perspective, it seems clear what we have to do to avoid BPB. If we don’t want to decide whether some world contains the agent, then it appears that we have to artificially ensure that the agent views itself as existing in all possible worlds. So, we may take every world model and add a causally separate or non-physical entity representing the agent. I’ll call this additional agent a logical zombie (l-zombie) (a concept introduced by Benja Fallenstein for a somewhat different decision-theoretical reason). To avoid all BPB, we will assume that the agent pretends that it is the l-zombie with certainty. I’ll call this the l-zombie variant of EDT (LZEDT). It is probably the most natural evidentialist logical decision theory.

Note that in the context of LZEDT, l-zombies are a fiction used for pragmatic reasons. LZEDT doesn’t make the metaphysical claim that l-zombies exist or that you are secretly an l-zombie. For discussions of related metaphysical claims, see, e.g., Brian Tomasik’s essay Why Does Physics Exist? and references therein.

LZEDT reasons about the real world via the correlations between the l-zombie and the real world. In many cases, LZEDT will act as we expect an EDT agent to act. For example, in One Button Per Agent, it doesn’t press the button because that ensures that neither agent pushes the button.

LZEDT doesn’t need any additional anthropics but behaves like anthropic decision theory/EDT+SSA, which seems alright.

Although LZEDT may assign a high probability to worlds that don’t contain any actual agents, it doesn’t optimize for these worlds because it cannot significantly influence them. So, in a way, LZEDT adopts the pragmatic/functional approach (mentioned above) of, other things equal, giving more weight to worlds that contain a lot of closely correlated agents.

LZEDT is automatically updateless. For example, it gives the money in counterfactual mugging. However, it invariably implements a particularly strong version of updatelessness. It’s not just updateless in the way that “son of EDT” (i.e., the decision theory that EDT would self-modify into) is updateless; it is also updateless w.r.t. its existence. So, for example, in the One Button Per World problem, it never pushes the button, because it thinks that the second world, in which pushing the button generates −50 utilons, could be actual. This is the case even if the second world very obviously contains no implementation of LZEDT. Similarly, it is unclear what LZEDT does in the Coin Flip Creation problem, which EDT seems to get right.

So, LZEDT optimizes for world models that naturalized induction would assign zero probability to. It should be noted that this is not done on the basis of some exotic ethical claim according to which non-actual worlds deserve moral weight.

I’m not yet sure what to make of LZEDT. It is elegant in that it effortlessly gets anthropics right, avoids BPB, and is updateless without having to self-modify. On the other hand, not updating on your existence is often counterintuitive, and even regular updatelessness is, in my opinion, best justified via precommitment. Its approach to avoiding BPB isn’t immune to criticism either. In a way, it is just a very wrong approach to BPB (mapping your algorithm onto fictions rather than your real instantiations). Perhaps it would be more reasonable to use regular EDT with an approach to BPB that interprets anything as you that could potentially be you?

Of course, LZEDT also inherits some of the potential problems of EDT, in particular the 5-and-10 problem.

CDT is more dependent on building phenomenological bridges

It seems much harder to get rid of the BPB problem in CDT. Obviously, the l-zombie approach doesn’t work for CDT: because none of the l-zombies has a physical influence on the world, “LZCDT” would always be indifferent between all possible actions. More generally, because CDT exerts no control via correlation, it needs to believe that it might be X if it wants to control X’s actions. So, causal decision theory only works with BPB.

That said, a causalist approach to avoiding BPB via l-zombies could be to tamper with the definition of causality such that the l-zombie “logically causes” the choices made by instantiations in the physical world. As far as I understand it, most people at MIRI currently prefer this flavor of logical decision theory.


Most of my views on this topic formed in discussions with Johannes Treutlein. I also benefited from discussions at AISFP.