# Naturalized induction – a challenge for evidential and causal decision theory

As some of you may know, I disagree with many of the criticisms leveled against evidential decision theory (EDT). Most notably, I believe that Smoking lesion-type problems don’t refute EDT. I also don’t think that EDT’s non-updatelessness leaves a lot of room for disagreement, given that EDT recommends immediate self-modification to updatelessness. However, I do believe there are some issues with run-of-the-mill EDT. One of them is naturalized induction. It is in fact not only a problem for EDT but also for causal decision theory (CDT) and most other decision theories that have been proposed inside and outside of academia. It does not affect logical decision theories, however.

# The role of nat­u­ral­ized in­duc­tion in de­ci­sion theory

Re­call that EDT pre­scribes tak­ing the ac­tion that max­i­mizes ex­pected util­ity, i.e.

$\underset{a\in A}{\mathrm{argmax}} ~\mathbb{E}[U(w)|a,o] = \underset{a\in A}{\mathrm{argmax}} \sum_{w\in W} P(w|a,o) U(w),$

where $A$ is the set of available actions, $U$ is the agent’s utility function, $W$ is a set of possible world models, and $o$ represents the agent’s past observations (which may include information the agent has collected about itself). CDT works in a – for the purpose of this article – similar way, except that instead of conditioning on $a$ in the usual way, it calculates some causal counterfactual, such as Pearl’s do-calculus: $P(w|do(a),o)$. The problem of naturalized induction is that of assigning posterior probabilities to world models $P(w|a,o)$ (or $P(w|do(a),o)$ or whatever) when the agent is naturalized, i.e., embedded into its environment.
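The EDT decision rule above can be sketched in a few lines of code. The world models, probabilities, and utilities below are hypothetical toy numbers, not anything from this post; the point is only the shape of the rule: pick the action $a$ maximizing $\sum_w P(w|a,o)\,U(w)$.

```python
# A minimal sketch of the EDT decision rule: choose the action a
# maximizing sum_w P(w | a, o) * U(w). All numbers are illustrative.

def edt_action(actions, worlds, posterior, utility):
    """posterior(w, a) stands in for P(w | a, o); utility(w) for U(w)."""
    def expected_utility(a):
        return sum(posterior(w, a) * utility(w) for w in worlds)
    return max(actions, key=expected_utility)

# Toy numbers: taking "a1" is strong evidence for the high-utility world.
worlds = ["w_good", "w_bad"]
table = {("w_good", "a1"): 0.9, ("w_bad", "a1"): 0.1,
         ("w_good", "a2"): 0.2, ("w_bad", "a2"): 0.8}
choice = edt_action(["a1", "a2"], worlds,
                    lambda w, a: table[(w, a)],
                    lambda w: 10.0 if w == "w_good" else 0.0)
```

A CDT version would differ only in the `posterior` argument, which would be some stand-in for $P(w|do(a),o)$ instead.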

Consider the following example. Let’s say there are 5 world models $W=\{w_1,...,w_5\}$, each of which has equal prior probability. These world models may be cellular automata. Now, the agent makes the observation $o$. It turns out that worlds $w_1$ and $w_2$ don’t contain any agents at all, and $w_3$ contains no agent making the observation $o$. The other two world models, on the other hand, are consistent with $o$. Thus, $P(w_i\mid o)=0$ for $i=1,2,3$ and $P(w_i\mid o)=\frac{1}{2}$ for $i=4,5$. Let’s assume that the agent has only two actions $A=\{a_1,a_2\}$, that in world model $w_4$ the only agent making observation $o$ takes action $a_1$, and that in $w_5$ the only agent making observation $o$ takes action $a_2$. Then $P(w_4\mid a_1)=1=P(w_5\mid a_2)$ and $P(w_5\mid a_1)=0=P(w_4\mid a_2)$. Thus, if, for example, $U(w_5)>U(w_4)$, an EDT agent would take action $a_2$ to ensure that world model $w_5$ is actual.
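The five-world example can be written out directly, under the (big, and soon to be questioned) assumption that the bridge is binary: a world’s posterior is zero unless it contains an agent making observation $o$, and conditioning on an action keeps only those worlds whose $o$-observing agent takes that action.

```python
# The five-world example: equal priors, worlds w1-w3 ruled out by o,
# and the action singling out one of the remaining worlds.

prior = {"w1": 0.2, "w2": 0.2, "w3": 0.2, "w4": 0.2, "w5": 0.2}
contains_o_observer = {"w1": False, "w2": False, "w3": False,
                       "w4": True, "w5": True}
action_in_world = {"w4": "a1", "w5": "a2"}  # what the o-observer does

def condition_on_o(prior):
    # Zero out worlds with no agent observing o, renormalize the rest.
    mass = sum(p for w, p in prior.items() if contains_o_observer[w])
    return {w: (p / mass if contains_o_observer[w] else 0.0)
            for w, p in prior.items()}

def condition_on_action(post, a):
    # Keep only live worlds whose o-observer takes action a.
    live = {w: p for w, p in post.items() if p > 0}
    mass = sum(p for w, p in live.items() if action_in_world[w] == a)
    return {w: (post[w] / mass if w in live and action_in_world[w] == a
                else 0.0)
            for w in post}

post_o = condition_on_o(prior)               # P(w4|o) = P(w5|o) = 1/2
post_a2 = condition_on_action(post_o, "a2")  # P(w5|a2, o) = 1
```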

# The main prob­lem of nat­u­ral­ized induction

This ex­am­ple makes it sound as though it’s clear what pos­te­rior prob­a­bil­ities we should as­sign. But in gen­eral, it’s not that easy. For one, there is the is­sue of an­throp­ics: if one world model $w_1$ con­tains more agents ob­serv­ing $o$ than an­other world model $w_2$, does that mean $P(w_1\mid o) > P(w_2\mid o)$? Whether CDT and EDT can rea­son cor­rectly about an­throp­ics is an in­ter­est­ing ques­tion in it­self (cf. Bostrom 2002; Arm­strong 2011; Conitzer 2015), but in this post I’ll dis­cuss a differ­ent prob­lem in nat­u­ral­ized in­duc­tion: iden­ti­fy­ing in­stan­ti­a­tions of the agent in a world model.

It seems that the core of the reasoning in the above example was that some worlds contain an agent observing $o$ and others don’t. So, besides anthropics, the central problem of naturalized induction appears to be identifying agents making particular observations in a physicalist world model. While this can often be done uncontroversially – a world containing only rocks contains no agents – it seems difficult to specify how it works in general. The core of the problem is a type mismatch between the “mental stuff” (e.g., numbers or strings) of $o$ and the “physics stuff” (atoms, etc.) of the world model. Rob Bensinger calls this the problem of “building phenomenological bridges” (BPB) (also see his Bridge Collapse: Reductionism as Engineering Problem).

# Sen­si­tivity to phe­nomenolog­i­cal bridges

Some­times, the de­ci­sions made by CDT and EDT are very sen­si­tive to whether a phe­nomenolog­i­cal bridge is built or not. Con­sider the fol­low­ing prob­lem:

One But­ton Per Agent. There are two similar agents with the same util­ity func­tion. Each lives in her own room. Both rooms con­tain a but­ton. If agent 1 pushes her but­ton, it cre­ates 1 utilon. If agent 2 pushes her but­ton, it cre­ates −50 utilons. You know that agent 1 is an in­stan­ti­a­tion of you. Should you press your but­ton?

Note that this is essentially Newcomb’s problem with potential anthropic uncertainty (see the second paragraph here) – pressing the button is like two-boxing, which causally gives you $1k if you are the real agent but costs you $1M if you are the simulation.

If agent 2 is suffi­ciently similar to you to count as an in­stan­ti­a­tion of you, then you shouldn’t press the but­ton. If, on the other hand, you be­lieve that agent 2 does not qual­ify as some­thing that might be you, then it comes down to what de­ci­sion the­ory you use: CDT would press the but­ton, whereas EDT wouldn’t (as­sum­ing that the two agents are strongly cor­re­lated).
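The second case can be made concrete. Suppose (an assumption for the sketch) that agent 2 does not count as you but its action is perfectly correlated with yours, and that the payoffs are the ones from the problem statement (+1 for button 1, −50 for button 2):

```python
# One Button Per Agent, assuming agent 2 is *not* an instantiation of
# you but acts exactly as you do.

def cdt_value(press):
    # CDT: only your own button is causally downstream of your choice.
    return 1 if press else 0

def edt_value(press):
    # EDT: your choice is also evidence about what the correlated
    # agent 2 does, and her button is worth -50.
    return (1 - 50) if press else 0

cdt_choice = max([True, False], key=cdt_value)  # CDT presses
edt_choice = max([True, False], key=edt_value)  # EDT doesn't
```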

It is easy to spec­ify a prob­lem where EDT, too, is sen­si­tive to the phe­nomenolog­i­cal bridges it builds:

One But­ton Per World. There are two pos­si­ble wor­lds. Each con­tains an agent liv­ing in a room with a but­ton. The two agents are similar and have the same util­ity func­tion. The but­ton in world 1 cre­ates 1 utilon, the but­ton in world 2 cre­ates −50 utilons. You know that the agent in world 1 is an in­stan­ti­a­tion of you. Should you press the but­ton?

If you be­lieve that the agent in world 2 is an in­stan­ti­a­tion of you, both EDT and CDT recom­mend you not to press the but­ton. How­ever, if you be­lieve that the agent in world 2 is not an in­stan­ti­a­tion of you, then nat­u­ral­ized in­duc­tion con­cludes that world 2 isn’t ac­tual and so press­ing the but­ton is safe.
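Here the sensitivity sits in the posterior rather than in the decision rule. Under the same binary-bridge assumption as before, and with equal priors on the two worlds (an assumption for the sketch), the expected utility of pressing flips sign depending on the bridge:

```python
# One Button Per World: whether pressing is safe depends entirely on
# whether the agent in world 2 counts as an instantiation of you.

def expected_utility_of_pressing(world2_is_me):
    # Equal priors; payoffs from the problem: +1 in world 1, -50 in
    # world 2.
    if world2_is_me:
        return 0.5 * 1 + 0.5 * (-50)  # both worlds could be actual
    return 1.0 * 1                    # world 2 ruled out by induction

dont_press = expected_utility_of_pressing(True)    # -24.5: don't press
safe_press = expected_utility_of_pressing(False)   # 1.0: pressing is safe
```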

# Build­ing phe­nomenolog­i­cal bridges is hard and per­haps confused

So, to solve the problem of naturalized induction and apply EDT/CDT-like decision theories, we need to solve BPB. The behavior of an agent is quite sensitive to how we solve it, so we had better get it right.

Un­for­tu­nately, I am skep­ti­cal that BPB can be solved. Most im­por­tantly, I sus­pect that state­ments about whether a par­tic­u­lar phys­i­cal pro­cess im­ple­ments a par­tic­u­lar al­gorithm can’t be ob­jec­tively true or false. There seems to be no way of test­ing any such re­la­tions.

Probably we should think more about whether BPB really is doomed. There is even some philosophical literature that seems worth looking into (again, see this Brian Tomasik post; cf. some of Hofstadter’s writings and the literatures surrounding “Mary the color scientist”, the computational theory of mind, computation in cellular automata, etc.). But at this point, BPB looks confusing/confused enough to warrant looking into alternatives.

## As­sign­ing prob­a­bil­ities prag­mat­i­cally?

One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes in the right direction, and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that merge sort and insertion sort have the same output for any given input, even though they have very different “subjective experiences”. (Another problem is that the dependence between two random variables cannot be expressed as a single number, and so it is unclear how to translate the entire joint probability distribution of the two into a single number determining the likelihood of the algorithm being implemented by the physical process. That said, if implementing an algorithm is conceived of as binary – either true or false – one could just require perfect correlation.)
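The merge sort/insertion sort point is easy to verify: the two algorithms agree on every input, so a purely output-based bridge cannot distinguish them, despite their very different intermediate computations.

```python
import random

def insertion_sort(xs):
    # Builds the result by inserting each element into place.
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def merge_sort(xs):
    # Recursively splits, sorts halves, and merges them.
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

# Same output on every input, very different "subjective experiences".
for _ in range(100):
    xs = [random.randint(0, 50) for _ in range(random.randint(0, 20))]
    assert merge_sort(xs) == insertion_sort(xs) == sorted(xs)
```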

# Get­ting rid of the prob­lem of build­ing phe­nomenolog­i­cal bridges

If we adopt an EDT perspective, it seems clear what we have to do to avoid BPB. If we don’t want to decide whether some world contains the agent, then it appears that we have to artificially ensure that the agent views itself as existing in all possible worlds. So, we may take every world model and add a causally separate or non-physical entity representing the agent. I’ll call this additional agent a logical zombie (l-zombie) (a concept introduced by Benja Fallenstein for a somewhat different decision-theoretic reason). To avoid all BPB, we will assume that the agent pretends that it is the l-zombie with certainty. I’ll call this the l-zombie variant of EDT (LZEDT). It is probably the most natural evidentialist logical decision theory.

Note that in the con­text of LZEDT, l-zom­bies are a fic­tion used for prag­matic rea­sons. LZEDT doesn’t make the meta­phys­i­cal claim that l-zom­bies ex­ist or that you are se­cretly an l-zom­bie. For dis­cus­sions of re­lated meta­phys­i­cal claims, see, e.g., Brian To­masik’s es­say Why Does Physics Ex­ist? and refer­ences therein.

LZEDT rea­sons about the real world via the cor­re­la­tions be­tween the l-zom­bie and the real world. In many cases, LZEDT will act as we ex­pect an EDT agent to act. For ex­am­ple, in One But­ton Per Agent, it doesn’t press the but­ton be­cause that en­sures that nei­ther agent pushes the but­ton.

LZEDT doesn’t need any ad­di­tional an­throp­ics but be­haves like an­thropic de­ci­sion the­ory/​EDT+SSA, which seems alright.

Although LZEDT may as­sign a high prob­a­bil­ity to wor­lds that don’t con­tain any ac­tual agents, it doesn’t op­ti­mize for these wor­lds be­cause it can­not sig­nifi­cantly in­fluence them. So, in a way LZEDT adopts the prag­matic/​func­tional ap­proach (men­tioned above) of, other things equal, giv­ing more weight to wor­lds that con­tain a lot of closely cor­re­lated agents.

LZEDT is au­to­mat­i­cally up­date­less. For ex­am­ple, it gives the money in coun­ter­fac­tual mug­ging. How­ever, it in­vari­ably im­ple­ments a par­tic­u­larly strong ver­sion of up­date­less­ness. It’s not just up­date­less­ness in the way that “son of EDT” (i.e., the de­ci­sion the­ory that EDT would self-mod­ify into) is up­date­less, it is also up­date­less w.r.t. its ex­is­tence. So, for ex­am­ple, in the One But­ton Per World prob­lem, it never pushes the but­ton, be­cause it thinks that the sec­ond world, in which push­ing the but­ton gen­er­ates −50 utilons, could be ac­tual. This is the case even if the sec­ond world very ob­vi­ously con­tains no im­ple­men­ta­tion of LZEDT. Similarly, it is un­clear what LZEDT does in the Coin Flip Creation prob­lem, which EDT seems to get right.
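For concreteness, here is the standard counterfactual mugging calculation (the $10,000/$100 payoffs are the usual illustrative numbers, not from this post), showing why an updateless evaluation gives the money even after seeing the losing coin flip:

```python
# Counterfactual mugging, evaluated updatelessly: on heads, Omega pays
# $10,000 iff it predicts you would pay $100 on tails. The updateless
# agent scores the whole policy before learning the coin's outcome.

def updateless_value(pay_on_tails):
    heads = 0.5 * (10_000 if pay_on_tails else 0)
    tails = 0.5 * (-100 if pay_on_tails else 0)
    return heads + tails

best_policy = max([True, False], key=updateless_value)  # pay on tails
```

An updateful agent who conditions on having seen tails would only compare −100 to 0 and refuse; the updateless evaluation is what makes paying come out ahead.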

So, LZEDT op­ti­mizes for world mod­els that nat­u­ral­ized in­duc­tion would as­sign zero prob­a­bil­ity to. It should be noted that this is not done on the ba­sis of some ex­otic eth­i­cal claim ac­cord­ing to which non-ac­tual wor­lds de­serve moral weight.

I’m not yet sure what to make of LZEDT. It is elegant in that it effortlessly gets anthropics right, avoids BPB, and is updateless without having to self-modify. On the other hand, not updating on your existence is often counterintuitive, and even regular updatelessness is, in my opinion, best justified via precommitment. Its approach to avoiding BPB isn’t immune to criticism either. In a way, it is just a very wrong approach to BPB (mapping your algorithm onto fictions rather than your real instantiations). Perhaps it would be more reasonable to use regular EDT with an approach to BPB that interprets anything as you that could potentially be you?

Of course, LZEDT also in­her­its some of the po­ten­tial prob­lems of EDT, in par­tic­u­lar, the 5-and-10 prob­lem.

## CDT is more dependent on building phenomenological bridges

It seems much harder to get rid of the BPB prob­lem in CDT. Ob­vi­ously, the l-zom­bie ap­proach doesn’t work for CDT: be­cause none of the l-zom­bies has a phys­i­cal in­fluence on the world, “LZCDT” would always be in­differ­ent be­tween all pos­si­ble ac­tions. More gen­er­ally, be­cause CDT ex­erts no con­trol via cor­re­la­tion, it needs to be­lieve that it might be X if it wants to con­trol X’s ac­tions. So, causal de­ci­sion the­ory only works with BPB.

That said, a causal­ist ap­proach to avoid­ing BPB via l-zom­bies could be to tam­per with the defi­ni­tion of causal­ity such that the l-zom­bie “log­i­cally causes” the choices made by in­stan­ti­a­tions in the phys­i­cal world. As far as I un­der­stand it, most peo­ple at MIRI cur­rently pre­fer this fla­vor of log­i­cal de­ci­sion the­ory.

# Acknowledgements

Most of my views on this topic formed in dis­cus­sions with Jo­hannes Treut­lein. I also benefited from dis­cus­sions at AISFP.

• Last time I looked at a post of yours about this, you got some­thing very ba­sic wrong. That is:

> Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality.

is wrong. You never replied. Why do you post if you don’t en­gage with crit­i­cism? Are you “write-only”?

• I apologize for not replying to your earlier comment. I do engage with comments a lot. E.g., I recall that your comment on that post contained a link to a ~1h talk that I watched after reading it. There are many obvious reasons that sometimes cause me not to reply to comments, e.g. if I don’t feel like I have anything interesting to say, or if the comment indicates a lack of interest in discussion (e.g., your “I am not actually here, but … Ok, disappearing again”). Anyway, I will reply to your comment now. Sorry again for not doing so earlier.

• It seems to me that the origi­nal UDT already in­cor­po­rated this type of ap­proach to solv­ing nat­u­ral­ized in­duc­tion. See here and here for pre­vi­ous dis­cus­sions. Also, UDT, as origi­nally de­scribed, was in­tended as a var­i­ant of EDT (where the “ac­tion” in EDT is in­ter­preted as “this source code im­ple­ments this policy (in­put/​out­put map)”. MIRI peo­ple seem to mostly pre­fer a causal var­i­ant of UDT, but my po­si­tion has always been that the ev­i­den­tial var­i­ant is sim­pler so let’s go with that un­til there’s con­clu­sive ev­i­dence that the ev­i­den­tial var­i­ant is not good enough.

LZEDT seems to be more com­plex than UDT but it’s not clear to me that it solves any ad­di­tional prob­lems. If it’s sup­posed to have ad­van­tages over UDT, can you ex­plain what those are?

• I hadn’t seen these par­tic­u­lar dis­cus­sions, al­though I was aware of the fact that UDT and other log­i­cal de­ci­sion the­o­ries avoid build­ing phe­nomenolog­i­cal bridges in this way. I also knew that oth­ers (e.g., the MIRI peo­ple) were aware of this.

I didn’t know you preferred a purely ev­i­den­tial var­i­ant of UDT. Thanks for the clar­ifi­ca­tion!

As for the differ­ences be­tween LZEDT and UDT:

• My understanding was that there is no full formal specification of UDT. The counterfactuals seem to be given by some unspecified mathematical intuition module. LZEDT, on the other hand, seems easy to specify formally (assuming a solution to naturalized induction). (That said, if UDT is just the updateless-evidentialist flavor of logical decision theory, it should be easy to specify as well. I haven’t seen people characterize UDT in this way, but perhaps this is because MIRI’s conception of UDT differs from yours?)

• LZEDT isn’t log­i­cally up­date­less.

• LZEDT doesn’t do ex­plicit op­ti­miza­tion of poli­cies. (Ex­plicit policy op­ti­miza­tion is the differ­ence be­tween UDT1.1 and UDT1.0, right?)

(Based on a com­ment you made on an ear­lier past post of mine, it seems that UDT and LZEDT rea­son similarly about med­i­cal New­comb prob­lems.)

Anyway, my reason for writing this isn’t so much that LZEDT differs from other decision theories. (As I say in the post, I actually think LZEDT is equivalent to the most natural evidentialist logical decision theory – which has been considered by MIRI at least.) Instead, it’s that I have a different motivation for proposing it. My understanding is that the LWers’ search for new decision theories was not driven by the BPB issue (although some of the motivations you listed in 2012 are related to it). Instead, it seems that people abandoned EDT – the most obvious approach – mainly for reasons that I don’t endorse. E.g., the TDT paper seems to give medical Newcomb problems as the main argument against EDT. It may well be that looking beyond EDT to avoid naturalized induction/BPB leads to the same decision theories as these other motivations.

• Have you seen the “XOR Black­mail” in the Death in Da­m­as­cus pa­per? That’s a much bet­ter prob­lem with EDT than the smok­ing le­sion prob­lem, in my view. And it’s sim­ple to de­scribe:

An agent has been alerted to a rumor that her house has a terrible termite infestation, which would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy and accurate predictor with a strong reputation for honesty has learned whether or not it’s true, and drafts a letter: “I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.”

The pre­dic­tor then pre­dicts what the agent would do upon re­ceiv­ing the let­ter, and sends the agent the let­ter iff ex­actly one of (i) or (ii) is true. Thus, the claim made by the let­ter is true. As­sume the agent re­ceives the let­ter. Should she pay up?

• EDT doesn’t pay if it is given the choice to com­mit to not pay­ing ex-ante (be­fore re­ceiv­ing the let­ter). So the thought ex­per­i­ment might be an ar­gu­ment against or­di­nary EDT, but not against up­date­less EDT. If one takes the pos­si­bil­ity of an­thropic un­cer­tainty into ac­count, then even or­di­nary EDT might not pay the black­mailer. See also Abram Dem­ski’s post about the Smok­ing Le­sion. Ahmed and Price defend EDT along similar lines in a re­sponse to a re­lated thought ex­per­i­ment by Frank Arntze­nius.
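The ex-ante asymmetry can be sketched numerically (the prior probability of termites below is a hypothetical number, chosen only for illustration):

```python
# XOR Blackmail: updateful vs. ex-ante evaluation for EDT.

p = 0.01  # hypothetical prior probability of termites

def edt_value_given_letter(pay):
    # After the letter: paying is evidence for case (i), no termites
    # (lose $1,000); refusing is evidence for case (ii), termites
    # (lose $1,000,000). So updateful EDT pays.
    return -1_000 if pay else -1_000_000

def ex_ante_value(policy_pays):
    # Before the letter: termites occur with probability p regardless
    # of policy; a paying policy merely adds a $1,000 loss in the
    # (1 - p) no-termite case, where the letter gets sent.
    return -1_000_000 * p - (1_000 * (1 - p) if policy_pays else 0)
```

So the updateful evaluation favors paying, while the ex-ante evaluation favors committing not to pay, for any prior p.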

• Yes, this demon­strates that EDT is also un­sta­ble un­der self mod­ifi­ca­tion, just as CDT is. And try­ing to build an up­date­less EDT is ex­actly what UDT is do­ing.

• If state­ments about whether an al­gorithm ex­ists are not ob­jec­tively true or false, there is also no ob­jec­tively cor­rect de­ci­sion the­ory, since the ex­is­tence of agents is not ob­jec­tive in the first place. Of course you might even agree with this but con­sider it not to be an ob­jec­tion, since you can just say that de­ci­sion the­ory is some­thing we want to do, not some­thing ob­jec­tive.

• Yes, I share the impression that the BPB problem implies some amount of decision theory relativism. That said, one could argue that decision theories cannot be objectively correct anyway. In most areas, statements can only be justified relative to some foundation. Probability assignments are correct relative to a prior, the truth of theorems depends on axioms, and whether you should take some action depends on your goals (or meta-goals). Priors, axioms, and goals themselves, on the other hand, cannot be justified (unless you have some meta-priors, meta-axioms, etc., but I think the chain has to end at some point, see https://en.wikipedia.org/wiki/Regress_argument ). Perhaps decision theories are similar to priors, axioms and terminal values?

• I agree that any chain of jus­tifi­ca­tion will have to come to an end at some point, cer­tainly in prac­tice and pre­sum­ably in prin­ci­ple. But it does not fol­low that the thing at the be­gin­ning which has no ad­di­tional jus­tifi­ca­tion is not ob­jec­tively cor­rect or in­cor­rect. The typ­i­cal re­al­ist re­sponse in all of these cases, with which I agree, is that your start­ing point is cor­rect or in­cor­rect by its re­la­tion­ship with re­al­ity, not by a re­la­tion­ship to some jus­tifi­ca­tion. Of course if it is re­ally your start­ing point, you will not be able to prove that it is cor­rect or in­cor­rect. That does not mean it is not one or the other, un­less you are as­sum­ing from the be­gin­ning that none of your start­ing points have any re­la­tion­ship at all with re­al­ity. But in that case, it would be equally rea­son­able to con­clude that your start­ing points are ob­jec­tively in­cor­rect.

Let me give some ex­am­ples:

An ax­iom: a state­ment can­not be both true and false in the same way. It does not seem pos­si­ble to prove this, since if it is open to ques­tion, any­thing you say while try­ing to prove it, even if you think it true, might also be false. But if this is the way re­al­ity ac­tu­ally works, then it is ob­jec­tively cor­rect even though you can­not prove that it is. Say­ing that it can­not be ob­jec­tively cor­rect be­cause you can­not prove it, in this case, seems similar to say­ing that there is no such thing as re­al­ity—in other words, again, say­ing that your ax­ioms have no re­la­tion­ship at all to re­al­ity.

A prior: if there are three possibilities and nothing gives me reason to suspect one more than another, then each has a probability of 1/3. Mathematically it is possible to prove this, but in another sense there is nothing to prove: it really just says that if there are three equal possibilities, they have to be considered as equal possibilities and not as unequal ones. In that sense it is exactly like the above axiom: if reality is the way the axiom says, it is also the way this prior says, even though no one can prove it.

A ter­mi­nal goal: con­tin­u­ing to ex­ist. A goal is what some­thing tends to­wards. Every­thing tends to ex­ist and does not tend to not ex­ist—and this is nec­es­sar­ily so, ex­actly be­cause of the above ax­iom. If a thing ex­ists, it ex­ists and does not not ex­ist—and it is just an­other way of de­scribing this to say, “Ex­ist­ing things tend to ex­ist.” Again, as with the case of the prior, there is some­thing like an ar­gu­ment here, but not re­ally. Once again, though, even if you can­not es­tab­lish the goal by refer­ence to some ear­lier goal, the goal is an ob­jec­tive goal by re­la­tion­ship with re­al­ity: this is how ten­den­cies ac­tu­ally work in re­al­ity.

• How­ever, if you be­lieve that the agent in world 2 is not an in­stan­ti­a­tion of you, then nat­u­ral­ized in­duc­tion con­cludes that world 2 isn’t ac­tual and so press­ing the but­ton is safe.

By “isn’t ac­tual” do you just mean that the agent isn’t in world 2? World 2 might still ex­ist, though?

• No, I ac­tu­ally mean that world 2 doesn’t ex­ist. In this ex­per­i­ment, the agent be­lieves that ei­ther world 1 or world 2 is ac­tual and that they can­not be ac­tual at the same time. So, if the agent thinks that it is in world 1, world 2 doesn’t ex­ist.

• I just re­mem­bered that in Naive TDT, Bayes nets, and coun­ter­fac­tual mug­ging, Stu­art Arm­strong made the point that it shouldn’t mat­ter whether you are simu­lated (in a way that you might be the simu­la­tion) or just pre­dicted (in such a way that you don’t be­lieve that you could be the simu­la­tion).