Did EDT get it right all along? Introducing yet another medical Newcomb problem

One of the main arguments given against Evidential Decision Theory (EDT) is that it would “one-box” in medical Newcomb problems. Whether this is the winning action has been a hotly debated issue on LessWrong. A majority, including experts in the area such as Eliezer Yudkowsky and Wei Dai, seem to think that one should two-box (see e.g. Yudkowsky 2010, p. 67). Others have tried to argue in favor of EDT by claiming that the winning action would be to one-box, or by offering reasons why EDT would in some cases two-box after all. In this blog post, I want to argue that EDT gets it right: one-boxing is the correct action in medical Newcomb problems. I introduce a new thought experiment, the Coin Flip Creation problem, in which I believe the winning move is to one-box. This new problem is structurally similar to other medical Newcomb problems such as the Smoking Lesion, though it might elicit the intuition to one-box even in people who would two-box in some of the other problems. I discuss both how EDT and other decision theories would reason in the problem and why people’s intuitions might diverge in different formulations of medical Newcomb problems.

Two kinds of Newcomblike problems

There are two different kinds of Newcomblike problems. In Newcomb’s original paradox, both EDT and Logical Decision Theories (LDT), such as Timeless Decision Theory (TDT), would one-box and therefore, unlike CDT, win $1 million. In medical Newcomb problems, EDT’s and LDT’s decisions diverge. This is because in the latter, a (physical) causal node that isn’t itself a decision algorithm influences both the current world state and our decisions – resulting in a correlation between action and environment but, unlike in the original Newcomb problem, no “logical” causation.

It’s often unclear exactly how a causal node can exert influence on our decisions. Does it change our decision theory, utility function, or the information available to us? In the case of the Smoking Lesion problem, it seems plausible that it’s our utility function that is being influenced. But then it seems that as soon as we observe our utility function (“notice a tickle”; see Eells 1982), we lose “evidential power” (Almond 2010a, p. 39), i.e. there’s nothing new to learn about our health by acting a certain way if we already know our utility function. In any case, as long as we don’t know our utility function and therefore still have the evidential power, I believe we should use it.

The Coin Flip Creation problem is an adaptation of Caspar Oesterheld’s “Two-Boxing Gene” problem and, like the latter, attempts to turn Newcomb’s original problem into a medical Newcomb problem while triggering the intuition that we should one-box. In Oesterheld’s Two-Boxing Gene, it’s stated that a certain gene correlates with our decision to one-box or two-box in Newcomb’s problem, and that Omega, instead of simulating our decision algorithm, just looks at this gene.

Unfortunately, it’s not specified how the correlation between two-boxing and the gene arises, casting doubt on whether it’s a medical Newcomb problem at all, and on whether other decision algorithms would disagree with one-boxing. Wei Dai argues that in the Two-Boxing Gene, if Omega conducts a study to find out which genes correlate with which decision algorithm, then Updateless Decision Theory (UDT) could just commit to one-boxing and thereby determine that all the genes UDT agents have will always correlate with one-boxing. So in some sense, UDT’s genes will still indirectly constitute a “simulation” of UDT’s algorithm, and there is a logical influence between the decision to one-box and Omega’s decision to put $1 million in box A. Similar considerations could apply to other LDTs.

The Coin Flip Creation problem is intended as an example of a problem in which EDT gives the right answer, but all causal and logical decision theories fail. It works explicitly through a causal influence on the decision theory itself, thus reducing ambiguity about the origin of the correlation.

The Coin Flip Creation problem

One day, while pondering the merits and demerits of different acausal decision theories, you’re visited by Omega, a being assumed to possess flawless powers of prediction and absolute trustworthiness. You’re presented with Newcomb’s paradox, but with one additional caveat: Omega informs you that you weren’t born like a normal human being, but were instead created by Omega. On the day you were born, Omega flipped a coin: If it came up heads, Omega created you in such a way that you would one-box when presented with the Coin Flip Creation problem, and it put $1 million in box A. If the coin came up tails, you were created such that you’d two-box, and Omega didn’t put any money in box A. We don’t know how Omega ensured what your decision would be. For all we know, it may have inserted either CDT or EDT into your source code, or even just added one hard-coded decision rule on top of your messy human brain. Do you choose both boxes, or only box A?

It seems like EDT gets it right: one-boxing is the winning action here. There’s a correlation between our decision to one-box, the coin flip, and Omega’s decision to put money in box A. Conditional on us one-boxing, the probability that there is money in box A increases, and we “receive the good news” – that is, we discover that the coin must have come up heads, and we thus get the million dollars. In fact, we can be absolutely certain of the better outcome if we one-box. The argument also goes through if the correlation between our actions and the content of box A isn’t perfect: as long as the correlation is high enough, it is better to one-box.
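EDT’s reasoning here can be written out as a small calculation. The following is only a sketch under assumptions the post doesn’t fix: I assume the standard Newcomb payoffs ($1 million in box A when the coin came up heads, a guaranteed $1,000 in box B), and a parameter `p` for the strength of the correlation.

```python
# A minimal sketch of EDT's conditional expected value in the Coin Flip
# Creation, assuming standard Newcomb payoffs: $1M in box A iff the coin
# came up heads, and an always-available $1,000 in box B. The parameter p
# is the assumed correlation strength:
#   p = P(money in A | one-box) = P(no money in A | two-box).

MILLION = 1_000_000
THOUSAND = 1_000

def edt_value(action: str, p: float) -> float:
    """Expected payoff conditional on the news that we take `action`."""
    if action == "one-box":
        return p * MILLION                    # heads with probability p
    elif action == "two-box":
        return (1 - p) * MILLION + THOUSAND   # box A full only with prob. 1-p
    raise ValueError(action)

# Perfect correlation: one-boxing wins outright.
assert edt_value("one-box", 1.0) > edt_value("two-box", 1.0)

# Imperfect correlation: one-boxing still wins whenever
# p * 1M > (1 - p) * 1M + 1k, i.e. p > 0.5005 under these payoffs.
assert edt_value("one-box", 0.6) > edt_value("two-box", 0.6)
```

Under these assumed payoffs, “high enough” is quite weak: any correlation noticeably above chance already favors one-boxing.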

Nevertheless, neither causal nor logical counterfactuals seem to imply that we can determine whether there is money in box A. The coin flip isn’t a decision algorithm itself, so we can’t determine its outcome. The logical uncertainty about our own decision output doesn’t seem to coincide with the empirical uncertainty about the outcome of the coin flip. In the absence of a causal or logical link between their decision and the content of box A, CDT and TDT would two-box.
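For contrast, here is the same calculation as CDT would run it, again under my assumed $1M/$1k payoffs. Because CDT holds box A’s content causally fixed, it evaluates both actions against one unconditional credence, and two-boxing dominates:

```python
# A minimal sketch of CDT's reasoning in the Coin Flip Creation, under the
# same assumed payoffs ($1M possibly in box A, a sure $1,000 in box B).
# CDT treats box A's content as causally fixed, so it uses one unconditional
# credence q = P(money in A) for both actions.

MILLION = 1_000_000
THOUSAND = 1_000

def cdt_value(action: str, q: float) -> float:
    """Causal expected payoff: the action cannot change box A's content."""
    expected_box_a = q * MILLION
    bonus = THOUSAND if action == "two-box" else 0
    return expected_box_a + bonus

# For every credence q, two-boxing comes out ahead by exactly $1,000 --
# which is why CDT (and, absent a logical link, TDT) two-boxes here.
for q in (0.0, 0.5, 1.0):
    assert cdt_value("two-box", q) - cdt_value("one-box", q) == THOUSAND
```

The $1,000 dominance argument is insensitive to `q`, which is exactly what makes the causal counterfactual blind to the coin-flip correlation.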

Updateless Decision Theory

As far as I understand, UDT would come to a similar conclusion. AlephNeil writes in a post about UDT:

In the Smoking Lesion problem, the presence of a ‘lesion’ is somehow supposed to cause Players to choose to smoke (without altering their utility function), which can only mean that in some sense the Player’s source code is ‘partially written’ before the Player can exercise any control over it. However, UDT wants to ‘wipe the slate clean’ and delete whatever half-written nonsense is there before deciding what code to write.

Ultimately this means that when UDT encounters the Smoking Lesion, it simply throws away the supposed correlation between the lesion and the decision and acts as though that were never a part of the problem.

This approach seems wrong to me. If we use an algorithm that changes our own source code, then this change, too, has been physically determined and can therefore correlate with events that aren’t copies of our own decision algorithm. If UDT reasons as though it could just rewrite its own source code and discard the correlation with the coin flip altogether, then UDT two-boxes and thus by definition ends up in the world where there is no money in box A.

Note that updatelessness seemingly makes no difference in this problem, since it involves no a priori decision: Before the coin flip, there’s a 50% chance of becoming either a one-boxing or a two-boxing agent. In any case, we can’t do anything about the coin flip, and therefore also can’t influence whether box A contains any money.

I am uncertain how UDT works, though, and would be curious about other people’s thoughts. Maybe UDT reasons that by one-boxing, it becomes a decision theory of the sort that would never be installed into an agent in a tails world, thus rendering impossible all hypothetical tails worlds with UDT agents in them. But if so, why wouldn’t UDT “one-box” in the Smoking Lesion? As far as the thought experiments are specified, the causal connection between coin flip and two-boxing in the Coin Flip Creation appears to be no different from the connection between gene and smoking in the Smoking Lesion.

More adaptations and different formalizations of LDTs exist, e.g. Proof-Based Decision Theory. I could very well imagine that some of those might one-box in the thought experiment I presented. If so, then I’m once again curious as to where the benefits of such decision theories lie in comparison to plain EDT (aside from updatelessness – see Concluding thoughts).

Coin Flip Creation, Version 2

Let’s assume UDT would two-box in the Coin Flip Creation. We could alter our thought experiment a bit so that UDT would probably one-box after all:

The situation is identical to the Coin Flip Creation, with one key difference: After Omega flips the coin and creates you with the altered decision algorithm, it actually simulates your decision, just as in Newcomb’s original paradox. Only after Omega has determined your decision via simulation does it decide whether to put money in box A, conditional on your decision. Do you choose both boxes, or only box A?

Here is a causal graph for the first and second versions of the Coin Flip Creation problem. In the first version, a coin flip determines whether there is money in box A; in the second, a simulation of your decision algorithm decides:

Since in Version 2 there’s a simulation involved, UDT would probably one-box. I find this to be a curious conclusion. The situation remains exactly the same – we can rule out any changes in the correlation between our decision and our payoff. It seems confusing to me, then, that the optimal decision should be a different one.

Copy-altruism and multi-worlds

The Coin Flip Creation problem assumes a single world and an egoistic agent. In the following, I want to include a short discussion of how the Coin Flip Creation would play out in a multi-world environment.

Suppose Omega’s coin is based on a quantum random number generator and produces 50% heads worlds and 50% tails worlds. If we’re copy-egoists, EDT still recommends one-boxing, since doing so would reveal to us that we’re in one of the branches in which the coin came up heads. If we’re copy-altruists, then in practice, we’d probably care a bit less about copies whose decision algorithms have been tampered with, since they would make less effective use of the resources they gain than we ourselves would (i.e. their decision algorithm sometimes behaves differently). But in theory, if we care about all the copies equally, we should be indifferent between one-boxing and two-boxing, since there will always be 50% of us in either of the worlds no matter what we do. The two groups always take opposite actions. The only thing we can change is whether our own copy belongs to the tails or the heads group.
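The indifference claim for the fully copy-altruistic case can be made concrete with a little bookkeeping, again under my assumed $1M/$1k payoffs:

```python
# A bookkeeping sketch of the copy-altruist case, assuming $1M in box A in
# heads branches and a sure $1,000 in box B (these payoffs are my assumption;
# the post doesn't fix box B's content). The quantum coin always produces one
# heads branch and one tails branch: the heads copy was built to one-box and
# the tails copy to two-box, whichever algorithm "we" happen to run.

MILLION = 1_000_000
THOUSAND = 1_000

heads_copy = MILLION    # one-boxes; takes the full box A
tails_copy = THOUSAND   # two-boxes; box A is empty, keeps box B

total_welfare = heads_copy + tails_copy
# The sum is fixed at $1,001,000 across the two branches, so an altruist who
# weights both copies equally is indifferent. The only open question is which
# branch our own copy occupies -- which is all the copy-egoist cares about.
assert total_welfare == MILLION + THOUSAND
```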

To summarize, UDT and EDT would both be indifferent in the altruistic multi-world case, but UDT would (presumably) two-box, and EDT would one-box, in both the copy-egoistic multi-world case and in the single-world case.

“But I don’t have a choice”

There seems to be an especially strong intuition of “absence of free will” inherent to the Coin Flip Creation problem. When presented with the problem, many respond that if someone had created their source code, they didn’t have any choice to begin with. But that’s the exact situation in which we all find ourselves at all times! Our decision architecture and choices are determined by physics, just like a hypothetical AI’s source code, and all of our choices are thus determined by our “creator.” When we’re confronted with the two boxes, we know that our decisions are predetermined, just like every word of this blog post has been predetermined. But that knowledge alone won’t help us make any decision. As far as I’m aware, even an agent with complete knowledge of its own source code would have to treat its own decision outputs as uncertain, or it would fail to implement a decision algorithm that takes counterfactuals into account.

Note that our decision in the Coin Flip Creation is also no less determined than in Newcomb’s paradox. In both cases, the prediction has been made, and physics will guide our thoughts and our decision in a deterministic and predictable manner. Nevertheless, we can still assume that we have a choice until we make our decision, at which point we merely “find out” what has been our destiny all along.

Concluding thoughts

I hope that the Coin Flip Creation motivates some people to reconsider EDT’s answers in Newcomblike problems. A thought experiment somewhat similar to the Coin Flip Creation can be found in Arif Ahmed (2014).

Of course, the particular setup of the Coin Flip Creation means it isn’t directly relevant to the question of which decision theory we should program into an AI. We obviously wouldn’t flip a coin before creating an AI. Also, the situation doesn’t really look like a decision problem from the outside; an impartial observer would just see Omega forcing you to pick either A or B. Still, the example demonstrates that from the inside view, evidence from the actions we take can help us achieve our goals better. Why shouldn’t we use this information? And if evidential knowledge can help us, why shouldn’t we allow a future AI to take it into account? In any case, I’m not overly confident in my analysis and would be glad to have any mistakes pointed out to me.

Medical Newcomb problems are also not the only class of problems that challenge EDT. Evidential blackmail is an example of a different problem, wherein giving the agent access to specific compromising information is used to extract money from EDT agents. The problem attacks EDT from a different angle, though: namely by exploiting its lack of updatelessness, similar to the challenges in Transparent Newcomb, Parfit’s Hitchhiker, Counterfactual Mugging, and the Absent-Minded Driver. I plan to address questions related to updatelessness, e.g. whether it makes sense to give in to evidential blackmail if you already have access to the information and haven’t precommitted not to give in, at a later point.


I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.