# Open-Box Newcomb’s Problem and the limitations of the Erasure framing

One of the most confusing aspects of the Erasure Approach to Newcomb’s problem is that in Open-Box Newcomb’s it requires you to forget that you’ve seen that the box is full. This is a strange thing to do, so it deserves further explanation. And as we’ll see, this might not be the best way to think about what is happening.

Let’s begin by recapping the problem. In a room there are two boxes: one containing \$1000, and a transparent box that contains either nothing or \$1 million. Before you entered the room, a perfect predictor predicted what you would do if you saw \$1 million in the transparent box. If it predicted that you would one-box, then it put \$1 million in the transparent box; otherwise it left the box empty. If you can see \$1 million in the transparent box, which choice should you pick?

The argument I provided before was as follows: if you see a full box, then you must be going to one-box if the predictor really is perfect. So there would only be one decision consistent with the problem description, and to produce a non-trivial decision theory problem we’d have to erase some information. And the most logical thing to erase would be what you see in the box.

I still mostly agree with this argument, but I feel the reasoning is a bit sparse, so this post will try to break it down in more detail. I’ll just note in advance that when you start breaking it down, you end up performing a kind of psychological or social analysis. However, I think this is inevitable when dealing with ambiguous problems; if you could provide a mathematical proof of what an ambiguous problem meant, then it wouldn’t be ambiguous.

As I noted in Deconfusing Logical Counterfactuals, there is only one choice consistent with the problem (one-boxing), so in order to answer this question we’ll have to construct some counterfactuals. A good way to view this is that instead of asking what choice the agent should make, we will ask whether the agent made the best choice.

In order to construct these counterfactuals, we’ll have to consider situations with at least one of the problem’s assumptions missing, since we want counterfactuals involving both one-boxing and two-boxing. Unfortunately, it is impossible for a two-boxer to a) see \$1 million in a box if b) the money is only in the box if the predictor predicts the agent will one-box in this situation and c) the predictor is perfect. So we’ll have to relax at least one of these assumptions.

Speaking very roughly, it is generally understood that the way to resolve this is to relax the assumption that the agent must really be in that situation and to allow the possibility that the agent may only be simulated as being in such a situation by the predictor. I want to reiterate that what counts as the same problem is really just a matter of social convention.

Another note: I said “speaking very roughly” because some people claim that the agent could actually be in the simulation. In my mind these people are confused; in order to predict an agent, we may only need to simulate the decision theory parts of its mind, not all the other parts that make you you. A second reason why this isn’t precise is that it isn’t defined how to simulate an impossible situation; one of my previous posts points out that we can get around this by simulating what an agent would do when given input representing an impossible situation. There may also be some people who have doubts about whether a perfect predictor is possible even in theory. I’d suggest that these people read one of my past posts on why the sense in which you “could have chosen otherwise” doesn’t break the prediction, and how there’s a sense in which you are pre-committed to every action you take.

In any case, once we have relaxed this assumption, the consistent counterfactuals become either a) the agent actually seeing the full box and one-boxing, or b) the agent seeing the empty box. In case b), it is actually consistent for the agent to one-box or two-box, since the predictor only predicts what would happen if the agent saw a full box. It is then trivial to pick the best counterfactual.
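This enumeration can be made concrete with a small program. What follows is only a sketch under one modelling assumption: the predictor simply runs the agent’s policy on the input “full” (i.e. simulates the agent seeing a full box); all names and the observation encoding are my own, not from any existing formalization.

```python
# A policy maps an observation ("full" or "empty") to an action.
# The predictor fills the transparent box iff the policy one-boxes
# when simulated seeing a full box.

def box_contents(policy):
    """Predictor simulates the policy on a full box."""
    return 1_000_000 if policy("full") == "one-box" else 0

def play(policy):
    """What actually happens to an agent with this policy."""
    contents = box_contents(policy)
    observation = "full" if contents else "empty"
    action = policy(observation)
    # Two-boxing adds the $1000 from the other box.
    payoff = contents if action == "one-box" else contents + 1_000
    return observation, action, payoff

one_boxer = lambda obs: "one-box"
two_boxer = lambda obs: "two-box"
# One-boxes only when the box is empty: consistent with case b).
empty_one_boxer = lambda obs: "one-box" if obs == "empty" else "two-box"

print(play(one_boxer))        # ('full', 'one-box', 1000000)
print(play(two_boxer))        # ('empty', 'two-box', 1000)
print(play(empty_one_boxer))  # ('empty', 'one-box', 0)
```

Notice that under this model no policy ever sees a full box and then two-boxes, so the consistent counterfactuals are exactly a) and b) above.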

This problem actually demonstrates a limitation of the erasure framing. After all, we didn’t just justify the counterfactuals by removing the assumption that you saw a full box; we instead modified it to seeing a full box OR being simulated seeing a full box. In one sense, this is essentially the same thing: since we already knew you were being simulated by the predictor, we in effect just removed the assumption. On the other hand, it is easier to justify that it is the same problem by turning it into an OR than by just removing the assumption.

In other words, thinking about counterfactuals in terms of erasure can be incredibly misleading, and in this case it actively made it harder to justify our counterfactuals. The key question seems to be not “What should I erase?” but “What assumption should I erase or relax?”. I’m beginning to think that I’ll need to choose a better term, but I’m reluctant to rename this approach until I have a better understanding of what exactly is going on.

At the risk of repeating myself, the fact that it is natural to relax this assumption is a matter of social convention and not mathematics. My next post on this topic will try to clarify how certain aspects of a problem may make it seem natural to relax or remove certain assumptions.

• If you see a full box, then you must be going to one-box if the predictor really is perfect.

Huh? If I’m a two-boxer, the predictor can still make a simulation of me, show it a simulated full box, and see what happens. It’s easy to formalize, with computer programs for the agent and the predictor.

• I’ve already addressed this in the article above, but my understanding is as follows: this is one of those circumstances where it is important to differentiate between you being in a situation and a simulation of you being in a situation. I really should write a post about this, but in order for a simulation to be accurate it simply has to make the same decisions in decision theory problems. Nothing else has to be the same; in fact, it could be an anti-rational agent with the opposite utility function.

Note that I’m not claiming an agent can ever tell whether it is in the real world or in a simulation, but that’s not the point. I’m adopting the viewpoint of an external observer who can tell the difference.

I think the key here is to think about what is happening both in terms of philosophy and mathematics, but you only seem interested in the former?

• I couldn’t understand your comment, so I wrote a small Haskell program to show that two-boxing in the transparent Newcomb problem is a consistent outcome. What parts of it do you disagree with?

• Okay, I have to admit that that’s kind of cool; but on the other hand, it also completely misses the point.

I think we need to backtrack. A maths proof can be valid but its conclusion false if at least one premise is false, right? So unless a problem has already been formally defined, it’s not enough to just throw down a maths proof; you also have to justify that you’ve formalised it correctly.

• Well, the program is my formalization. All the premises are right there. You should be able to point out where you disagree.

• In other words, the claim isn’t that your program is incorrect; it’s that it requires more justification than you might think in order to persuasively show that it correctly represents Newcomb’s problem. Maybe you think understanding this isn’t particularly important, but I think knowing exactly what is going on is key to understanding how to construct logical counterfactuals in general.

• I actually don’t know Haskell, but I’ll take a stab at decoding it tonight or tomorrow. Open-box Newcomb’s is normally stated as “you see a full box”, not “you or a simulation of you sees a full box”. I agree with this reinterpretation, but I disagree with glossing it over.

My point was that if we take the problem description super-literally, as you seeing the box and not a simulation of you, then you must one-box. Of course, since this produces a trivial decision problem, we’ll want to reinterpret it in some way, and that’s what I’m providing a justification for.

• I (somewhat) agree that there are cases where you need to keep identity separate between levels of simulation (which “you” may or may not be at the outermost of). But I don’t think it matters to this problem. When you add “perfect” to the descriptor, it’s pretty much just you. It makes every relevant decision identically.

• When you are trying to really break down a problem, I think it is good practice to assume they are separate at the start. You can then immediately justify talking about a simulation as being you in a certain sense, but starting with them separate is key.

• I may not have gotten to the part where it matters that they’re separate (in perfect simulation/alignment cases). But no harm in it. Just please don’t obscure the fundamental implication that in such a universe, free will is purely an illusion.

• I haven’t been defending free will in my posts at all.

• in fact, it could be an anti-rational agent with the opposite utility function.

These two people might look the same, they might be identical on a quantum level, but one of them is a largely rational agent, and the other is an anti-rational agent with the opposite utility function.

I think that calling something an anti-rational agent with the opposite utility function is a weird description that doesn’t cut reality at its joints. There is a simple notion of a perfect sphere. There is also a simple notion of a perfect optimizer. Real-world objects aren’t perfect spheres, but some are pretty close. Thus “sphere” is a useful approximation, and “sphere + error term” is a useful description. Real agents aren’t perfect optimisers (ignoring contrived goals like “1 for doing whatever you were going to do anyway, 0 else”), but some are pretty close, hence “utility function + biases” is a useful description. This makes the notion of an anti-rational agent with the opposite utility function like an inside-out sphere with its surface offset inwards by twice the radius. It’s a cack-handed description of a simple object in terms of a totally different simple object and a huge error term.

This is one of those circumstances where it is important to differentiate between you being in a situation and a simulation of you being in a situation.

I actually don’t think that there is a general procedure to tell what is you and what is a simulation of you. Standard argument: slowly replacing neurons with nanomachines, slowly porting the result to software, slowly abstracting and proving theorems about it rather than running it directly.

It is an entirely meaningful utility function to only care about copies of your algorithm that are running on certain kinds of hardware. That makes you a “biochemical brains running this algorithm” maximizer. The paperclip maximizer doesn’t care about any copy of its algorithm. Humans worrying about whether the predictor’s simulation is detailed enough to really suffer is due to specific features of human morality. From the perspective of the paperclip maximizer doing decision theory, what we care about is logical correlation.

• “I actually don’t think that there is a general procedure to tell what is you, and what is a simulation of you”: let’s suppose I promise to sell you an autographed Michael Jackson CD. But then it turns out that the CD wasn’t signed by Michael, but by me. Now, I’m really good at forgeries, so good in fact that my signature matches his atom for atom. Haven’t I still lied?

• Imagine sitting outside the universe and being given an exact description of everything that happened within the universe. From this perspective you can see who signed what.

You can also see whether your thoughts are happening in biology or silicon or whatever.

My point isn’t “you can’t tell whether or not you’re in a simulation, so there is no difference”; my point is that there is no sharp cut-off between simulation and not-simulation. We have a “know it when you see it” definition with ambiguous edge cases. Decision theory can’t have different rules for dealing with dogs and not-dogs because some things are on the ambiguous edge of dogginess. Likewise, decision theory can’t have different rules for you, copies of you, and simulations of you, as there is no sharp cut-off. If you want to propose a continuous “simulatedness” parameter, and explain where that gets added to decision theory, go ahead. (Or propose some sharp cut-off.)

• Some people want to act as though a simulation of you is automatically you, and my argument is that it is bad practice to assume this. I’m much more open to the idea that some simulations might be you in some sense than to the claim that all are. This seems compatible with a fuzzy cut-off.

• I still don’t understand the fascination with this problem. A perfect predictor pretty strongly implies some form of determinism, right? If it predicts one-boxing and it’s perfect, you don’t actually have a choice: you are going to one-box, and justify it to yourself however you need to.

• Thanks for this comment. I accidentally left a sentence out of the original post: “A good way to view this is that instead of asking what choice the agent should make, we will ask whether the agent made the best choice.”

• There may also be some people [who] have doubts about whether a perfect predictor is possible even in theory.

While perfect predictors are possible, perfect predictors who give you some information about their prediction are often impossible. Since you learn of their prediction, you really can just do the opposite. This is not a problem here, because Omega doesn’t care if he leaves the box empty and you one-box anyway, but it’s not something to forget about in general.
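The failure mode above can be sketched in a few lines. This is a toy model of my own (not anyone’s existing formalization), assuming the predictor must find a prediction that the agent still fulfills after being told it:

```python
def find_consistent_prediction(agent):
    """Search for a prediction the agent confirms after hearing it."""
    for prediction in ("one-box", "two-box"):
        if agent(prediction) == prediction:
            return prediction
    return None  # no consistent announced prediction exists

# Always does the opposite of whatever it is told.
contrarian = lambda told: "two-box" if told == "one-box" else "one-box"
# Does whatever it is told.
conformist = lambda told: told

print(find_consistent_prediction(contrarian))  # None
print(find_consistent_prediction(conformist))  # one-box
```

Against the contrarian there is simply no fixed point, so a predictor who must announce its prediction cannot be perfect; open-box Newcomb’s avoids this because the prediction is only about your behaviour conditional on seeing a full box.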

• The trick in open-box Newcomb’s is that the predictor predicts whether or not you would one-box if you saw a full box. If you are the kind of agent who always does “the opposite”, you’ll see an empty box and one-box. Which isn’t actually a problem, as it only predicted whether you’d one-box if you saw a full box.

• That’s… exactly what my last sentence meant. Are you repeating it on purpose, or was my explanation so unclear?