Why Bayesians should two-box in a one-shot

Con­sider New­comb’s prob­lem.

Let ‘gen­eral’ be the claim that Omega is always right.

Let ‘in­stance’ be the claim that Omega is right about a par­tic­u­lar pre­dic­tion.

As­sume you, the player, are not told the rules of the game un­til af­ter Omega has made its pre­dic­tion.

Con­sider 2 var­i­ants of New­comb’s prob­lem.

1. Omega is a perfect pre­dic­tor. In this var­i­ant, you as­sign a prior of 1 to P(gen­eral). You are then obli­gated to be­lieve that Omega has cor­rectly pre­dicted your ac­tion. In this case Eliezer’s con­clu­sion is cor­rect, and you should one-box. It’s still un­clear whether you have free will, and hence have any choice in what you do next, but you can’t lose by one-box­ing.

But you can’t as­sign a prior of 1 to P(gen­eral), be­cause you’re a Bayesian. You de­rive your prior for P(gen­eral) from the (finite) em­piri­cal data. Say you be­gin with a prior of 0.5 be­fore con­sid­er­ing any ob­ser­va­tions. Then you ob­serve all of Omega’s N pre­dic­tions, and each time, Omega gets it right, and you up­date:

P(general | instance) = P(instance | general) P(general) / P(instance)
= P(general) / P(instance)

(The second line follows because P(instance | general) = 1.)

Omega would need to make an infinite number of correct predictions before your estimate of P(general) could reach 1. So this case is theoretically impossible, and should not be considered.
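To make the update concrete, here is a minimal sketch of the repeated Bayesian update. It assumes something the argument above does not specify: that if ‘general’ is false, Omega still gets each prediction right independently with some probability q (0.9 here, chosen purely for illustration). The function name and parameters are hypothetical.

```python
from fractions import Fraction


def posterior_general(n_correct, prior=Fraction(1, 2), q=Fraction(9, 10)):
    """Posterior P(general) after n_correct consecutive correct predictions.

    Assumption (not from the post): if 'general' is false, each prediction
    is still correct independently with probability q.
    """
    p = prior
    for _ in range(n_correct):
        # P(instance) = P(instance | general) P(general)
        #             + P(instance | not general) P(not general)
        p_instance = p * 1 + (1 - p) * q
        # Bayes: P(general | instance) = P(instance | general) P(general) / P(instance)
        p = p / p_instance
    return p


for n in (0, 1, 10, 100):
    p = posterior_general(n)
    # Exact rational arithmetic shows the posterior stays strictly below 1.
    print(n, float(p), p < 1)
```

Under this assumption the posterior works out to 1 / (1 + q^N) when the prior is 0.5, which approaches 1 as N grows but never equals it for any finite N.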

2. Omega is a “nearly perfect” pre­dic­tor. You as­sign P(gen­eral) a value very, very close to 1. You must, how­ever, do the math and try to com­pare the ex­pected pay­offs, at least in an or­der-of-mag­ni­tude way, and not just use ver­bal rea­son­ing as if we were me­dieval scholas­tics.

The argument for two-boxing is that your action now can’t affect what Omega did in the past. That is, we are using a model which includes not just P(instance | general), but also the interaction of your action, the contents of the boxes, and the claim that Omega cannot violate causality. The probability that the contents of the boxes are independent of your choice, P( P($1M box is empty | you one-box) = P($1M box is empty | you two-box) ), is at least P(Omega cannot violate causality), and that needs to be entered into the computation.

Numer­i­cally, two-box­ers claim that the high prob­a­bil­ity they as­sign to our un­der­stand­ing of causal­ity be­ing ba­si­cally cor­rect more than can­cels out the high prob­a­bil­ity of Omega be­ing cor­rect.
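One way to enter that into the computation is to split the expected payoff into two partitions: one where causality holds and the box contents are fixed regardless of what you do, and one where your choice can somehow affect what is in the box. The sketch below is my own simplification, not a standard formulation. The parameters c, a, and f are all assumptions introduced for illustration, and payoffs are in dollars rather than utilons.

```python
def expected_payoffs(c, a, f, small=1_000, big=1_000_000):
    """Return (EV of one-boxing, EV of two-boxing) in dollars.

    Illustrative assumptions, not values from the post:
      c: probability that causality holds, so the box contents do not
         depend on your action
      a: probability that Omega's prediction matches your action in the
         partition where causality can be violated
      f: probability that the $1M box is full in the causal partition
    """
    # Causal partition: the box is full with probability f either way,
    # so two-boxing simply adds the small payoff.
    one_causal = f * big
    two_causal = f * big + small

    # Acausal partition: the contents track your actual choice with accuracy a.
    one_acausal = a * big
    two_acausal = (1 - a) * big + small

    one_box = c * one_causal + (1 - c) * one_acausal
    two_box = c * two_causal + (1 - c) * two_acausal
    return one_box, two_box


# Near-certain causality: two-boxing comes out ahead.
print(expected_payoffs(c=1 - 1e-9, a=1.0, f=0.5))
# Much weaker belief in causality: one-boxing comes out ahead.
print(expected_payoffs(c=0.5, a=1.0, f=0.5))
```

In this sketch f cancels out of the comparison: the gap between two-boxing and one-boxing is small - (1 - c) * (2a - 1) * big, so what matters is how the probability of a causality violation compares with the ratio of the payoffs.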

The ar­gu­ment for one-box­ing is that you aren’t en­tirely sure you un­der­stand physics, but you know Omega has a re­ally good track record—so good that it is more likely that your un­der­stand­ing of physics is false than that you can falsify Omega’s pre­dic­tion. This is a strict re­li­ance on em­piri­cal ob­ser­va­tions as op­posed to ab­stract rea­son: count up how of­ten Omega has been right and com­pute a prior.

How­ever, if we’re go­ing to be strict em­piri­cists, we should dou­ble down on that, and set our prior on P(can­not vi­o­late causal­ity) strictly em­piri­cally—based on all ob­ser­va­tions re­gard­ing whether or not things in the pre­sent can af­fect things in the past.

This includes up to every particle interaction in our observable universe. The usable number is not so high as that, as probably a large number of interactions could occur in which the future affects the past without our noticing. But the number of observations any one person has made in which events in the future seem to have failed to affect events in the present is certainly very large. The accumulated wisdom of the entire human race on the issue must provide more bits in favor of the hypothesis that causality can’t be violated than the bits for Omega’s infallibility, which rest on the comparatively paltry number of observations of Omega’s predictions, unless Omega is very busy indeed. And even if Omega has somehow made enough predictions, most of them are as inaccessible to you as observations of the laws of causality working on the dark side of the moon. You, personally, cannot have observed Omega make more correct predictions than the number of events you have observed in which the future failed to affect the present.

You could compute a new payoff matrix that made it rational to one-box, but the ratio between the payoffs would need to be many orders of magnitude higher. You’d have to compute it in utilons rather than dollars, because the utility of dollars doesn’t scale linearly. And that means you’d run into the problem that humans have some upper bound on utility: they aren’t cognitively complex enough to achieve utility levels 10^10 times greater than “won $1,000”. So it still might not be rational to one-box, because the utility payoff for one-boxing might need to be larger than you, as a human, could experience.
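As a rough order-of-magnitude check, suppose (my assumption, not something stated above) that with probability eps causality can be violated and Omega's prediction then matches your action with accuracy a, while otherwise the box contents are independent of your choice. Two-boxing then gains the small payoff for certain and loses eps * (2a - 1) times the big payoff in expectation, so one-boxing only breaks even once the payoff ratio reaches 1 / (eps * (2a - 1)). The function below is a hypothetical helper that computes that threshold.

```python
def break_even_ratio(eps, a=1.0):
    """Smallest big/small payoff ratio at which one-boxing is no worse.

    Illustrative assumptions:
      eps: probability that causality can be violated
      a:   Omega's accuracy in that case
    Derived from: EV(two-box) - EV(one-box) = small - eps * (2a - 1) * big.
    """
    edge = eps * (2 * a - 1)
    if edge <= 0:
        # No payoff ratio, however large, makes one-boxing break even.
        return float("inf")
    return 1 / edge


# With an illustrative eps of 1e-10, the required ratio is about 1e10,
# far above the 1,000:1 ratio of the standard problem.
print(break_even_ratio(1e-10))
```

The ratio is in whatever units the payoffs are measured in, so if utility is bounded, a large enough required ratio may simply be unattainable.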


The case in which you get to think about what to do be­fore Omega stud­ies you and makes its de­ci­sion is more com­pli­cated, be­cause your prob­a­bil­ity calcu­la­tion then also de­pends on what you think you would have done be­fore Omega made its de­ci­sion. This only af­fects the par­ti­tion of your prob­a­bil­ity calcu­la­tion in which Omega can al­ter the past, how­ever, so nu­mer­i­cally it doesn’t make a big differ­ence.

The trick here is that most statements of Newcomb’s problem are ambiguous as to whether you are told the rules before Omega studies you, and as to which decision they’re asking you about when they ask if you one-box or two-box. Are they asking about what you pre-commit to, or what you eventually do? These decisions are separate, but not isolatable.

As long as we fo­cus on the sin­gle de­ci­sion at the point of ac­tion, then the anal­y­sis above (mod­ified as just men­tioned) still fol­lows. If we ask what the player should plan to do be­fore Omega makes its de­ci­sion, then the ques­tion is just whether you have a good enough poker face to fool Omega. Here it takes no causal­ity vi­o­la­tion for Omega to fill the boxes in ac­cor­dance with your plans, so that fac­tor does not en­ter in, and you should plan to one-box.

If you are a de­ter­minis­tic AI, that im­plies that you will one-box. If you’re a GOFAI built ac­cord­ing to the old-fash­ioned sym­bolic logic AI de­signs talked about on LW (which, BTW, don’t work), it im­plies you will prob­a­bly one-box even if you’re not de­ter­minis­tic, as oth­er­wise you would need to be in­con­sis­tent, which is not al­lowed with GOFAI ar­chi­tec­tures. If you’re a hu­man, you’d the­o­ret­i­cally be bet­ter off if you could sud­denly see things differ­ently when it’s time to choose boxes, but that’s not psy­cholog­i­cally plau­si­ble. In no case is there a para­dox, or any real difficulty to the de­ci­sion to one-box.

Iter­ated Games

Every­thing changes with iter­ated in­ter­ac­tions. It’s use­ful to de­velop a rep­u­ta­tion for one-box­ing, be­cause this may con­vince peo­ple that you will keep your word even when it seems dis­ad­van­ta­geous to you. It’s use­ful to con­vince peo­ple that you would one-box, and it’s even benefi­cial, in cer­tain re­spects, to spread the false be­lief in the Bayesian com­mu­nity that Bayesi­ans should one-box.

Read Eliezer’s post care­fully, and I think you’ll agree that the rea­son­ing Eliezer gives for one-box­ing is not that it is the ra­tio­nal solu­tion to a one-off game—it’s that it’s a win­ning policy to be the kind of per­son who one-boxes. That’s not an ar­gu­ment that the pay­off ma­trix of an in­stan­ta­neous de­ci­sion fa­vors one-box­ing; it’s an ar­gu­ment for a LessWron­gian moral­ity. It’s the same ba­sic ar­gu­ment as that hon­or­ing com­mit­ments is a good long-term strat­egy. But the way Eliezer stated it has given many peo­ple the false im­pres­sion that one-box­ing is ac­tu­ally the ra­tio­nal choice in an in­stan­ta­neous one-shot game (and that’s the only in­ter­pre­ta­tion which would make it in­ter­est­ing).

The one-box­ing ar­gu­ment is so ap­peal­ing be­cause it offers a solu­tion to difficult co­or­di­na­tion prob­lems. It makes it ap­pear that ra­tio­nal al­tru­ism and a ra­tio­nal utopia are within our reach.

But this is wish­ful think­ing, not math, and I be­lieve that the so­cial norm of do­ing the math is even more im­por­tant than a so­cial norm of one-box­ing.