Newcomb’s Problem and Regret of Rationality

The following may well be the most controversial dilemma in the history of decision theory:

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

And the standard philosophical conversation runs thusly:

One-boxer: “I take only box B, of course. I’d rather have a million than a thousand.”

Two-boxer: “Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0. If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table—so I will be rational, and take both boxes.”

One-boxer: “If you’re so rational, why ain’cha rich?”

Two-boxer: “It’s not my fault Omega chooses to reward only people with irrational dispositions, but it’s already too late for me to do anything about that.”
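
The arithmetic behind the two sides of this conversation can be made concrete. The following is a minimal sketch of the naive expected-value calculation (the calculation a causal decision theorist would reject, since it conditions the contents of box B on your own choice), parameterized by an assumed predictor accuracy `p`; the function name and parameter are mine, not part of the standard literature:

```python
def expected_payout(action: str, p: float) -> float:
    """Expected dollars for 'one-box' or 'two-box' against a predictor
    that anticipates the chosen action correctly with probability p."""
    if action == "one-box":
        # Box B holds $1,000,000 iff Omega predicted one-boxing.
        return p * 1_000_000
    # Two-boxers always get box A's $1,000, plus box B's million in the
    # (1 - p) cases where Omega wrongly predicted one-boxing.
    return 1_000 + (1 - p) * 1_000_000

# At p = 1.0, matching Omega's observed record: one-boxing yields
# $1,000,000 and two-boxing yields $1,000.
```

On this calculation, two-boxing comes out ahead only when `p` is barely better than chance; the causal decision theorist's reply, of course, is that this calculation is illegitimate because the boxes are already fixed.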

There is a large literature on the topic of Newcomblike problems—especially if you consider the Prisoner’s Dilemma as a special case, which it is generally held to be. “Paradoxes of Rationality and Cooperation” is an edited volume that includes Newcomb’s original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.

I’m not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and that Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of “causal decision theory”.

As you know, the primary reason I’m blogging is that I am an incredibly slow writer when I try to work in any other format. So I’m not going to try to present my own analysis here. Way too long a story, even by my standards.

But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb’s Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

Now in my field—which, in case you have forgotten, is self-modifying AI—this works out to saying that if you build an AI that two-boxes on Newcomb’s Problem, it will self-modify to one-box on Newcomb’s Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.
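
As a toy illustration only (not the theory I have in mind; every name here is made up for the example), an agent whose decision procedure is ordinary, replaceable data can implement that cheap precommitment in one line:

```python
class SelfModifyingAgent:
    """Toy agent whose decision procedure is just replaceable data."""

    def __init__(self, decision_procedure):
        self.decision_procedure = decision_procedure

    def decide_newcomb(self) -> str:
        return self.decision_procedure()

    def anticipate_newcomb(self) -> None:
        # Before Omega examines it, the agent checks what its current
        # procedure would do and, if it would two-box, rewrites itself.
        if self.decide_newcomb() == "two-box":
            self.decision_procedure = lambda: "one-box"  # cheap precommitment

agent = SelfModifyingAgent(lambda: "two-box")
agent.anticipate_newcomb()
# agent.decide_newcomb() now returns "one-box"
```

The point of the sketch is only that self-modification makes precommitment trivial for such an agent; the hard part, discussed below, is what disposition to precommit to when you don't know the exact problem in advance.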

What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

But what does an agent with a disposition generally well-suited to Newcomblike problems look like? Can this be formally specified?

Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn’t the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that’s pretty much what it would take to make me unshelve the project. Otherwise I can’t justify the time expenditure, not at the speed I currently write books.

I say all this, because there’s a common attitude that “Verbal arguments for one-boxing are easy to come by; what’s hard is developing a good decision theory that one-boxes”—coherent math which one-boxes on Newcomb’s Problem without producing absurd results elsewhere. So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can’t publish it. Believe it or not, it’s true.

Nonetheless, I would like to present some of my motivations on Newcomb’s Problem—the reasons I felt impelled to seek a new theory—because they illustrate my source-attitudes toward rationality. Even if I can’t present the theory that these motivations motivate...

First, foremost, fundamentally, above all else:

Rational agents should WIN.

Don’t mistake me, and think that I’m talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted. If your utility function has a term in it for others, then win their happiness. If your utility function has a term in it for a million years hence, then win the eon.

But at any rate, WIN. Don’t lose reasonably, WIN.

Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists. I will talk about this defense in a moment. But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win, and those who think that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose in this case. There are a lot of people out there who think that rationality predictably loses on various problems—that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

Next, let’s turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of “Describe your options in English and choose the last option when ordered alphabetically,” but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don’t buy the charge that Omega is rewarding the irrational. Omega doesn’t care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm’s choices, with no other dependency—Omega just cares where we go, not how we got there.

It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way—without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning.

As Miyamoto Musashi said—it’s really worth repeating:

“You can win with a long weapon, and yet you can also win with a short weapon. In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size.”

(Another example: It was argued by McGee that we must adopt bounded utility functions or be subject to “Dutch books” over infinite times. But: The utility function is not up for grabs. I love life without limit or upper bound: There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded. So I just have to figure out how to optimize for that morality. You can’t tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked. Toss out the losing ritual; don’t change the definition of winning. That’s like deciding to prefer $1000 to $1,000,000 so that Newcomb’s Problem doesn’t make your preferred ritual of cognition look bad.)
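
For the curious, here is one way to make that "sufficient condition" precise. This is my own sketch, and it additionally assumes that utility is strictly increasing in lifespan, that immediate death is normalized to utility 0, and that the utility of living forever is the limit of the utilities of living N years:

```latex
Let $U(N)$ be the utility of living $N$ years, strictly increasing in $N$,
with $U(0) = 0$. Suppose for contradiction that $U$ is bounded, so
$s = \sup_N U(N) < \infty$, and assume $U(\infty) = \lim_{N \to \infty} U(N) = s$.
The stated preference says, for every finite $N$ (writing $G$ for a googolplex):
$$0.000001\, U(G) + 0.8\, U(\infty) > 0.800001\, U(N).$$
Taking the supremum over $N$ on the right-hand side gives
$$0.000001\, U(G) + 0.8\, s \ge 0.800001\, s,$$
hence $U(G) \ge s$. But $G$ is finite and $U$ is strictly increasing, so
$U(G) < U(G+1) \le s$: contradiction. Therefore $U$ is unbounded.
```

The limit assumption does real work here; without it, a utility function could place a discontinuous bonus on literal immortality while remaining bounded.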

“But,” says the causal decision theorist, “to take only one box, you must somehow believe that your choice can affect whether box B is empty or full—and that’s unreasonable! Omega has already left! It’s physically impossible!”

Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.

I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn’t need to show this to you. The point is not to have an elegant theory of winning—the point is to win; elegance is a side effect.

Or to look at it another way: Rather than starting with a concept of what is the reasonable decision, and then asking whether “reasonable” agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is “reasonable”. “Reasonable” may just refer to decisions in conformance with our current ritual of cognition—what else would determine whether something seems “reasonable” or not?
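
That empirical stance, look first at who leaves with the money, can itself be sketched. This is a toy Monte Carlo simulation of my own devising (the `accuracy` parameter and disposition names are assumptions, not part of the original dilemma, which stipulates a perfect record):

```python
import random

def average_winnings(choice: str, accuracy: float = 0.99,
                     trials: int = 100_000, seed: int = 0) -> float:
    """Average payout for an agent whose fixed disposition is `choice`
    ('one-box' or 'two-box'), facing a predictor that reads the
    disposition correctly with probability `accuracy`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Omega predicts the disposition, occasionally erring.
        predicted = choice if rng.random() < accuracy else (
            "two-box" if choice == "one-box" else "one-box")
        box_b = 1_000_000 if predicted == "one-box" else 0
        total += box_b if choice == "one-box" else box_b + 1_000
    return total / trials

# Looking at who actually leaves with the money: one-boxers average
# near $990,000 per game; two-boxers average near $11,000.
```

The simulation does not settle the philosophical dispute; it just operationalizes "pay attention to the money" as the starting observation.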

From James Joyce (no relation), Foundations of Causal Decision Theory:

Rachel has a perfectly good answer to the “Why ain’t you rich?” question. “I am not rich,” she will say, “because I am not the kind of person the psychologist thinks will refuse the money. I’m just not like you, Irene. Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it.”

Irene may want to press the point here by asking, “But don’t you wish you were like me, Rachel? Don’t you wish that you were the refusing type?” There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich). This is not the case. Rachel can and should admit that she does wish she were more like Irene. “It would have been better for me,” she might concede, “had I been the refusing type.” At this point Irene will exclaim, “You’ve admitted it! It wasn’t so smart to take the money after all.” Unfortunately for Irene, her conclusion does not follow from Rachel’s premise. Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is. When Rachel wishes she was Irene’s type she is wishing for Irene’s options, not sanctioning her choice.

It is, I would say, a general principle of rationality—indeed, part of how I define rationality—that you never end up envying someone else’s mere choices. You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition. But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it. Rachel wishes just that she had a disposition to choose differently.

You shouldn’t claim to be more rational than someone and simultaneously envy them their choice—only their choice. Just do the act you envy.

I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can’t possibly do better by leaving $1000 on the table… even though the single-boxers leave the experiment with more money. Be careful of this sort of argument, any time you find yourself defining the “winner” as someone other than the agent who is currently smiling from on top of a giant heap of utility.

Yes, there are various thought experiments in which some agents start out with an advantage—but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs. At this point you have covertly redefined “winning” as conformance to a particular ritual of cognition. Pay attention to the money!

Or here’s another way of looking at it: Faced with Newcomb’s Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money? Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice? This too is a rather odd position to be in. Ordinarily, the work of rationality goes into figuring out which choice is the best—not finding a reason to believe that a particular choice is the best.

Maybe it’s too easy to say that you “ought to” two-box on Newcomb’s Problem, that this is the “reasonable” thing to do, so long as the money isn’t actually in front of you. Maybe you’re just numb to philosophical dilemmas, at this point. What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her? What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

Would you, at that point, find yourself tempted to make an unreasonable choice?

If the stake in box B was something you could not leave behind? Something overwhelmingly more important to you than being reasonable? If you absolutely had to win—really win, not just be defined as winning?

Would you wish with all your power that the “reasonable” decision was to take only box B?

Then maybe it’s time to update your definition of reasonableness.

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like. When you find yourself in a position like this, you shouldn’t chide the other person for failing to conform to your concepts of reasonableness. You should realize you got the Way wrong.

So, too, if you ever find yourself keeping separate track of the “reasonable” belief, versus the belief that seems likely to be actually true. Either you have misunderstood reasonableness, or your second intuition is just wrong.

Now one can’t simultaneously define “rationality” as the winning Way, and define “rationality” as Bayesian probability theory and decision theory. But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math. If it ever turns out that Bayes fails—receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions—then Bayes has to go out the window. “Rationality” is just the label I use for my beliefs about the winning Way—the Way of the agent smiling from on top of the giant heap of utility. Currently, that label refers to Bayescraft.

I realize that this is not a knockdown criticism of causal decision theory—that would take the actual book and/or PhD thesis—but I hope it illustrates some of my underlying attitude toward this notion of “rationality”.

You shouldn’t find yourself distinguishing the winning choice from the reasonable choice. Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

That is why I use the word “rational” to denote my beliefs about accuracy and winning—not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

As Miyamoto Musashi said:

“The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him.”