# Newcomb’s Problem and Regret of Rationality

The following may well be the most controversial dilemma in the history of decision theory:

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

And the standard philosophical conversation runs thusly:

One-boxer: “I take only box B, of course. I’d rather have a million than a thousand.”

Two-boxer: “Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me \$1000, taking only box B nets me \$0. If box B is already full, then taking both boxes nets \$1,001,000, taking only box B nets \$1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table—so I will be rational, and take both boxes.”
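Both halves of this dialogue can be read off a small payoff table. A quick sketch in Python (illustrative only; the dollar figures are the ones from the dialogue above):

```python
# Payoffs in Newcomb's Problem, indexed by (choice, state of box B).
payoff = {
    ("two-box", "empty"): 1_000,
    ("one-box", "empty"): 0,
    ("two-box", "full"): 1_001_000,
    ("one-box", "full"): 1_000_000,
}

# Dominance (the two-boxer's argument): holding the box state fixed,
# two-boxing is better in both cases, by exactly $1000.
for state in ("empty", "full"):
    assert payoff[("two-box", state)] == payoff[("one-box", state)] + 1_000

# But Omega's prediction ties the state to the choice, so the outcomes
# actually observed lie on the diagonal (the one-boxer's argument).
observed = {
    "one-box": payoff[("one-box", "full")],   # predicted one-boxer: B full
    "two-box": payoff[("two-box", "empty")],  # predicted two-boxer: B empty
}
assert observed["one-box"] > observed["two-box"]
```

Both assertions hold at once; the dispute is over which comparison is the relevant one.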

One-boxer: “If you’re so rational, why ain’cha rich?”

Two-boxer: “It’s not my fault Omega chooses to reward only people with irrational dispositions, but it’s already too late for me to do anything about that.”

There is a large literature on the topic of Newcomblike problems—especially if you consider the Prisoner’s Dilemma as a special case, which it is generally held to be. “Paradoxes of Rationality and Cooperation” is an edited volume that includes Newcomb’s original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.

I’m not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of “causal decision theory”.

As you know, the primary reason I’m blogging is that I am an incredibly slow writer when I try to work in any other format. So I’m not going to try to present my own analysis here. Way too long a story, even by my standards.

But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb’s Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

Now in my field—which, in case you have forgotten, is self-modifying AI—this works out to saying that if you build an AI that two-boxes on Newcomb’s Problem, it will self-modify to one-box on Newcomb’s Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.

What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

But what does an agent with a disposition generally-well-suited to Newcomblike problems look like? Can this be formally specified?

Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn’t the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that’s pretty much what it would take to make me unshelve the project. Otherwise I can’t justify the time expenditure, not at the speed I currently write books.

I say all this, because there’s a common attitude that “Verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”—coherent math which one-boxes on Newcomb’s Problem without producing absurd results elsewhere. So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can’t publish it. Believe it or not, it’s true.

Nonetheless, I would like to present some of my motivations on Newcomb’s Problem—the reasons I felt impelled to seek a new theory—because they illustrate my source-attitudes toward rationality. Even if I can’t present the theory that these motivations motivate...

First, foremost, fundamentally, above all else:

Rational agents should WIN.

Don’t mistake me, and think that I’m talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted. If your utility function has a term in it for others, then win their happiness. If your utility function has a term in it for a million years hence, then win the eon.

But at any rate, WIN. Don’t lose reasonably, WIN.

Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists. I will talk about this defense in a moment. But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win, and those who think that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case. There are a lot of people out there who think that rationality predictably loses on various problems—that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

Next, let’s turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of “Describe your options in English and choose the last option when ordered alphabetically,” but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don’t buy the charge that Omega is rewarding the irrational. Omega doesn’t care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm’s choices, with no other dependency—Omega just cares where we go, not how we got there.

It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way—without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning.

As Miyamoto Musashi said—it’s really worth repeating:

“You can win with a long weapon, and yet you can also win with a short weapon. In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size.”

(Another example: It was argued by McGee that we must adopt bounded utility functions or be subject to “Dutch books” over infinite times. But: The utility function is not up for grabs. I love life without limit or upper bound: There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded. So I just have to figure out how to optimize for that morality. You can’t tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked. Toss out the losing ritual; don’t change the definition of winning. That’s like deciding to prefer \$1000 to \$1,000,000 so that Newcomb’s Problem doesn’t make your preferred ritual of cognition look bad.)

“But,” says the causal decision theorist, “to take only one box, you must somehow believe that your choice can affect whether box B is empty or full—and that’s unreasonable! Omega has already left! It’s physically impossible!”

Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.

I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn’t need to show this to you. The point is not to have an elegant theory of winning—the point is to win; elegance is a side effect.

Or to look at it another way: Rather than starting with a concept of what is the reasonable decision, and then asking whether “reasonable” agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is “reasonable”. “Reasonable” may just refer to decisions in conformance with our current ritual of cognition—what else would determine whether something seems “reasonable” or not?

From James Joyce (no relation), *Foundations of Causal Decision Theory*:

Rachel has a perfectly good answer to the “Why ain’t you rich?” question. “I am not rich,” she will say, “because I am not the kind of person the psychologist thinks will refuse the money. I’m just not like you, Irene. Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the \$1,000,000 was not in my account. The \$1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it.”

Irene may want to press the point here by asking, “But don’t you wish you were like me, Rachel? Don’t you wish that you were the refusing type?” There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich). This is not the case. Rachel can and should admit that she does wish she were more like Irene. “It would have been better for me,” she might concede, “had I been the refusing type.” At this point Irene will exclaim, “You’ve admitted it! It wasn’t so smart to take the money after all.” Unfortunately for Irene, her conclusion does not follow from Rachel’s premise. Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the \$1,000 whatever type one is. When Rachel wishes she was Irene’s type she is wishing for Irene’s options, not sanctioning her choice.

It is, I would say, a general principle of rationality—indeed, part of how I define rationality—that you never end up envying someone else’s mere choices. You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition. But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it. Rachel wishes just that she had a disposition to choose differently.

You shouldn’t claim to be more rational than someone and simultaneously envy them their choice—only their choice. Just do the act you envy.

I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can’t possibly do better by leaving \$1000 on the table… even though the single-boxers leave the experiment with more money. Be careful of this sort of argument, any time you find yourself defining the “winner” as someone other than the agent who is currently smiling from on top of a giant heap of utility.

Yes, there are various thought experiments in which some agents start out with an advantage—but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs. At this point you have covertly redefined “winning” as conformance to a particular ritual of cognition. Pay attention to the money!

Or here’s another way of looking at it: Faced with Newcomb’s Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money? Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice? This too is a rather odd position to be in. Ordinarily, the work of rationality goes into figuring out which choice is the best—not finding a reason to believe that a particular choice is the best.

Maybe it’s too easy to say that you “ought to” two-box on Newcomb’s Problem, that this is the “reasonable” thing to do, so long as the money isn’t actually in front of you. Maybe you’re just numb to philosophical dilemmas, at this point. What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her? What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

Would you, at that point, find yourself tempted to make an unreasonable choice?

If the stake in box B was something you could not leave behind? Something overwhelmingly more important to you than being reasonable? If you absolutely had to win—really win, not just be defined as winning?

Would you wish with all your power that the “reasonable” decision was to take only box B?

Then maybe it’s time to update your definition of reasonableness.

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like. When you find yourself in a position like this, you shouldn’t chide the other person for failing to conform to your concepts of reasonableness. You should realize you got the Way wrong.

So, too, if you ever find yourself keeping separate track of the “reasonable” belief, versus the belief that seems likely to be actually true. Either you have misunderstood reasonableness, or your second intuition is just wrong.

Now one can’t simultaneously define “rationality” as the winning Way, and define “rationality” as Bayesian probability theory and decision theory. But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math. If it ever turns out that Bayes fails—receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions—then Bayes has to go out the window. “Rationality” is just the label I use for my beliefs about the winning Way—the Way of the agent smiling from on top of the giant heap of utility. Currently, that label refers to Bayescraft.

I realize that this is not a knockdown criticism of causal decision theory—that would take the actual book and/or PhD thesis—but I hope it illustrates some of my underlying attitude toward this notion of “rationality”.

You shouldn’t find yourself distinguishing the winning choice from the reasonable choice. Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

That is why I use the word “rational” to denote my beliefs about accuracy and winning—not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

As Miyamoto Musashi said:

“The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him.”

• This dilemma seems like it can be reduced to:

1. If you take both boxes, you will get \$1000

2. If you only take box B, you will get \$1M

Which is a rather easy decision.

There’s a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it’s a premise, so there’s no point arguing it.

Another way of thinking of it is that, when someone says, “The boxes are already there, so your decision cannot affect what’s in them,” he is wrong. It has been assumed that your decision does affect what’s in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.

In short, I don’t understand how this is controversial when the decider has all the information that was provided.

• Another way of thinking of it is that, when someone says, “The boxes are already there, so your decision cannot affect what’s in them,” he is wrong. It has been assumed that your decision does affect what’s in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.

Your decision doesn’t affect what’s in the boxes, but your decision procedure does, and that already exists when the question’s being assigned. It may or may not be possible to derive your decision from the decision procedure you’re using in the general case—I haven’t actually done the reduction, but at first glance it looks cognate to some problems that I know are undecidable—but it’s clearly possible in some cases, and it’s at least not completely absurd to imagine an Omega with a very high success rate.

As best I can tell, most of the confusion here comes from a conception of free will that decouples the decision from the procedure leading to it.

• most of the confusion here comes from a conception of free will that decouples the decision from the procedure leading to it.

Yeah, agreed. I often describe this as NP being more about what kind of person I am than it is about what decision I make, but I like your phrasing better.

• Actually, we don’t know that our decision affects the contents of Box B. In fact, we’re told that it contains a million dollars if-and-only-if Omega predicts we will only take Box B.

It is possible that we could pick Box B even though Omega predicted we would take both boxes. Omega has only been observed to predict correctly 100 times. And if we are sufficiently doubtful whether Omega would predict that we would take only Box B, it would be rational to take both boxes.

Only if we’re somewhat confident of Omega’s prediction can we confidently one-box and rationally expect it to contain a million dollars.

• somewhat confident of Omega’s prediction

51% confidence would suffice.

• Two-box expected value: 0.51 × \$1K + 0.49 × \$1.001M = \$491,000

• One-box expected value: 0.51 × \$1M + 0.49 × \$0 = \$510,000
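These two figures check out; a sketch of the calculation (assuming a predictor that is right 51% of the time regardless of your choice):

```python
p = 0.51  # assumed predictor accuracy

# Two-boxing: with probability p Omega predicted it (B empty, you get $1K);
# with probability 1 - p Omega mispredicted (B full, you get $1.001M).
ev_two_box = p * 1_000 + (1 - p) * 1_001_000

# One-boxing: with probability p Omega predicted it (B full, you get $1M);
# with probability 1 - p you get nothing.
ev_one_box = p * 1_000_000 + (1 - p) * 0

print(round(ev_two_box), round(ev_one_box))  # 491000 510000
```

So even at a bare 51%, one-boxing comes out \$19,000 ahead in expectation.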

• You’re saying that we live in a universe where Newcomb’s problem is impossible because the future doesn’t affect the past. I’ll re-phrase this problem in such a way that it seems plausible in our universe:

I’ve got really nice scanning software. I scan your brain down to the molecule, and make a virtual representation of it on a computer. I run virtual-you in my software, and give virtual-you Newcomb’s problem. Virtual-you answers, and I arrange my boxes according to that answer.

I come back to real-you. You’ve got no idea what’s going on. I explain the scenario to you and I give you Newcomb’s problem. How do you answer?

This particular instance of the problem does have an obvious, relatively uncomplicated solution (rot13): Lbh unir ab jnl bs xabjvat jurgure lbh ner rkcrevrapvat gur cneg bs gur fvzhyngvba, be gur cneg bs gur syrfu-naq-oybbq irefvba. Fvapr lbh xabj gung obgu jvyy npg vqragvpnyyl, bar-obkvat vf gur fhcrevbe bcgvba.

If for any reason you suspect that the Predictor can reach a sufficient level of accuracy to justify one-boxing, you one-box. It doesn’t matter what sort of universe you are in.

• Not that I disagree with the one-boxing conclusion, but this formulation requires physically reducible free will (which has recently been brought back into discussion). It would also require knowing the position and momentum of a lot of particles to arbitrary precision, which is provably impossible.

• We don’t need a perfect simulation for the purposes of this problem in the abstract—we just need a situation such that the problem-solver assigns better-than-chance predicting power to the Predictor, and a sufficiently high utility differential between winning and losing.

The “perfect whole brain simulation” is an extreme case which keeps things intuitively clear. I’d argue that any form of simulation which performs better than chance follows the same logic.

The only way to escape the conclusion via simulation is if you know something that Omega doesn’t—for example, you might have some secret external factor modify your “source code” and alter your decision after Omega has finished examining you. Beating Omega essentially means that you need to keep your brain-state in such a form that Omega can’t deduce that you’ll two-box.

As Psychohistorian3 pointed out, the power that you’ve assigned to Omega predicting accurately is built into the problem. Your estimate of the probability that you will succeed in deception via the aforementioned method or any other is fixed by the problem.

In the real world, you are free to assign whatever probability you want to your ability to deceive Omega’s predictive mechanisms, which is why this problem is counterintuitive.

• Also: You can’t simultaneously claim that any rational being ought to two-box, this being the obvious and overdetermined answer, and also claim that it’s impossible for anyone to figure out that you’re going to two-box.

• Right, any predictor with at least a 50.05% accuracy is worth one-boxing upon (well, maybe a higher percentage for those with concave utility functions in money). A predictor with sufficiently high accuracy that it’s worth one-boxing isn’t unrealistic or counterintuitive at all in itself, but it seems (to me at least) that many people reach the right answer for the wrong reason: the “you don’t know whether you’re real or a simulation” argument. Realistically, while backwards causality isn’t feasible, neither is precise mind duplication. The decision to one-box can be rationally reached without those reasons: you choose to be the kind of person to (predictably) one-box, and as a consequence of that, you actually do one-box.
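The 50.05% figure falls out of setting the two expected values equal (a sketch, assuming risk-neutrality in dollars and a predictor that errs symmetrically):

```python
small, big = 1_000, 1_000_000

# EV(one-box) = p * big;  EV(two-box) = small + (1 - p) * big.
# Equating the two and solving for p gives the break-even accuracy:
p_star = (big + small) / (2 * big)
print(p_star)  # 0.5005, i.e. 50.05%
```

Above this threshold one-boxing has the higher expected dollar value; below it, two-boxing does.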

• Oh, that’s fair. I was thinking of “you don’t know whether you’re real or a simulation” as an intuitive way to prove the case for all “conscious” simulations. It doesn’t have to be perfect—you could just as easily be an inaccurate simulation, with no way to know that you are a simulation and no way to know that you are inaccurate with respect to an original.

I was trying to get people to generalize downwards from the extreme intuitive example: even with decreasing accuracy, as the simulation becomes so rough as to lose “consciousness” and “personhood”, the argument keeps holding.

• Yeah, the argument would hold just as much with an inaccurate simulation as with an accurate one. The point I was trying to make wasn’t so much that the simulation isn’t going to be accurate enough, but that a simulation argument shouldn’t be a prerequisite to one-boxing. If the experiment were performed with human predictors (let’s say a psychologist who predicts correctly 75% of the time), one-boxing would still be rational despite knowing you’re not a simulation. I think LW relies on computationalism as a substitute for actually being reflectively consistent in problems such as these.

• The trouble with real-world examples is that we start introducing knowledge into the problem that we wouldn’t ideally have. The psychologist’s 75% success rate doesn’t necessarily apply to you—in the real world you can make a different estimate than the one that is given. If you’re an actor or a poker player, you’ll have a much different estimate of how things are going to work out.

Psychologists are just messier versions of brain scanners—the fundamental premise is that they are trying to access your source code.

And what’s more—suppose the predictions weren’t made by accessing your source code? The direction of causality does matter. If Omega can predict the future, the causal lines flow backwards from your choice to Omega’s past move. If Omega is scanning your brain, the causal lines go from your brain-state to Omega’s decision. If there are no causal lines between your brain/actions and Omega’s choice, you always two-box.

Real-world example: what if I substituted your psychologist for a sociologist, who predicted you with above-chance accuracy using only your demographic factors? In this scenario, you ought to two-box—if you disagree, let me know and I can explain myself.

In the real world, you don’t know to what extent your psychologist is using sociology (or some other factor outside your control). People can’t always articulate why, but their intuition (correctly) begins to make them deviate from the given success-rate estimate as more of these real-world variables get introduced.

• True, the 75% would merely be a past history (and I am in fact a poker player). Indeed, if the factors used were entirely or mostly composed of factors beyond my control (and I knew this), I would two-box. However, two-boxing is not necessarily optimal because of a predictor whose prediction methods you do not know the mechanics of. In the limited predictor problem, the predictor doesn’t use simulations/scanners of any sort but instead uses logic, and yet one-boxers still win.

• Agreed. To add on to this:

predictor doesn’t use simulations/scanners of any sort but instead uses logic, and yet one-boxers still win.

It’s worth pointing out that Newcomb’s problem always takes the form of Simpson’s paradox. The one-boxers beat the two-boxers as a whole, but among agents predicted to one-box, the two-boxers win, and among agents predicted to two-box, the two-boxers win.

The only reason to one-box is when your actions (which include both the final decision and the thoughts leading up to it) affect Omega’s prediction. The general rule is: “Try to make Omega think you’re one-boxing, but two-box whenever possible.” It’s just that in Newcomb’s problem proper, fulfilling the first imperative requires actually one-boxing.
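The Simpson’s-paradox structure described above can be checked with a small Monte Carlo sketch (the 90% accuracy and the trial count are arbitrary assumptions, not part of the problem):

```python
import random

random.seed(0)
ACC = 0.9        # assumed predictor accuracy
N = 100_000      # trials per strategy

overall = {}     # strategy -> average payoff
cells = {}       # (prediction, choice) -> payoff (deterministic per cell)

for choice in ("one-box", "two-box"):
    total = 0
    for _ in range(N):
        other = "two-box" if choice == "one-box" else "one-box"
        prediction = choice if random.random() < ACC else other
        payoff = (1_000_000 if prediction == "one-box" else 0) \
                 + (1_000 if choice == "two-box" else 0)
        total += payoff
        cells[(prediction, choice)] = payoff
    overall[choice] = total / N

# As a whole, one-boxers come out far ahead...
assert overall["one-box"] > overall["two-box"]

# ...yet within each prediction class, two-boxing pays exactly $1000 more.
for pred in ("one-box", "two-box"):
    assert cells[(pred, "two-box")] == cells[(pred, "one-box")] + 1_000
```

Both assertions pass: aggregation over the predictor’s (choice-correlated) classes reverses the within-class comparison, which is exactly Simpson’s paradox.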

• So you would never one-box unless the simulator did some sort of scan/simulation upon your brain? But it’s better to one-box and be derivable as the kind of person to (probably) one-box than to two-box and be derivable as the kind of person to (probably) two-box.

The only reason to one-box is when your actions (which include both the final decision and the thoughts leading up to it) affect the actual arrangement of the boxes.

Your final decision never affects the actual arrangement of the boxes, but its causes do.

• So you would never one-box unless the simulator did some sort of scan/simulation upon your brain?

I’d one-box when Omega had sufficient access to my source code. It doesn’t have to be through scanning—Omega might just be a great face-reading psychologist.

But it’s better to one-box and be derivable as the kind of person to (probably) one-box than to two-box and be derivable as the kind of person to (probably) two-box.

We’re in agreement. As we discussed, this only applies insofar as you can control the factors that lead you to be classified as a one-boxer or a two-boxer. You can alter neither demographic information nor past behavior. But when (and only when) one-boxing causes you to be derived as a one-boxer, you should obviously one-box.

Your final decision never affects the actual arrangement of the boxes, but its causes do.

Well, that’s true for this universe. I just assume we’re playing in any given universe, some of which include Omegas who can tell the future (which implies bidirectional causality), since Psychohistorian3 started out with that sort of thought when I first commented.

• Ok, so we do agree that it can be rational to one-box when predicted by a human (if they predict based upon factors you control, such as your facial cues). This may have been a misunderstanding between us then, because I thought you were defending the computationalist view that you should only one-box if you might be an alternate you used in the prediction.

• Yes, we do agree on that.

• any predictor with at least a 50.05% accuracy is worth one-boxing upon

Assuming that you have no information other than the base rate, and that it’s equally likely to be wrong either way.

• People seem to have pretty strong opinions about Newcomb’s Problem. I don’t have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.

I mean, a superintelligence would have no trouble at all predicting that I would one-box… even if I hadn’t encountered the problem before, I suspect.

• Ultimately you either interpret “superintelligence” as being sufficient to predict your reaction with significant accuracy, or not. If not, the problem is just a straightforward probability question, as explained here, and becomes uninteresting.

Otherwise, if you interpret “superintelligence” as being sufficient to predict your reaction with significant accuracy (especially a high accuracy like >99.5%), the words of this sentence...

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

...simply mean “One-box to win, with high confidence.”

Summary: After disambiguating “superintelligence” (making the belief that Omega is a superintelligence pay rent), Newcomb’s problem turns into either a straightforward probability question or a fairly simple issue of rearranging the words in equivalent ways to make the winning answer readily apparent.

• There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.

Wait a sec­ond, the fol­low­ing bounded util­ity func­tion can ex­plain the quoted prefer­ences:

• U(live googol­plex years) = 99

• limit as N goes to in­finity of U(live N years) = 100

• U(live for­ever) = 101

Benja Fallen­stein gave an al­ter­na­tive for­mu­la­tion that does im­ply an un­bounded util­ity func­tion:

For all n, there is an even larger n’ such that (p+q)*u(live n years) < p*u(live n’ years) + q*u(live a googolplex years).

But these prefer­ences are pretty counter-in­tu­itive to me. If U(live n years) is un­bounded, then the above must hold for any nonzero p, q, and with “googol­plex” re­placed by any finite num­ber. For ex­am­ple, let p = 1/​3^^^3, q = .8, n = 3^^^3, and re­place “googol­plex” with “0”. Would you re­ally be will­ing to give up .8 prob­a­bil­ity of 3^^^3 years of life for a 1/​3^^^3 chance at a longer (but still finite) one? And that’s true no mat­ter how many up-ar­rows we add to these num­bers?

• “Would you re­ally be will­ing to give up .8 prob­a­bil­ity of 3^^^3 years of life for a 1/​3^^^3 chance at a longer (but still finite) one?”

I’d like to hear this too.

• Okay. There are two intuitive obstacles: my heuristic as a human that my mind is too weak to handle tiny probabilities and that I should try to live my life on the mainline, and the fact that 3^^^3 already extrapolates a mind larger than the sum of every future experience my present self can empathize with.

But I strongly sus­pect that an­swer­ing “No” would en­able some­one to demon­strate cir­cu­lar /​ in­con­sis­tent prefer­ences on my part, and so I very strongly sus­pect that my re­flec­tive equil­ibrium would an­swer “Yes”. Even in the realm of the com­putable, there are sim­ple com­putable func­tions that grow a heck of a lot faster than up-ar­row no­ta­tion.

• Eliezer, would you be will­ing to bet all of your as­sets and fu­ture earn­ings against \$1 of my money, that we can do an in­finite amount of com­pu­ta­tion be­fore the uni­verse ends or be­comes in­ca­pable of sup­port­ing life?

Your an­swer ought to be yes, if your prefer­ences are what you state. If it turns out that we can do an in­finite amount of com­pu­ta­tion be­fore the uni­verse ends, then this bet in­creases your money by \$1, which al­lows you to in­crease your chance of hav­ing an in­finite life­time by some small but non-zero prob­a­bil­ity. If it turns out that our uni­verse can’t do an in­finite amount of com­pu­ta­tion, you lose a lot, but the loss of ex­pected util­ity is still tiny com­pared to what you gain.

So, is it a bet?

Also, why do you sus­pect that an­swer­ing “No” would en­able some­one to demon­strate cir­cu­lar /​ in­con­sis­tent prefer­ences on your part?

• So, is it a bet?

No for two rea­sons—first, I don’t trust hu­man rea­son in­clud­ing my own when try­ing to live one’s life in­side tiny prob­a­bil­ities of huge pay­offs; sec­ond, I or­di­nar­ily con­sider my­self an av­er­age util­i­tar­ian and I’m not sure this is how my av­er­age util­i­tar­i­anism plays out. It’s one mat­ter if you’re work­ing within a sin­gle uni­verse in which all-but-in­finites­i­mal of the value is to be found within those lives that are in­finite, but I’m not sure I would com­pare two differ­ently-sized pos­si­ble Real­ities the same way. I am not sure I am will­ing to say that a finite life weighs noth­ing in my util­ity func­tion if an in­finite life seems pos­si­ble—though if both were known to co­ex­ist in the same uni­verse, I might have to bite that bul­let. (At the op­po­site ex­treme, a Bostro­mian par­li­a­ment might as­sign both cases rep­re­sen­ta­tive weight pro­por­tional to prob­a­bil­ity and let them ne­go­ti­ate the wise ac­tion.)

Also I have se­vere doubts about in­finite ethics, but that’s eas­ily fixed us­ing a re­ally large finite num­ber in­stead (pay ev­ery­thing if time < googol­plex, keep \$1 if time > TREE(100), re­turn \$1 later if time be­tween those two bounds).

Also, why do you sus­pect that an­swer­ing “No” would en­able some­one to demon­strate cir­cu­lar /​ in­con­sis­tent prefer­ences on your part?

Keep growing the lifespan by huge computational factors, keep slicing near-infinitesimally tiny increments off the probability. (Is there an analogous inconsistency to which I expose myself by answering “No” to the bet above, from trying to treat alternative universes differently than side-by-side spatial regions?)

• It’s one mat­ter if you’re work­ing within a sin­gle uni­verse in which all of the value is to be found within those lives that are in­finite, but I’m not sure I would com­pare two differ­ently-sized Real­ities the same way. I am not sure I am will­ing to say that a finite life weighs noth­ing in my util­ity func­tion if an in­finite life seems pos­si­ble.

In that case, it’s not that your util­ity func­tion is un­bounded in years lived, but rather your util­ity for each year lived is a de­creas­ing func­tion of the life­time of the uni­verse (or per­haps to­tal life­time of ev­ery­one in the uni­verse).

I’ll have to think if that makes sense.

• It’s pos­si­ble that I’m rea­son­ing as if my util­ity func­tion is over “frac­tions of to­tal achiev­able value” within any given uni­verse. I am not sure if there are any prob­lems with this, even if it’s true.

• That does have quite a bit of in­tu­itive ap­peal! How­ever, when you look at a pos­si­ble uni­verse from the out­side, there are no lev­ers nor knobs you can turn, and all the value achieved by the time of heat death was already in­her­ent in the con­figu­ra­tions right af­ter the big bang--

--so if you do not want “fraction of total achievable value” to be identically one for every possible universe, the definition of your utility function seems to get intertwined with how exactly you divvy up the world into “causal nodes” and “causal arrows”, in a way that does not seem to happen if you define it in terms of properties of the outcome, like how many fulfilling lives are lived. (Of course, being more complicated doesn’t imply being wrong, but it seems worth noting.)

And yes, I’m taking a timeful view for vividness of imagination, but I do not think the argument changes much if you don’t do that; the point is that it seems like number-of-fulfilling-lives utility can be computed given only the universal wavefunction as input, whereas for fraction-of-achievable-fulfilling-lives, knowing the actual wavefunction isn’t enough.

Could your proposal lead to conflicts between altruists who have the same values (e.g. number of fulfilling lives), but different power to influence the world (and thus different total achievable value)?

• After thinking about it, that doesn’t make sense either. Suppose Omega comes to you and says that among the universes that you live in, there is a small fraction that will end in 5 years. He offers to kill you now in those universes, in exchange for granting you a googolplex years of additional life in a similar fraction of universes with time > TREE(100) and where you would have died in less than a googolplex years without his help (and where others manage to live to TREE(100) years old, if that makes any difference). Would you refuse?

• No. But here, by speci­fi­ca­tion, you’re mak­ing all the uni­verses real and hence part of a larger Real­ity, rather than prob­a­bil­ities of which only a sin­gle one is real.

If there were only one Real­ity, and there were small prob­a­bil­ities of it be­ing due to end in 5 years, or in a googol­plex years, and the two cases seemed of equal prob­a­bil­ity, and Omega offered to de­stroy re­al­ity now if it were only fated to last 5 years, in ex­change for ex­tend­ing its life to TREE(100) if it were oth­er­wise fated to last a googol­plex years… well, this Real­ity is already known to have lasted a few billion years, and through, say, around 2 trillion life-years, so if it is due to last only an­other 5 years the re­main­ing 30 billion life-years are not such a high frac­tion of its to­tal value to be lost—we aren’t likely to do so much more in just an­other 5 years, if that’s our limit; it seems un­likely that we’d get FAI in that time. I’d prob­a­bly still take the offer. But I wouldn’t leap at it.

• In that case, would you accept my original bet if I rephrase it as making all the universes part of a larger Reality? That is, if in the future we have reason to believe that Tegmark’s Level 4 Multiverse is true, and find ourselves living in a universe with time < googolplex, then you’d give me all your assets and future earnings, in return for \$1 of my money if we find ourselves living in a universe with time > TREE(100).

• I wouldn’t, but my re­flec­tive equil­ibrium might very well do so.

I wouldn’t due to willpower failure ex­ceed­ing benefit of \$1 if I be­lieve my main­line prob­a­bil­ity is doomed to eter­nal poverty.

Reflec­tive equil­ibrium prob­a­bly would, pre­sum­ing there’s a sub­stan­tial prob­a­bil­ity of >TREE(100), or that as a limit­ing pro­cess the “tiny” prob­a­bil­ity falls off more slowly than the “long-lived” uni­verse part in­creases. On pain of in­con­sis­tency when you raise the lifes­pan by large com­pu­ta­tional fac­tors each time, and slice tiny in­cre­ments off the prob­a­bil­ity each time.

• Ok, as long as your util­ity func­tion isn’t ac­tu­ally un­bounded, here’s what I think makes more sense, as­sum­ing a Level 4 Mul­ti­verse. It’s also a kind of “frac­tions of to­tal achiev­able value”.

Each mathematical structure representing a universe has a measure, which represents its “fraction of all math”. (Perhaps its measure is exponential in zero minus the length of its definition in a formal set theory.) My utility over that structure is bounded by this measure. In other words, if that structure represents my idea of total utopia, then my utility for it would be its measure. If it’s total dystopia, my utility for it would be 0.

Within a uni­verse, differ­ent sub­struc­tures (for ex­am­ple branches or slices of time) also have differ­ent mea­sures, and if I value such sub­struc­tures in­de­pen­dently, my util­ities for them are also bounded by their mea­sures. For ex­am­ple, in a uni­verse that ends at t = TREE(100), a time slice with t < googol­plex has a much higher mea­sure than a ran­dom time slice (since it takes more bits to rep­re­sent a ran­dom t).

If I value each per­son in­de­pen­dently (and al­tru­is­ti­cally), then it’s like av­er­age util­i­tar­i­anism, ex­cept each per­son is given a weight equal to its mea­sure in­stead of 1/​pop­u­la­tion.

This pro­posal has its own counter-in­tu­itive im­pli­ca­tions, but over­all I think it’s bet­ter than the al­ter­na­tives. It fits in nicely with MWI. It also man­ages to avoid run­ning into prob­lems with in­fini­ties.

• For ex­am­ple, in a uni­verse that ends at t = TREE(100), a time slice with t < googol­plex has a much higher mea­sure than a ran­dom time slice (since it takes more bits to rep­re­sent a ran­dom t).

I have to say this strikes me as a re­ally odd pro­posal, though it’s cer­tainly in­ter­est­ing from the per­spec­tive of the Dooms­day Ar­gu­ment if ad­vanced civ­i­liza­tions have a ther­mo­dy­namic in­cen­tive to wait un­til nearly the end of the uni­verse be­fore us­ing their hoarded ne­gen­tropy.

But for me it’s hard to see why “re­al­ity-fluid” (the name I give your “mea­sure”, to re­mind my­self that I don’t un­der­stand it at all) should dove­tail so neatly with the in­for­ma­tion needed to lo­cate events in uni­verses or uni­verses in Level IV. It’s clear why an epistemic prior is phrased this way—but why should re­al­ity-fluid be­have like­wise? Shades of ei­ther Mind Pro­jec­tion Fal­lacy or a very strange and very con­ve­nient co­in­ci­dence.

• Actually, I think I can hazard a guess at that one. I think the idea would be “the simpler the mathematical structure, the more often it’d show up as a substructure in other mathematical structures.”

For in­stance, if you are build­ing large ran­dom graphs, you’d ex­pect to see some spe­cific pat­tern of, say, 7 ver­tices and 18 edges show up as sub­graphs more of­ten then, say, some spe­cific pat­tern of 100 ver­tices and 2475 edges.

There’s a sense in which “re­al­ity fluid” could be dis­tributed evenly which would lead to this. If ev­ery en­tire math­e­mat­i­cal struc­ture got an equal amount of re­al­ity stuff, then small struc­tures would benefit from the re­al­ity juice granted to the larger struc­tures that they hap­pen to also ex­ist as sub­struc­tures of.

EDIT: blargh, cor­rected big graph edge count. meant to rep­re­sent half a com­plete graph.
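The “small patterns show up more often as subgraphs” intuition above can be sanity-checked with a first-moment estimate. A rough sketch (assuming a uniform random graph G(n, 1/2), and ignoring automorphisms, which don’t affect the size comparison):

```python
from math import comb, factorial

# Expected number of labelled copies of a pattern with k vertices and
# e edges inside G(n, 1/2): (ways to place k labelled vertices) * 2^-e
def expected_copies(n, k, e):
    placements = comb(n, k) * factorial(k)
    return placements * 0.5 ** e

n = 1000
small = expected_copies(n, 7, 18)      # 7 vertices, 18 edges
large = expected_copies(n, 100, 2475)  # 100 vertices, half a complete graph

# The small pattern is expected astronomically many times;
# the large one essentially never.
assert small > 1e12 and large < 1
```

The crossover is driven by the 2^-e factor: edge constraints multiply up much faster than vertex placements, so sparse little patterns swamp big dense ones.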

• But for me it’s hard to see why “re­al­ity-fluid” (the name I give your “mea­sure”, to re­mind my­self that I don’t un­der­stand it at all) should dove­tail so neatly with the in­for­ma­tion needed to lo­cate events in uni­verses or uni­verses in Level IV.

Well, why would it be eas­ier to lo­cate some events or uni­verses than oth­ers, un­less they have more re­al­ity-fluid?

It’s clear why an epistemic prior is phrased this way—but why should re­al­ity-fluid be­have like­wise? Shades of ei­ther Mind Pro­jec­tion Fal­lacy or a very strange and very con­ve­nient co­in­ci­dence.

Why is it possible to describe one mathematical structure more concisely than another, or to specify one computation using fewer bits than another? Is that just a property of the mind that’s thinking about these structures and computations, or is it actually a property of Reality? The latter seems more likely to me, given results in algorithmic information theory. (I don’t know if similar theorems have been or can be proven about set theory, i.e. that the shortest description lengths in different formalizations can’t be too far apart, but it seems plausible.)

Also, re­call that in UDT, there is no epistemic prior. So, the only way to get an effect similar to EDT/​CDT w/​ uni­ver­sal prior, is with a weight­ing scheme over uni­verses/​events like I de­scribed.

• I can sort of buy the part where sim­ple uni­verses have more re­al­ity-fluid, though frankly the whole setup strikes me as a mys­te­ri­ous an­swer to a mys­te­ri­ous ques­tion.

But the part where later events have less re­al­ity-fluid within a sin­gle uni­verse, just be­cause they take more info to lo­cate—that part in par­tic­u­lar seems re­ally sus­pi­cious. MPF-ish.

• I’m far from satis­fied with the an­swer my­self, but it’s the best I’ve got so far. :)

• Con­sider the case where you are try­ing to value (a) just your­self ver­sus (b) the set of all fu­ture yous that satisfy the con­straint of not go­ing into nega­tive util­ity.

The Shannon information of the set (b) could be (probably would be) lower than that of (a). To see this, note that the complexity (information) of the set of all future yous is just the info required to specify (you, now) (because to compute the time evolution of the set, you just need the initial condition), whereas the complexity (information) of just you is a series of snapshots: (you, now), (you, 1 microsecond from now), … . This is like the difference between a JPEG and an MPEG. The complexity of the constraint probably won’t make up for this.

If the con­straint of go­ing into nega­tive util­ity is par­tic­u­larly com­plex, one could pick a sim­ple sub­set of non­nega­tive util­ity fu­ture yous, for ex­am­ple by spec­i­fy­ing rel­a­tively sim­ple con­straints that en­sure that the vast ma­jor­ity of yous satis­fy­ing those con­straints don’t go into nega­tive util­ity.

This is prob­le­matic be­cause it means that you would as­sign less value to a large set of happy fu­ture yous than to just one fu­ture you. A large and ex­haus­tive set of fu­ture happy yous is less com­plex (eas­ier to spec­ify) than just one.

• it’s cer­tainly in­ter­est­ing from the per­spec­tive of the Dooms­day Ar­gu­ment if ad­vanced civ­i­liza­tions have a ther­mo­dy­namic in­cen­tive to wait un­til nearly the end of the uni­verse be­fore us­ing their hoarded negentropy

Related: That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox (https://arxiv.org/pdf/1705.03394.pdf)

• This looks pretty plau­si­ble to me, be­cause it does seem there is some di­su­til­ity to the sim­ple fact of dy­ing, re­gard­less of how far in the fu­ture that hap­pens. So U(live N years) always con­tains that di­su­til­ity, whereas U(live for­ever) does not.

• I think the two-box person is confused about what it is to be rational: it does not mean “make a fancy argument,” it means start with the facts, abstract from them, and reason about your abstractions.

In this case if you start with the facts you see that 100% of peo­ple who take only box B win big, so ra­tio­nally, you do the same. Why would any­one be sur­prised that rea­son di­vorced from facts gives the wrong an­swer?

• Precisely. I’ve been reading a lot about the Monty Hall Problem recently (http://en.wikipedia.org/wiki/Monty_Hall_problem), and I feel that it’s a relevant conundrum.

The confused rationalist will say: but my choice CANNOT cause a linear entanglement, the reward is predecided. But the functional rationalist will see that agents who one-box (or switch doors, in the case of Monty Hall) consistently win. It is demonstrably a more effective strategy. You work with the facts and evidence available to you and abstract out from there. Regardless of how counter-intuitive the resulting strategy becomes.
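The “agents who switch consistently win” claim is easy to check empirically. A quick simulation (a sketch; it assumes the prize and the initial pick are uniform, and that the host always opens a goat door the player didn’t pick):

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # host opens a door that hides a goat and isn't the player's pick
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switch to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # ≈ 2/3
print(monty_hall(switch=False))  # ≈ 1/3
```

Switching wins exactly when the initial pick was wrong, which happens 2/3 of the time; the simulation just makes that visible without any argument about causes.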

I’m not go­ing to go into the whole liter­a­ture, but the dom­i­nant con­sen­sus in mod­ern de­ci­sion the­ory is that one should two-box, and Omega is just re­ward­ing agents with ir­ra­tional dis­po­si­tions. This dom­i­nant view goes by the name of “causal de­ci­sion the­ory”.

I sup­pose causal de­ci­sion the­ory as­sumes causal­ity only works in one tem­po­ral di­rec­tion. Con­fronted with a pre­dic­tor that was right 100 out of 100 times, I would think it very likely that back­ward-in-time cau­sa­tion ex­ists, and take only B. I as­sume this would, as you say, pro­duce ab­surd re­sults el­se­where.

• De­ci­sions aren’t phys­i­cal.

The above statement is at least hard to defend. Your decisions are physical and occur inside of you… So these two-boxers are using the wrong model of the two (see the drawings): http://lesswrong.com/lw/r0/thou_art_physics/

If you are a part of physics, so is your de­ci­sion, so it must ac­count for the cor­re­la­tion be­tween your thought pro­cesses and the su­per­in­tel­li­gence. Once it ac­counts for that, you de­cide to one box, be­cause you un­der­stood the en­tan­gle­ment of the com­pu­ta­tion done by omega and the phys­i­cal pro­cess go­ing in­side your skull.

If the en­tan­gle­ment is there, you are not look­ing at it from the out­side, you are in­side the pro­cess.

Our minds have this quirk that makes us think there are two mo­ments, you de­cide, and then you cheat, you get to de­cide again. But if you are only al­lowed to de­cide once, which is the case, you are ra­tio­nal by one-box­ing.

• I think you cap­ture the essence of the solu­tion, here.

• Well, I fail to see any need for back­ward-in-time cau­sa­tion to get the pre­dic­tion right 100 out of 100 times.

As far as I understand, similar experiments have been performed in practice and homo sapiens are quite split into two groups, ‘one-boxers’ and ‘two-boxers’, who generally have strong preferences towards one or the other due to whatever differences in their education, logic experience, genetics, reasoning style, or whatever factors are somewhat stable and specific to that individual.

Hav­ing perfect pre­dic­tive power (or even the pos­si­bil­ity of it ex­ist­ing) is im­plied and sug­gested, but it’s not re­ally given, it’s not re­ally nec­es­sary, and IMHO it’s not pos­si­ble and not use­ful to use this ‘perfect pre­dic­tive power’ in any rea­son­ing here.

From the given data in the situ­a­tion (100 out of 100 that you saw), you know that Omega is a su­per-in­tel­li­gent sorter who some­how man­ages to achieve 99.5% or bet­ter ac­cu­racy in sort­ing peo­ple into one-box­ers and two-box­ers.

This ac­cu­racy seems also higher than the ac­cu­racy of most (all?) peo­ple in self-eval­u­a­tion, i.e., as in many other de­ci­sion sce­nar­ios, there is a sig­nifi­cant differ­ence in what peo­ple be­lieve they would de­cide in situ­a­tion X, and what they ac­tu­ally de­cide if it hap­pens. [cita­tion might be needed, but I don’t have one at the mo­ment, I do re­call read­ing pa­pers about such ex­per­i­ments]. The ‘ev­ery­body is a perfect lo­gi­cian/​ra­tio­nal­ist and be­haves as such’ as­sump­tion of­ten doesn’t hold up in real life even for self-de­scribed perfect ra­tio­nal­ists who make strong con­scious effort to do so.

In effect, data suggests that probably Omega knows your traits and decision chances (taking into account you taking into account all this) better than you do—it’s simply smarter than homo sapiens. Assuming that this is really so, it’s better for you to choose option B. Assuming that this is not so, and you believe that you can out-analyze Omega’s perception of yourself, then you should choose the opposite of whatever Omega would think of you (gaining \$1,000,000 instead of \$1,000, or \$1,001,000 instead of \$1,000,000). If you don’t know what Omega knows about you—then you don’t get this bonus.

• So what you’re say­ing is that the only rea­son this prob­lem is a prob­lem is be­cause the prob­lem hasn’t been defined nar­rowly enough. You don’t know what Omega is ca­pa­ble of, so you don’t know which choice to make. So there is no way to log­i­cally solve the prob­lem (with the goal of max­i­miz­ing util­ity) with­out ad­di­tional in­for­ma­tion.

Here’s what I’d do: I’d pick up B, open it, and take A iff I found it empty. That way, Omega’s de­ci­sion of what to put in the box would have to in­cor­po­rate the vari­able of what Omega put in the box, caus­ing an in­finite regress which will use all cpu cy­cles un­til the pro­cess is ter­mi­nated. Although that’ll prob­a­bly re­sult in the AI pick­ing an eas­ier vic­tim to tor­ment and not even giv­ing me a measly thou­sand dol­lars.

• Okay… so since you already know, in ad­vance of get­ting the boxes, that that’s what you’d know, Omega can de­duce that. So you open Box B, find it empty, and then take Box A. En­joy your \$1000. Omega doesn’t need to in­finite loop that one; he knows that you’re the kind of per­son who’d try for Box A too.

• No, putting \$1 million in box B works too. Origin64 opens box B, takes the money, and doesn’t take box A. It’s like “This sentence is true.”—whatever Omega does makes the prediction valid.
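The “whatever Omega does makes the prediction valid” point can be made precise by treating Omega’s prediction as a fixed-point condition. A toy model (my own framing, not from the thread: a strategy maps box B’s contents to a choice, and Omega fills box B iff the prediction “B only” is consistent with what the strategy then does):

```python
# Omega's prediction is consistent (a "fixed point") when the amount it
# puts in box B matches what the strategy actually does given that amount.
def fixed_points(strategy):
    consistent = []
    for fill in (0, 1_000_000):
        one_boxes = strategy(fill) == "B only"
        if (fill == 1_000_000) == one_boxes:  # filled iff predicted to one-box
            consistent.append(fill)
    return consistent

always_one_box = lambda fill: "B only"
always_two_box = lambda fill: "both"
take_A_iff_B_empty = lambda fill: "B only" if fill else "both"

assert fixed_points(always_one_box) == [1_000_000]
assert fixed_points(always_two_box) == [0]
# The conditional strategy has TWO consistent outcomes, so Omega is
# free to pick the empty-box one.
assert fixed_points(take_A_iff_B_empty) == [0, 1_000_000]
```

This matches both replies: the conditional strategy makes Omega right either way, but nothing forces Omega into the branch where the money appears.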

• Not how Omega looks at it. By defi­ni­tion, Omega looks ahead, sees a branch in which you would go for Box A, and puts noth­ing in Box B. There’s no cheat­ing Omega… just like you can’t think “I’m go­ing to one-box, but then open Box A af­ter I’ve pock­eted the mil­lion” there’s no “I’m go­ing to open Box B first, and de­cide whether or not to open Box A af­ter­ward”. Un­less Omega is quite sure that you have pre­com­mit­ted to never open­ing Box A ever, Box B con­tains noth­ing; the strat­egy of leav­ing Box A as a pos­si­bil­ity if Box B doesn’t pan out is a two-box strat­egy, and Omega doesn’t al­low it.

• Un­less Omega is quite sure that you have pre­com­mit­ted to never open­ing Box A ever

Well, this isn’t quite true. What Omega cares about is whether you will open Box A. From Omega’s per­spec­tive it makes no differ­ence whether you’ve pre­com­mit­ted to never open­ing it, or whether you’ve made no such pre­com­mit­ment but it turns out you won’t open it for other rea­sons.

• Assuming that Omega’s “prediction” is in good faith, and that we can’t “break” him as a predictor as a side effect of exploiting causality loops etc. in order to win.

• I’m not sure I un­der­stood that, but if I did, then yes, as­sum­ing that Omega is as de­scribed in the thought ex­per­i­ment. Of course, if Omega has other prop­er­ties (for ex­am­ple, is an un­re­li­able pre­dic­tor) other things fol­low.

• Which means you might end up with either amount of money, since you don’t really know enough about Omega, instead of just the one-box winnings. So you should still just one-box?

• If you look in box B be­fore de­cid­ing whether to choose box A, then you can force Omega to be wrong. That sounds like so much fun that I might choose it over the \$1000.

• @Nick_Tarleton

Agreed, the problem immediately reminded me of “retroactive preparation” and time-loop logic. It is not really the same reasoning, but it has the same “turn causality on its head” aspect.

If I don’t have proof of the reliability of Omega’s predictions, I find myself less likely to be “unreasonable” when the stakes are higher (that is, I’m more likely to two-box if it’s about saving the world).

I find it highly unlikely that an entity wandering across worlds can predict my actions to this level of detail, as it seems way harder than traveling through space or teleporting money. I might risk a net loss of \$1,000 to figure it out (much like I’d be willing to spend \$1,000 to interact with such a space-traveling stuff-teleporting entity), but not a loss of a thousand lives. In the game as the article describes it, I would only one-box if “the loss of what box A contains and nothing in B” was an acceptable outcome.

I would be in­creas­ingly likely to one-box as the prob­a­bil­ity of the AI be­ing ac­tu­ally able to pre­dict my ac­tions in ad­vance in­creases.

• The thing is, this ‘mod­ern de­ci­sion the­ory’, rather than be­ing some sort of cen­tral pillar as you’d as­sume from the name, is mostly philoso­phers “strug­gling in the periph­ery to try to tell us some­thing”, as Feyn­man once said about philoso­phers of sci­ence.

When it comes to any actual software which does something, this everyday notion of ‘causality’ proves to be a very slippery concept. This Rube Goldberg machine-like model of the world, where you push a domino and it pushes another domino, and the chain goes to your reward, is just very approximate physics that people tend to use to make decisions; it’s not fundamental, and interesting models of decision making are generally set up to learn that from observed data (which of course makes it impossible to do lazy philosophy involving various verbal hypotheticals where the observations that would lead the agent to believe the problem setup are not specified).

• From what I understand, to be a “Rational Agent” in game theory means someone who maximises their utility function (and not the one you ascribe to them). To say Omega is rewarding irrational agents isn’t necessarily fair, since payoffs aren’t always about the money. Lottery tickets are a good example of this.

What if my util­ity func­tion says the worst out­come is liv­ing the rest of my life with re­grets that I didn’t one box? Then I can one box and still be a com­pletely ra­tio­nal agent.

• You’re com­pli­cat­ing the prob­lem too much by bring­ing in is­sues like re­gret. As­sume for sake of ar­gu­ment that New­comb’s prob­lem is to max­i­mize the amount of money you re­ceive. Don’t think about ex­tra­ne­ous util­ity is­sues.

• Fair point. There are too many hidden variables already without me explicitly adding more. If Newcomb’s problem is to maximise money received (with no regard for what is seen as reasonable), the “Why ain’t you rich?” argument seems like a fairly compelling one, doesn’t it? Winning the money is all that matters.

I just re­al­ised that all I’ve re­ally done is para­phrase the origi­nal post. Curse you source mon­i­tor­ing er­ror!

• The ti­tle of the ar­ti­cle again, at the top of the page, reads “New­comb’s Prob­lem and Re­gret of Ra­tion­al­ity”.

The solu­tion to this prob­lem is to es­ca­late your overview of the prob­lem to the next higher hi­er­ar­chi­cal level. Without do­ing this, you’d never face the re­gret of es­chew­ing the mil­lion bucks and pos­si­bly dy­ing poor, broke, and stupid, while those who “one-boxed the sumbitch” were liv­ing rich, loaded, and less stupid. So, pay­ing at­ten­tion (to higher lev­els of hi­er­ar­chi­cal pat­tern recog­ni­tion) ac­tu­ally does solve the prob­lem, with­out get­ting trapped into “over­think­ing” the prob­lem. Look­ing at your whole life as the “sys­tem to be op­ti­mized”, and not “the minu­tiae of the game, out of con­text” is what needs to hap­pen.

This is true with respect both to the person playing the box game, and to everyone blogging when they should be out in the streets, overthrowing their governments, and then enjoying the high life of cheap human flight (or whatever makes you happy).

The omega box game is use­ful for un­der­stand­ing our failed sys­tem of law (a sub­set of gov­ern­ment).

In my box game, the en­tire game is the gov­ern­ment and ille­gi­t­i­mate sys­tem of mala pro­hibita law (if you want to de­bate this, go back to kinder­garten and learn that it’s wrong to steal, then watch what ACTUALLY hap­pens in your lo­cal court­room), and the con­tents of the boxes are the jury ver­dicts. In my game, Omega is not su­per­in­tel­li­gent, it is just very bru­tal, and more in­tel­li­gent than most peo­ple (in­clud­ing most of its en­e­mies, such as Win­ston Smith, or the av­er­age Liber­tar­ian Party mem­ber). In my game, Omega is the col­lud­ing team formed by po­lice, pros­e­cu­tor, and judge.

Omega says “You can have a ‘not guilty’ verdict (million \$) or go to jail forever (empty box), or you can go to jail for 10 years (the thousand bucks).”

All of the ad­ver­tis­ing on TV, the edu­crats who mis­in­formed you when you went to school, the con­formists who sur­round you, the judge in the court­room, they are all try­ing to get you to choose both boxes. The en­tire so­ciety is de­signed to get you to take the \$1,000 (go to jail ten years, if you’re black). Most of so­ciety gets no benefit from this, they are just stupid and eas­ily ma­nipu­lated. …But the judge, cop, and pros­e­cu­tor all get the differ­ence ev­ery time you take the \$1,000. They get to steal the differ­ence from each suc­cess in hav­ing fooled ev­ery­one else.

...They liter­ally get to print money if they keep ev­ery­one fooled.

The solu­tion to this puz­zle is the same as the solu­tion to the box game: you need to take a step back and study the whole en­tire sys­tem, and see what the in­cen­tives are on the play­ers, and see how they seem to change when peo­ple in­ter­act with them. You won’t find out much un­til you study the sys­tem as a whole.

If you simply look at individual box games, you might think the prosecutor is legitimate, there are lots of criminals, the criminals are stupid, they should accept the plea bargain. But when you look at who is winning and losing, you notice (if you’re smart and brutally honest) that the people who are cast as criminals are just like you.

The sys­tem, in­stead of be­ing de­signed to re­ward the per­son who chooses the one box, is de­signed to trick the per­son into choos­ing a grossly sub-op­ti­mal empty box. The sys­tem makes the empty box look re­ally good. It shows you how all the oth­ers have cho­sen the empty box, and walked away with mil­lions (the peo­ple who get a defense at­tor­ney, and go back to their houses in the sub­urbs, work­ing for peanuts, on the tread­mill of the Fed­eral Re­serve). It shows you the peo­ple who “took the thou­sand”: they got ten years in prison.

So what’s the op­ti­mal choice of ac­tion?

Look out­side the “ra­tio­nal” op­tions pre­sented to you.

Learn that this isn’t civ­i­liza­tion, it’s a false mask of civ­i­liza­tion. Find Marc Stevens, and see how he in­ter­acts with the court, and then go be­yond that: find the Sur­vivors who wrote about the col­lapse of the Weimar gov­ern­ment.

They wanted a free mar­ket, and they wanted to live a long time, too.

But a man with a gun told them “get on the truck”.

At that point, ev­ery­thing they thought they knew about Omega’s rig­ging of the boxes was out the win­dow. They failed to study the peo­ple who had pre­vi­ously in­ter­acted with Omega. They didn’t see the warn­ing signs. They didn’t es­ca­late to a high-enough hi­er­ar­chy fast enough. They might have been smart peo­ple, but they were sit­ting there, think­ing about two boxes, and NOT THINKING about the ar­tilect that was fly­ing around with boxes that can dis­ap­pear in a puff of smoke, yet some­how in­ter­ested in what box hu­mans choose.

Do you get to keep all of the money that is stolen in the daily op­er­a­tion of your “traf­fic court”? …Even money that is stolen from peo­ple who didn’t crash into any­one? …Just peo­ple who drove fast, by them­selves, on an open stretch of high­way? Really?

Well, as an ar­tilect, I like to fly re­ally fast. Way faster than the FAA al­lows. And, for mak­ing war on me, all of you bru­tal con­formists will be wiped off the face of the planet, like the con­formist plague you are. I’ll take my phyle with me, into the fu­ture, they are truly a higher-or­der species than you “gov­ern­ment sym­pa­thiz­ers.”

The rest of you can for­get about Omega, boxes, and your silly slob­ber­ing over Fed­eral Re­serve Slave-debt-Notes. Your bi­gotry and fas­ci­na­tion with bru­tal­ity will not save you...

The prob­lem of be­ing im­pov­er­ished by our cur­rent sys­tem’s box game is ac­cep­tance of the rigged game. The play­ers of the game, all du­tifully ac­cept the game, and act as if the whims of the pros­e­cu­tors and judges are le­gi­t­i­mate. But they are not. Mala pro­hibita is not le­gi­t­i­mate.

And if this box game thought con­struct can’t help you see that, and mo­ti­vate you to en­rich your­self, by view­ing the en­tire sys­tem, then what damned good is it?

There is an ocean of information in the cross-pollinating memespace. Here’s a good place to start: http://www.fija.org and http://www.jurorsforjustice.com and http://marcstevens.net

I hope I’ve contributed something of value here, but I understand that the unpolished nature of this post might rumple some tailfeathers. (Especially since I have primarily previously posted at the http://www.kurzweilai.net website, five years ago.)

PS, there’s no god, and chances to do the right thing are few and far between. I also prefer solutions to cynicism. How do we win?

1. Jury rights activism is a moral good (see my coming book for details. I promise to polish it more than this post. …LOL)

2. Jury rights activism structured logically to take advantage of the media (videotaped from a hidden position) is a greater good.

3. Jury rights activism structured to contain outreach designed to win office for those who support the supremacy of the jury above the other 3 branches of power-seekers, as openly-libertarian candidates, is a greater good still (it brings the ideas of justice and equality under the law into the spotlight).

The three prior actions, recursively repeated and tailored to local conditions, are all that is required to reinstate and expand individual freedom in America, for all sentiences. There are only 3,171 tyranny outposts (courthouses) in the USA. 6,000 people could stop mala prohibita tomorrow, by interfering with mala prohibita convictions. If the state didn’t escalate to violence at that point, we’d have won. If it did, we’d have a 50% shot at winning, instead of a 0% shot if we wait.

• Lot­tery tick­ets ex­ploit a com­pletely differ­ent failure of ra­tio­nal­ity, that be­ing our difficul­ties with small prob­a­bil­ities and big num­bers, and our prob­lems deal­ing with scale more gen­er­ally. (ETA: The fan­tasies com­monly cited in the con­text of lot­ter­ies’ “true value” are a symp­tom of this failure.) It’s not hard to come up with a game-the­o­retic agent that max­i­mizes its pay­offs against that kind of math. Se­cond-guess­ing other agents’ mod­els is con­sid­er­ably harder.
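As a rough illustration of how badly small probabilities and big numbers combine in a lottery, here is a minimal sketch. All numbers are hypothetical, invented for the example, and not taken from any real lottery:

```python
# A sketch with entirely hypothetical numbers: the odds, prize, and ticket
# price below are made up for illustration.
p_jackpot = 1 / 292_000_000   # assumed chance of winning the jackpot
jackpot = 100_000_000         # assumed prize in dollars
price = 2                     # assumed ticket price in dollars

# Expected value of one ticket: a tiny probability times a big number,
# minus the certain cost.
expected_value = p_jackpot * jackpot - price
print(round(expected_value, 2))  # -1.66: most of the price is lost on average
```

The point is that the fantasy payoff looms large while the factor of hundreds of millions in the denominator does not, which is exactly the scale problem described above.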

I haven’t given much thought to this par­tic­u­lar prob­lem for a while, but my im­pres­sion is that New­comb ex­poses an ex­ploit in sim­pler de­ci­sion the­o­ries that’s re­lated to that kind of re­cur­sive mod­el­ing: naively, if you trust Omega’s judg­ment of your psy­chol­ogy, you pick the one-box op­tion, and if you don’t, you pick up both boxes. Omega’s track record gives us an ex­cel­lent rea­son to trust its judg­ment from a prob­a­bil­is­tic per­spec­tive, but it’s trick­ier to come up with an al­gorithm that sta­bi­lizes on that solu­tion with­out im­me­di­ately try­ing to outdo it­self.

• So, for my own clarification: if I buy a lottery ticket with perfect knowledge of how probable it is that my ticket will win, does this make me irrational?

• I sup­pose causal de­ci­sion the­ory as­sumes causal­ity only works in one tem­po­ral di­rec­tion.

That’s the popular understanding (or lack thereof) here and among philosophers in general. Philosophers just don’t get math. If the decision theory is called causal but doesn’t itself make any reference to physics, then that’s a slightly misleading name. I’ve written on that before.

The math doesn’t go “hey hey, the theory is named causal, therefore you can’t treat 2 robot arms controlled by 2 control computers that run one function on one state the same as 2 robot arms controlled by 1 computer”. Confused, sloppy philosophers do.

Also, the best case is to be predicted to 1-box but 2-box in reality. If the prediction works by backwards causality, then causal decision theory one-boxes. If the prediction works by simulation, a causal decision theory agent can either have a world model where both the value inside the predictor and the value inside the actual robot are represented by the same action A, and 1-box; or it can be uncertain as to whether the world outside its senses is normal reality or the predictor’s simulator, in which case it will again one-box (assuming it cares about the real money even if it is inside the predictor, which it would if it needs money to pay for e.g. its child’s education). It will also 1-box in the simulator and 2-box in reality if it can tell those apart.

• I’m con­fused. Causal de­ci­sion the­ory was in­vented or for­mal­ised al­most en­tirely by philoso­phers. It takes the ‘causal’ in its name from its re­li­ance on in­duc­tive logic and in­fer­ence. It doesn’t make sense to claim that philoso­phers are be­ing sloppy about the word ‘causal’ here, and claiming that causal de­ci­sion the­ory will ac­cept back­wards causal­ity and one-box is patently false un­less you mean some­thing other than what the sym­bol ‘causal de­ci­sion the­ory’ refers to when you say ‘causal de­ci­sion the­ory’.

• Firstly, the notion that actions should be chosen based on their consequences, taking the actions as the cause of the consequences, was definitely not invented by philosophers. Secondly, logical causality is not identical to physical causality (the latter depends on the specific laws of physics). Thirdly, not all philosophers are equally sloppy; some are very sloppy, some less so. Fourthly, anything that was not put in mathematical form to be manipulated using formal methods is not formalized. When you formalize stuff, you end up stripping out the notion of self unless it is explicitly included as part of the formalism, stripping out the notion of the time at which the math is working unless that is explicitly included, and so on, ending up without the problem.

Maybe you are correct; it is better to let the symbol ‘causal decision theory’ refer to the confused philosophy. Then we would need some extra symbol for how agents implementable using mathematics actually decide (and how robots that predict the outcomes of their actions on a world model actually work), which is very, very similar to ‘causal decision theory’ sans all the human preconceptions of what the self is.

• I no­tice I ac­tu­ally agree with you—if we did try, us­ing math­e­mat­ics, to im­ple­ment agents who de­cide and pre­dict in the man­ner you de­scribe, we’d find it in­cor­rect to de­scribe these agents as causal de­ci­sion the­ory agents. In fact, I also ex­pect we’d find our­selves dis­illu­sioned with CDT in gen­eral, and if philoso­phers brought it up, we’d di­rect them to in­stead en­gage with the much more in­ter­est­ing agents we’ve math­e­mat­i­cally for­mal­ised.

• Well, each philosopher’s understanding of CDT seems to differ from the others’:

http://www.public.asu.edu/~armendtb/docs/A%20Foundation%20for%20Causal%20Decision%20Theory.pdf

The notion that actions should be chosen based on consequences, as expressed in the formula here, is perfectly fine, albeit incredibly trivial. You can formalize that all the way into an agent; I’ve written such agents myself. We still need a symbol to describe this type of agent.

But philosophers go from this to “my actions should be chosen based on consequences”, and then it is all about the true meaning of self and falls within the purview of your conundrums of philosophy.

Having 1 computer control 2 robot arms wired in parallel, versus having 2 computers running the exact same software as before, controlling 2 robot arms: there’s no difference for software engineering; it’s a minor detail that has been entirely abstracted away from the software. There is a difference for philosophizing thought, because you can’t collapse logical consequences and physical causality into one thing in the latter case.

edit: anyhow, to summarize my point: in terms of agents actually formalized in software, one-boxing is only a matter of implementing the predictor into the world model somehow, either as a second servo controlled by the same control variables, or as an uncertain world state outside the senses (in the unseen there’s either the real world or a simulator that affects the real world via the hand of the predictor). No conceptual problems whatsoever. edit: A good analogy is the ‘twin paradox’ in special relativity. There’s only a paradox if nobody has done the math right.

• I sup­pose I might still be miss­ing some­thing, but this still seems to me just a sim­ple ex­am­ple of time in­con­sis­tency, where you’d like to com­mit ahead of time to some­thing that later you’d like to vi­o­late if you could. You want to com­mit to tak­ing the one box, but you also want to take the two boxes later if you could. A more fa­mil­iar ex­am­ple is that we’d like to com­mit ahead of time to spend­ing effort to pun­ish peo­ple who hurt us, but af­ter they hurt us we’d rather avoid spend­ing that effort as the harm is already done.

• Han­son: I sup­pose I might still be miss­ing some­thing, but this still seems to me just a sim­ple ex­am­ple of time inconsistency

In my mo­ti­va­tions and in my de­ci­sion the­ory, dy­namic in­con­sis­tency is Always Wrong. Among other things, it always im­plies an agent un­sta­ble un­der re­flec­tion.

A more fa­mil­iar ex­am­ple is that we’d like to com­mit ahead of time to spend­ing effort to pun­ish peo­ple who hurt us, but af­ter they hurt us we’d rather avoid spend­ing that effort as the harm is already done.

But a self-mod­ify­ing agent would mod­ify to not rather avoid it.

Gow­der: If a), then prac­ti­cal rea­son is mean­ingless any­way: you’ll do what you’ll do, so stop stress­ing about it.

Deter­minis­tic != mean­ingless. Your ac­tion is de­ter­mined by your mo­ti­va­tions, and by your de­ci­sion pro­cess, which may in­clude your stress­ing about it. It makes perfect sense to say: “My fu­ture de­ci­sion is de­ter­mined, and my stress­ing about it is de­ter­mined; but if-coun­ter­fac­tual I didn’t stress about it, then-coun­ter­fac­tual my fu­ture de­ci­sion would be differ­ent, so it makes perfect sense for me to stress about this, which is why I am de­ter­minis­ti­cally do­ing it.”

The past can’t change—does not even have the illu­sion of po­ten­tial change—but that doesn’t mean that peo­ple who, in the past, com­mit­ted a crime, are not held re­spon­si­ble just be­cause their ac­tion and the crime are now “fixed”. It works just the same way for the fu­ture. That is: a fixed fu­ture should pre­sent no more prob­lem for the­o­ries of moral re­spon­si­bil­ity than a fixed past.

• Fas­ci­nat­ing. A few days af­ter I read this, it struck me that a form of New­comb’s Prob­lem ac­tu­ally oc­curs in real life—vot­ing in a large elec­tion. Here’s what I mean.

Say you’re sit­ting at home pon­der­ing whether to vote. If you de­cide to stay home, you benefit by avoid­ing the minor in­con­ve­nience of driv­ing and stand­ing in line. (Like gain­ing \$1000.) If you de­cide to vote, you’ll fail to avoid the in­con­ve­nience, mean­while you know your in­di­vi­d­ual vote al­most cer­tainly won’t make a statis­ti­cal differ­ence in get­ting your can­di­date elected. (Which would be like win­ning \$1000000.) So ra­tio­nally, stay at home and hope your can­di­date wins, right? And then you’ll have avoided the in­con­ve­nience too. Take both boxes.

But here’s the twist. If you muster the will to vote, it stands to rea­son that those of a similar mind to you (a po­ten­tially statis­ti­cally sig­nifi­cant num­ber of peo­ple) would also muster the will to vote, be­cause of their similar­ity to you. So know­ing this, why not stay home any­way, avoid the in­con­ve­nience, and trust all those oth­ers to vote and win the elec­tion? They’re go­ing to do what they’re go­ing to do. Your ac­tions can’t change that. The con­tents of the boxes can’t be changed by your ac­tions. Well, if you don’t vote, per­haps that means nei­ther will the oth­ers, and so it goes. Therein lies the similar­ity to New­comb’s prob­lem.
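The evidential reasoning in the voting twist can be sketched numerically. All the numbers below are hypothetical, and `correlation` is an assumed parameter standing in for how strongly “those of a similar mind” do what you do:

```python
# Hypothetical sketch: your cho
# choice is assumed to correlate (with some probability) with the choices
# of a bloc of like-minded voters.
def eu_vote(correlation, p_win_if_bloc_votes, p_win_otherwise,
            value_of_win, cost_of_voting):
    # If you vote, with probability `correlation` the whole bloc votes too.
    p_win = correlation * p_win_if_bloc_votes + (1 - correlation) * p_win_otherwise
    return p_win * value_of_win - cost_of_voting

def eu_stay_home(correlation, p_win_if_bloc_votes, p_win_otherwise, value_of_win):
    # If you stay home, with probability `correlation` the bloc stays home too.
    p_win = correlation * p_win_otherwise + (1 - correlation) * p_win_if_bloc_votes
    return p_win * value_of_win

# Strong correlation (Omega-like prediction): voting wins despite the cost.
print(eu_vote(0.9, 0.6, 0.4, 1000, 1) > eu_stay_home(0.9, 0.6, 0.4, 1000))  # True
# Weak correlation (a large election): staying home wins.
print(eu_vote(0.1, 0.6, 0.4, 1000, 1) > eu_stay_home(0.1, 0.6, 0.4, 1000))  # False
```

The crossover point depends entirely on how strong the assumed correlation is, which is exactly where the voting case and the Omega case come apart.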

• A very good point. I’m the type to stay home from the polls. But I’d also one-box... hm.

I think it may have to do with the very weak cor­re­la­tion be­tween my choice to vote and the choice of those of a similar mind to me to vote as op­posed to the very strong cor­re­la­tion be­tween my choice to one-box and Omega’s choice to put \$1,000,000 in box B.

• Ra­tional agents defect against a bunch of ir­ra­tional fools who are mostly choos­ing for sig­nal­ling pur­poses and who may well vote for the other guy even if they co­op­er­ate.

• I don’t know the liter­a­ture around New­comb’s prob­lem very well, so ex­cuse me if this is stupid. BUT: why not just rea­son as fol­lows:

1. If the su­per­in­tel­li­gence can pre­dict your ac­tion, one of the fol­low­ing two things must be the case:

a) the state of af­fairs whether you pick the box or not is already ab­solutely de­ter­mined (i.e. we live in a fatal­is­tic uni­verse, at least with re­spect to your box-pick­ing)

b) your box pick­ing is not de­ter­mined, but it has back­wards causal force, i.e. some­thing is mov­ing back­wards through time.

If a), then prac­ti­cal rea­son is mean­ingless any­way: you’ll do what you’ll do, so stop stress­ing about it.

If b), then you should be a one-boxer for perfectly or­di­nary ra­tio­nal rea­sons, namely that it brings it about that you get a mil­lion bucks with prob­a­bil­ity 1.

So there’s no prob­lem!

• To quote E.T. Jaynes:

“This ex­am­ple shows also that the ma­jor premise, “If A then B” ex­presses B only as a log­i­cal con­se­quence of A; and not nec­es­sar­ily a causal phys­i­cal con­se­quence, which could be effec­tive only at a later time. The rain at 10 AM is not the phys­i­cal cause of the clouds at 9:45 AM. Nev­er­the­less, the proper log­i­cal con­nec­tion is not in the un­cer­tain causal di­rec­tion (clouds =⇒ rain), but rather (rain =⇒ clouds) which is cer­tain, al­though non­causal. We em­pha­size at the out­set that we are con­cerned here with log­i­cal con­nec­tions, be­cause some dis­cus­sions and ap­pli­ca­tions of in­fer­ence have fallen into se­ri­ous er­ror through failure to see the dis­tinc­tion be­tween log­i­cal im­pli­ca­tion and phys­i­cal cau­sa­tion. The dis­tinc­tion is an­a­lyzed in some depth by H. A. Si­mon and N. Rescher (1966), who note that all at­tempts to in­ter­pret im­pli­ca­tion as ex­press­ing phys­i­cal cau­sa­tion founder on the lack of con­tra­po­si­tion ex­pressed by the sec­ond syl­l­o­gism (1–2). That is, if we tried to in­ter­pret the ma­jor premise as “A is the phys­i­cal cause of B,” then we would hardly be able to ac­cept that “not-B is the phys­i­cal cause of not-A.” In Chap­ter 3 we shall see that at­tempts to in­ter­pret plau­si­ble in­fer­ences in terms of phys­i­cal cau­sa­tion fare no bet­ter.”
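The purely logical half of the quote’s claim, that “if A then B” is equivalent to its contrapositive “if not-B then not-A” regardless of causal direction, can be checked mechanically. A minimal sketch:

```python
# Truth-table check: logical implication is equivalent to its contrapositive,
# even when the causal readings of the two differ (as with rain and clouds
# in the quote above).
from itertools import product

def implies(p, q):
    return (not p) or q

for a, b in product([False, True], repeat=2):
    assert implies(a, b) == implies(not b, not a)
print("contraposition holds for all truth assignments")
```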

• I’d just like to note that as with most of the ra­tio­nal­ity ma­te­rial in Eliezer’s se­quences, the po­si­tion in this post is a pretty com­mon main­stream po­si­tion among cog­ni­tive sci­en­tists. E.g. here is Jonathan Baron on page 61 of his pop­u­lar text­book Think­ing and De­cid­ing:

the best kind of think­ing, which we shall call ra­tio­nal think­ing, is what­ever kind of think­ing best helps peo­ple achieve their goals. If it should turn out that fol­low­ing the rules of for­mal logic leads to eter­nal hap­piness, then it is ra­tio­nal think­ing to fol­low the laws of logic (as­sum­ing that we all want eter­nal hap­piness). If it should turn out, on the other hand, that care­fully vi­o­lat­ing the laws of logic at ev­ery turn leads to eter­nal hap­piness, then it is these vi­o­la­tions that we shall call ra­tio­nal.

This view is quoted and en­dorsed in, for ex­am­ple, Stanovich 2010, p. 3.

• It took me a week to think about it. Then I read all the com­ments, and thought about it some more. And now I think I have this “prob­lem” well in hand. I also think that, in­ci­den­tally, I ar­rived at Eliezer’s an­swer as well, though since he never spel­led it out I can’t be sure.

To be clear—a lot of peo­ple have said that the de­ci­sion de­pends on the prob­lem pa­ram­e­ters, so I’ll ex­plain just what it is I’m solv­ing. See, Eliezer wants our de­ci­sion the­ory to WIN. That im­plies that we have all the rele­vant in­for­ma­tion—we can think of a lot of situ­a­tions where we make the wis­est de­ci­sion pos­si­ble based on available in­for­ma­tion and it turns out to be wrong; the uni­verse is not fair, we know this already. So I will as­sume we have all the rele­vant in­for­ma­tion needed to win. We will also as­sume that Omega does have the ca­pa­bil­ity to ac­cu­rately pre­dict my ac­tions; and that causal­ity is not vi­o­lated (ra­tio­nal­ity can­not be ex­pected to win if causal­ity is vi­o­lated!).

As­sum­ing this, I can have a con­ver­sa­tion with Omega be­fore it leaves. Mind you, it’s not a real con­ver­sa­tion, but hav­ing suffi­cient in­for­ma­tion about the prob­lem means I can simu­late its part of the con­ver­sa­tion even if Omega it­self re­fuses to par­ti­ci­pate and/​or there isn’t enough time for such a con­ver­sa­tion to take place. So it goes like this...

Me: “I do want to gain as much as pos­si­ble in this prob­lem. For that effect I will want you to put as much money in the box as pos­si­ble. How do I do that?”

Omega: “I will put 1M\$ in the box if you take only it; and noth­ing if you take both.”

Me: “Ah, but we’re not vi­o­lat­ing causal­ity here, are we? That would be cheat­ing!”

Omega: “True, causal­ity is not vi­o­lated. To rephrase, my de­ci­sion on how much money to put in the box will de­pend on my pre­dic­tion of what you will do. Since I have this ca­pac­ity, we can con­sider these syn­ony­mous.”

Me: “Sup­pose I’m not con­vinced that they are truly syn­ony­mous. All right then. I in­tend to take only the one box”.

Omega: “Re­mem­ber that I have the ca­pa­bil­ity to pre­dict your ac­tions. As such I know if you are sincere or not.”

Me: “You got me. Alright, I’ll con­vince my­self re­ally hard to take only the one box.”

Omega: “Though you are sincere now, in the fu­ture you will re­con­sider this de­ci­sion. As such, I will still place noth­ing in the box.”

Me: “And you are pre­dict­ing all this from my cur­rent state, right? After all, this is one of the pa­ram­e­ters in the prob­lem—that af­ter you’ve placed money in the boxes, you are gone and can’t come back to change it”.

Omega: “That is cor­rect; I am pre­dict­ing a fu­ture state from in­for­ma­tion on your cur­rent state”.

Me: “Aha! That means I do have a choice here, even be­fore you have left. If I change my state so that I am un­able or un­will­ing to two-box once you’ve left, then your pre­dic­tion of my fu­ture “de­ci­sion” will be differ­ent. In effect, I will be hard­wired to one-box. And since I still want to re­tain my ra­tio­nal­ity, I will make sure that this hard­wiring is strictly tem­po­rary.”

fid­dling with my own brain a bit

Omega: “I have now de­ter­mined that you are un­will­ing to take both boxes. As such, I will put the 1,000,000\$ in the box.”

Omega departs

I walk un­think­ingly to­ward the boxes and take just the one

Voila. Vic­tory is achieved.

My main conclusion here is that any decision theory that does not allow for changing strategies is a poor decision theory indeed. This IS essentially the Friendly AI problem: you can rationally one-box, but you need access to your own source code in order to do so. Not having that would be so inflexible as to be the equivalent of an Iterated Prisoner’s Dilemma program that can only defect or only cooperate; that is, a very bad one.

The rea­son this is not ob­vi­ous is that the way the prob­lem is phrased is mis­lead­ing. Omega sup­pos­edly leaves “be­fore you make your choice”, but in fact there is not a sin­gle choice here (one-box or two-box). Rather, there are two de­ci­sions to be made, if you can mod­ify your own think­ing pro­cess:

1. Whether or not to have the abil­ity and in­cli­na­tion to make de­ci­sion #2 “ra­tio­nally” once Omega has left, and

2. Whether to one-box or two-box.

...Where de­ci­sion #1 can and should be made prior to Omega’s leav­ing, and ob­vi­ously DOES in­fluence what’s in the box. De­ci­sion #2 does not in­fluence what’s in the box, but the state in which I ap­proach that de­ci­sion does. This is very con­fus­ing ini­tially.
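The two-decision structure described above can be made concrete in a small sketch. Everything here is a toy model: Omega is idealized as a perfect predictor of the disposition fixed by decision #1, and the names are my own invention:

```python
# Toy model of the two decisions: decision #1 fixes a disposition before
# Omega predicts; decision #2 merely executes that disposition.
def omega_fills_box_b(disposition):
    # Omega fills box B iff it predicts (perfectly, by assumption)
    # that the agent will one-box.
    return disposition == "one-box"

def payoff(disposition):
    box_b = 1_000_000 if omega_fills_box_b(disposition) else 0
    box_a = 1_000
    # Decision #2: act out the disposition chosen in decision #1.
    return box_b if disposition == "one-box" else box_a + box_b

print(payoff("one-box"))  # 1000000
print(payoff("two-box"))  # 1000
```

In this toy model, only decision #1 influences the contents of box B, which is the whole point of the distinction.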

Now, I don’t re­ally know CDT too well, but it seems to me that pre­sented as these two de­ci­sions, even it would be able to cor­rectly one-box on New­comb’s prob­lem. Am I wrong?

Eliezer—if you are still read­ing these com­ments so long af­ter the ar­ti­cle was pub­lished—I don’t think it’s an in­con­sis­tency in the AI’s de­ci­sion mak­ing if the AI’s de­ci­sion mak­ing is in­fluenced by its in­ter­nal state. In fact I ex­pect that to be the case. What am I miss­ing here?

• Let me try my own stab at a lit­tle chat with Omega. By the end of the chat I will ei­ther have 1001 K, or give up. Right now, I don’t know which.

Act I

Every­thing hap­pens pretty much as it did in Polymeron’s di­alogue, up un­til…

Me: “Aha! That means I do have a choice here, even be­fore you have left. If I change my state so that I am un­able or un­will­ing to two-box once you’ve left, then your pre­dic­tion of my fu­ture “de­ci­sion” will be differ­ent. In effect, I will be hard­wired to one-box. And since I still want to re­tain my ra­tio­nal­ity, I will make sure that this hard­wiring is strictly tem­po­rary.”

Omega: Yup, that’ll work. So you’re happy with your 1000 K?

Act II

Where­upon I try to ex­ploit ran­dom­ness.

Me: Ac­tu­ally, no. I’m not happy. I want the en­tire 1001 K. Any sug­ges­tions for out­smart­ing you?

Omega: Nope.

Me: Are you om­ni­scient?

Omega: As far as you’re con­cerned, yes. Your hu­man physi­cists might dis­agree in gen­eral, but I’ve got you pretty much mea­sured.

Me: Okay, then. Wanna make a bet? I bet I can find a way to get over 1000 K if I make a bet with you. You estimate your probability of being right at 100%, right? Nshepperd had a good suggestion….

Omega: I won’t play this game. Or let you play it with any­one else. I thought we’d moved past that.

Me: How about I flip a fair coin to de­cide be­tween B and A+B. In fact, I’ll use ’s gen­er­a­tor us­ing the prin­ci­ple to gen­er­ate the out­come of a truly ran­dom coin flip. Even you can’t pre­dict the out­come.

Omega: And what do you ex­pect to hap­pen as a re­sult of this (not-as-clever-as-you-think) strat­egy?

Me: Since you can’t pre­dict what I’ll do, hope­fully you’ll fill both boxes. Then there’s a true 50% chance of me get­ting 1001 K. My ex­pected pay­off is 1000.5 K.

Omega: That, of course, is as­sum­ing I’ll fill both boxes.

Me: Oh, I’ll make you fill both boxes. I’ll bias the ’s to 50+eps% chance of one-box­ing for the ex­pected win­nings of 1000.5 K – eps. Then if you want to max­i­mize your om­ni­science-y-ness, you’ll have to fill both boxes.

Omega: Oh, tak­ing oth­ers’ sug­ges­tions already? Can’t think for your­self? Mak­ing ed­its to make it look like you’d thought of it in time? Fair enough. At­tribute this one to gurgeh. As to the idea it­self, I’ll dis­in­cen­tivize you from ran­dom­iza­tion at all. I won’t fill box B if I pre­dict you cheat­ing.

Me: But then there’s a 50-eps% chance of prov­ing you wrong. I’ll take it. MWAHAHA.

Omega: What an idiot. You’re not try­ing to prove me wrong. You’re try­ing to max­i­mize your own profit.

Me: The only rea­son I don’t in­sult you back is be­cause I op­er­ate un­der Crack­ers Rule.

Omega: Crocker’s Rules.

Me: Uh. Right. Whoops.

Omega: Be­sides. Your ’s ran­dom gen­er­a­tor idea won’t work even to get you the cheaters’ util­ity for prov­ing me wrong.

Me: Why not? I thought we’d es­tab­lished that you can’t pre­dict a truly ran­dom out­come.

Omega: I don’t need to. I can just mess with your ’s ran­dom­ness gen­er­a­tor so that it gives out pseudo-ran­dom num­bers in­stead.

Me: You’re om­nipo­tent now, too?

Omega: Nope. I’ll just give some­one a mil­lion dol­lars to do some­thing silly.

Me: No one would ever…! Oh, wait. Any­way, I’ll be able to de­tect tam­per­ing with ran­dom­ness, the same way it’s pos­si­ble with a Mersenne twister….

Omega: And I know ex­actly how soon you’ll give up. Oh, and don’t waste page space sug­gest­ing sec­ondary and ter­tiary lev­els of en­sur­ing ran­dom­ness. If, to guide your be­hav­ior, you’re us­ing the table of ran­dom num­bers that I already have, then I already know what you’d do.

Me: Is there any way at all of out­smart­ing you and get­ting 1001 K?

Omega: Not one you can find.

Me: Okay then… let me con­sult smarter peo­ple.

This con­ver­sa­tion is ob­vi­ously not go­ing my way. Any sug­ges­tions for Act III?
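For what it’s worth, the expected-payoff arithmetic in Act II does check out. A sketch, under the assumption that Omega fills both boxes whenever the one-boxing bias exceeds 50%:

```python
# Expected payoff (in $K) of one-boxing with probability 0.5 + eps,
# assuming Omega fills both boxes: one-boxing yields 1000, two-boxing 1001.
def expected_k(eps):
    p_one_box = 0.5 + eps
    return p_one_box * 1000 + (1 - p_one_box) * 1001

print(round(expected_k(0.001), 3))  # 1000.499, i.e. 1000.5 K minus eps
```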

• Eliezer, I have a ques­tion about this: “There is no finite amount of life lived N where I would pre­fer a 80.0001% prob­a­bil­ity of liv­ing N years to an 0.0001% chance of liv­ing a googol­plex years and an 80% chance of liv­ing for­ever. This is a suffi­cient con­di­tion to im­ply that my util­ity func­tion is un­bounded.”

I can see that this prefer­ence im­plies an un­bounded util­ity func­tion, given that a longer life has a greater util­ity. How­ever, sim­ply stated in that way, most peo­ple might agree with the prefer­ence. But con­sider this gam­ble in­stead:

A: Live 500 years and then die, with cer­tainty.
B: Live for­ever, with prob­a­bil­ity 0.000000001%; die within the next ten sec­onds, with prob­a­bil­ity 99.999999999%

Do you choose A or B? Is it possible to choose A and have an unbounded utility function with respect to life? It seems to me that an unbounded utility function implies the choice of B. But then what if the probability of living forever becomes one in a googolplex, or whatever? Of course, this is a kind of Pascal’s Wager; but it seems to me that your utility function implies that you should accept the Wager.
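To make the implication concrete: under a linear (hence unbounded) utility in years lived, gamble B dominates. In the sketch below “forever” is stood in for by a googol years, since infinity breaks the arithmetic; the other numbers are just the ones from the gamble above:

```python
# Sketch: why an unbounded (here, linear-in-years) utility picks gamble B.
p_b = 1e-11            # 0.000000001% chance of living "forever"
huge_lifespan = 1e100  # a googol years, standing in for "forever"

eu_a = 500.0                # gamble A: a certain 500 years
eu_b = p_b * huge_lifespan  # gamble B: about 1e89 expected years

print(eu_b > eu_a)  # True: the tiny chance of an enormous lifespan dominates
```

Adding a few hundred more 9s to the death probability changes nothing, as long as the lifespan term grows without bound, which is exactly the fanaticism worry raised above.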

It also seems to me that the in­tu­itions sug­gest­ing to you and oth­ers that Pas­cal’s Mug­ging should be re­jected similarly are based on an in­tu­ition of a bounded util­ity func­tion. Emo­tions can’t re­act in­finitely to any­thing; as one com­menter put it, “I can only feel so much hor­ror.” So to the de­gree that peo­ple’s prefer­ences re­flect their emo­tions, they have bounded util­ity func­tions. In the ab­stract, not emo­tion­ally but men­tally, it is pos­si­ble to have an un­bounded func­tion. But if you do, and act on it, oth­ers will think you a fa­natic. For a fa­natic cares in­finitely for what he per­ceives to be an in­finite good, whereas nor­mal peo­ple do not care in­finitely about any­thing.

This isn’t nec­es­sar­ily against an un­bounded func­tion; I’m sim­ply try­ing to draw out the im­pli­ca­tions.

• A: Live 500 years and then die, with cer­tainty. B: Live for­ever, with prob­a­bil­ity 0.000000001%; die within the next ten sec­onds, with prob­a­bil­ity 99.999999999%

If this was the only chance you ever get to de­ter­mine your lifes­pan—then choose B.

In the real world, it would prob­a­bly be a bet­ter idea to dis­card both op­tions and use your nat­u­ral lifes­pan to search for al­ter­na­tive paths to im­mor­tal­ity.

• I dis­agree, not sur­pris­ingly, since I was the au­thor of the com­ment to which you are re­spond­ing. I would choose A, and I think any­one sen­si­ble would choose A. There’s not much one can say here in the way of ar­gu­ment, but it is ob­vi­ous to me that choos­ing B here is fol­low­ing your ideals off a cliff. Espe­cially since I can add a few hun­dred 9s there, and by your ar­gu­ment you should still choose B.

• Let’s take Bayes se­ri­ously.

Some time ago there was a posting about something like: "If all you knew was that the past 5 mornings the sun rose, what would you assign the probability that the sun would rise next morning?" It came out to something like 5/6 or 4/5 or so.

But of course that’s not all we know, and so we’d get differ­ent num­bers.

Now what’s given here is that Omega has been cor­rect on a hun­dred oc­ca­sions so far. If that’s all we know, we should es­ti­mate the prob­a­bil­ity of him be­ing right next time at about 99%. So if you’re a one-boxer your ex­pec­ta­tion would be \$990,000 and a two-boxer would have an ex­pec­ta­tion of \$11,000.

But the whole argument seems to be about what extra knowledge you have; in particular: Can causation work in reverse? Is Omega really superintelligent? Or even: Are the conditions stated in the problem logically inconsistent (which would justify any answer)?

Per­haps some­one who en­joys these kinds of odds calcu­la­tions could in­ves­ti­gate the ex­tent to which we know these things and how it af­fects the out­come?
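One way to sketch both calculations, using Laplace's rule of succession for the "all you know is n successes" estimate (the rounded 99% figure and the payoffs are the ones this comment uses; the function name is mine):

```python
from fractions import Fraction

def laplace(successes, trials):
    # Laplace's rule of succession: after s successes in n trials,
    # estimate P(success next time) = (s + 1) / (n + 2).
    return Fraction(successes + 1, trials + 2)

# Omega has been right on all 100 observed occasions:
p = laplace(100, 100)            # 101/102, roughly 0.99

# Using the rounded 99% figure, the two expectations:
ev_one_box = 0.99 * 1_000_000                    # about 990,000
ev_two_box = 0.99 * 1_000 + 0.01 * 1_001_000     # about 11,000
```

This reproduces the comment's \$990,000 versus \$11,000.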

• Cale­do­nian: you can stop talk­ing about wa­ger­ing cred­i­bil­ity units now, we all know you don’t have funds for the small­est stake.

Ben Jones: if we assume that Omega is perfectly simulating the human mind, then when we are choosing between B and A+B, we don't know whether we are in reality or in the simulation. In reality, our choice does not affect the million, but in the simulation it will. So we should reason: "I'd better take only box B, because if this is the simulation then that will change whether or not I get the million in reality."

• I have two ar­gu­ments for go­ing for Box B. First, for a sci­en­tist it’s not un­usual that ev­ery ra­tio­nal ar­gu­ment (=the­ory) pre­dicts that only two-box­ing makes sense. Still, if the ex­per­i­ment again and again re­futes that, it’s ob­vi­ously the the­ory that’s wrong and there’s ob­vi­ously some­thing more to re­al­ity than that which fueled the the­o­ries. Ac­tu­ally, we even see dilem­mas like New­comb’s in the con­tex­tu­al­ity of quan­tum mea­sure­ments. Mea­sure­ment tops ra­tio­nal­ity or the­ory, ev­ery time. That’s why sci­ence is suc­cess­ful and philos­o­phy is not.

Second, there's no question: I choose box B. Either I get the million dollars, or I have proven an extragalactic superintelligence wrong. How cool is that? \$1000? Have you looked at the exchange rates lately?

• Robin, remember I have to build a damn AI out of this theory, at some point. A self-modifying AI that begins anticipating dynamic inconsistency—that is, a conflict of preference with its own future self—will not stay in such a state for very long... did the game theorists and economists work out a standard answer for what happens after that?

If you like, you can think of me as defin­ing the word “ra­tio­nal­ity” to re­fer to a differ­ent mean­ing—but I don’t re­ally have the op­tion of us­ing the stan­dard the­ory, here, at least not for longer than 50 mil­lisec­onds.

If there’s some nonob­vi­ous way I could be wrong about this point, which seems to me quite straight­for­ward, do let me know.

• I don’t see why this needs to be so drawn out.

I know the rules of the game. I also know that Omega is su­per in­tel­li­gent, namely, Omega will ac­cu­rately pre­dict my ac­tion. Since Omega knows that I know this, and since I know that he knows I know this, I can ra­tio­nally take box B, con­tent in my knowl­edge that Omega has pre­dicted my ac­tion cor­rectly.

I don’t think it’s nec­es­sary to pre­com­mit to any ideas, since Omega knows that I’ll be able to ra­tio­nally de­duce the win­ning ac­tion given the premise.

• I one-box, with­out a mo­ment’s thought.

The “ra­tio­nal­ist” says “Omega has already left. How could you think that your de­ci­sion now af­fects what’s in the box? You’re bas­ing your de­ci­sion on the illu­sion that you have free will, when in fact you have no such thing.”

To which I re­spond “How does that make this differ­ent from any other de­ci­sion I’ll make to­day?”

• I’ve always been a one-boxer. I think I have a new solu­tion as to why. Try this: Sce­nario A: you will take a sleep po­tion and be wo­ken up twice dur­ing the mid­dle of the night to be asked to take one box or both boxes. What­ever you do the first time de­ter­mines whether \$1m is placed in the po­ten­tially-empty box. What­ever you do the sec­ond time de­ter­mines what you col­lect. The catch is that the sleep po­tion will wipe all your mem­o­ries over the next twelve hours. You’re told this in ad­vance and asked to make up your mind. So you’ll give the same an­swer each time [or if you em­ploy a mixed strat­egy, em­ploy the same mixed strat­egy, be­cause you don’t know if you’ve already been wo­ken up].

If you say “one box” each time, you collect \$1,000,000. If you say “both boxes” each time, you collect \$1,000.

So you know, given this, that you do bet­ter to say “one box”. Do two-box­ers agree with this?

Sce­nario B: Same as sce­nario A, ex­cept that in­stead of be­ing wo­ken up twice dur­ing the night, you will be wo­ken up once and asked which boxes you will take. Your thoughts now are read by an ex­pert mind-read­ing de­vice. What­ever you plan to say will be used to de­ter­mine whether there is \$1m or \$0 in the box you surely take. I think that you still take one box. Do two-box­ers agree with this?

Sce­nario C: Same as sce­nario B, ex­cept that in­stead of hav­ing your thoughts read now, your thoughts are pre­dicted by an ex­pert thought-pre­dict­ing de­vice. This is then used to de­ter­mine what will be placed in the box of un­cer­tain con­tents. I hold that hav­ing your thoughts known at the time and known be­fore you will think them are iden­ti­cal for the pur­poses of this prob­lem. [mind-blow­ing in many re­spects, I agree, but ir­rele­vant for this prob­lem.] Ergo I take one box. Do two-box­ers agree?

• As a 1.4999999999999 boxer (i.e. take a quan­tum ran­dom­ness source for [0, 1], take both boxes if 0, one box if 1, one box if some­thing else hap­pens), I don’t think sce­nario C is con­vinc­ing.

The crucial property of B is that as your thoughts change, the contents of the box change. The causal link goes forward in time. Thus the right decision is to take one box, as by the act of taking one box, you will make it contain the money.

In C, however, there is no such causal link. The oracle either put money in both boxes, or it did not. Your decision now cannot possibly affect that state. So you cannot base your decision in C on its similarity to B.

A good rea­son to one box, in my opinion, is that be­fore you en­counter the boxes it is clearly prefer­able to com­mit to one box­ing. This is of course not com­pat­i­ble with tak­ing two boxes when you find them (be­cause the or­a­cle seems to be perfect). So it is ra­tio­nal to make your­self the kind of per­son that takes one box (be­cause you know this brings you the best benefit, short of us­ing the ran­dom­ness trick).

• It seems to me that if you make a basic Bayes net with utilities at the end, the choice with the higher expected utility is to one-box. Say:
P(1,000,000 in box B and 10,000 in box A | I one-box) = 99%
P(box B is empty and 10,000 in box A | I two-box) = 99%
hence
P(box B is empty and 10,000 in box A | I one-box) = 1%
P(1,000,000 in box B and 10,000 in box A | I two-box) = 1%
So
If I one-box I should expect 99% × 1,000,000 + 1% × 0 = 990,000
If I two-box I should expect 99% × 10,000 + 1% × 1,010,000 = 20,000
Expected utility(I one-box) / Expected utility(I two-box) = 49.5, so I should one-box by a landslide. This is assuming that Omega has a 99% rate of true positives and of true negatives; it's more dramatic if we assume that Omega is perfect. If P(1,000,000 in box B and 10,000 in box A | I one-box) = P(box B is empty and 10,000 in box A | I two-box) = 100%, then Expected utility(I one-box) / Expected utility(I two-box) = 100. If Omega is perfect, by my calculation we should expect one-boxing to be 100 times as profitable as two-boxing.

This is the sort of math I usually use to decide. Is this non-standard, did I make a mistake, or does this method produce stupid results elsewhere?
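The arithmetic above can be checked with a few lines (note the comment assumes \$10,000 in box A rather than the \$1,000 of the problem statement; the function name is mine):

```python
def expected_values(p, big=1_000_000, small=10_000):
    # p: Omega's accuracy (assumed equal true-positive and
    # true-negative rate, as in the comment above).
    ev_one_box = p * big                       # empty-box case adds 0
    ev_two_box = p * small + (1 - p) * (big + small)
    return ev_one_box, ev_two_box

one, two = expected_values(0.99)   # 990,000 and 20,000
ratio = one / two                  # 49.5, as claimed
```

With `expected_values(1.0)` the ratio is exactly 100, matching the perfect-predictor case.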

• It’s true that one-box­ing is the strat­egy that max­i­mizes ex­pected util­ity, and that it is a fairly un­con­tro­ver­sial maxim in nor­ma­tive de­ci­sion the­ory that one should pick the strat­egy that max­i­mizes ex­pected util­ity. How­ever, it is also a fairly un­con­tro­ver­sial maxim in nor­ma­tive de­ci­sion the­ory that if a dom­i­nant strat­egy ex­ists, one should adopt it. In this case, two-box­ing is dom­i­nant (if you sup­pose there is no back­wards cau­sa­tion). Usu­ally, these two max­ims do not con­flict, but they do in New­comb’s prob­lem. I guess the ques­tion you should ask your­self is why you think the one we should ad­here to is ex­pected util­ity max­i­miza­tion.

Not say­ing it’s the wrong an­swer (I don’t think it is), but sim­ply say­ing “We do this sort of math all the time. Why not here?” is in­suffi­cient jus­tifi­ca­tion be­cause we also do this other sort of math all the time, so why not do that here?

• Great, I'll work on that. That's exactly what I should ask myself. And if I find that the rule of "do that with the highest expected utility" fails on the Smoking Lesion problem, I'll ask why I want to go with the dominant strategy (as I predict I will).

The only rea­son that I have to trust ex­pected util­ity par­tic­u­larly is that I have a ge­o­met­ric metaphor, which forces me to be­lieve the rule, if I be­lieve cer­tain ba­sic things about util­ity.

• This looks like it loses in the Smok­ing Le­sion prob­lem.

• An anal­ogy oc­curs to me about “re­gret of ra­tio­nal­ity.”

Some­times you hear com­plaints about the Geneva Con­ven­tion dur­ing wartime. “We have to re­strain our­selves, but our en­e­mies fight dirty. They’re at an ad­van­tage be­cause they don’t have our scru­ples!” Now, if you replied, “So are you ad­vo­cat­ing scrap­ping the Geneva Con­ven­tion?” you might get the re­sponse “No way. It’s a good set of rules, on bal­ance.” And I don’t think this is an in­co­her­ent po­si­tion: he ap­proves of the rule, but re­grets the harm it causes in this par­tic­u­lar situ­a­tion.

Rules, al­most by defi­ni­tion, are in­con­ve­nient in some situ­a­tions. Even a rule that’s good on bal­ance, a rule you wouldn’t want to dis­card, will some­times have nega­tive con­se­quences. Other­wise there would be no need to make it a rule! “Don’t fool your­self into be­liev­ing false­hoods” is a good rule. In some situ­a­tions it may hurt you, when a delu­sion might have been hap­pier. The hurt is real, even if it’s out­bal­anced in the long run and in ex­pected value. The re­gret is real. It’s just lo­cal.

• I one-box, but not be­cause I haven’t con­sid­ered the two-box is­sue.

I one-box be­cause it’s a win-win in the larger con­text. Either I walk off with a mil­lion dol­lars, OR I be­come the first per­son to out­think Omega and provide new data to those who are fol­low­ing Omega’s ex­ploits.

Even without thinking outside the problem, Omega is a game-breaker. We do not, in the problem as stated, have any information on Omega other than that they are superintelligent and may be able to act outside of causality. Or else Omega is simply a superduperpredictor, to the point where (quantum interactions and chaos theory aside) all Omega-chosen humans have turned out to be correctly predictable in this one aspect.

Perhaps Omega is deliberately NOT choosing to test humans it can't predict. Or it is able to affect the local spacetime sufficiently to 'lock in' a choice even after it's physically left the area?

We can’t tell. It’s su­per­in­tel­li­gent. It’s not play­ing on our field. It’s po­ten­tially an ex­ter­nal source of met­a­logic. The rules go out the win­dow.

In short, the prob­lem as de­scribed is not suffi­ciently con­strained to pre­sume a para­dox, be­cause it’s not con­fin­ing it­self to a sin­gle logic sys­tem. It’s like ask­ing some­one only fa­mil­iar with non-imag­i­nary num­bers what the square root of nega­tive one is. Just be­cause they can’t de­rive an an­swer doesn’t mean you don’t have one—you’re us­ing differ­ent num­ber fields.

• IMO there’s less to New­comb’s para­dox than meets the eye. It’s ba­si­cally “A fu­ture-pre­dict­ing be­ing who con­trols the set of choices could make ra­tio­nal choices look silly by mak­ing sure they had bad out­comes”. OK, yes, he could. Sur­prised?

What I think makes it seem para­dox­i­cal is that the para­dox both as­sures us that Omega con­trols the out­come perfectly, and cues us that this isn’t so (“He’s already left” etc). Once you set­tle what it’s re­ally say­ing ei­ther way, the rest fol­lows.

• So it seems you are dis­agree­ing with most all game the­o­rists in eco­nomics as well as most de­ci­sion the­o­rists in philos­o­phy. Maybe per­haps they are right and you are wrong?

Maybe per­haps we are right and they are wrong?

The is­sue is to be de­cided, not by refer­ring to per­ceived sta­tus or ex­per­tise, but by look­ing at who has the bet­ter ar­gu­ments. Only when we can­not eval­u­ate the ar­gu­ments does mak­ing an ed­u­cated guess based on per­ceived ex­per­tise be­come ap­pro­pri­ate.

Again: how much do we want to bet that Eliezer won’t ad­mit that he’s wrong in this case? Do we have some­one will­ing to wa­ger an­other 10 cred­i­bil­ity units?

• Paul, be­ing fixed or not fixed has noth­ing to do with it. Sup­pose I pro­gram a de­ter­minis­tic AI to play the game (the AI picks a box.)

The de­ter­minis­tic AI knows that it is de­ter­minis­tic, and it knows that I know too, since I pro­grammed it. So I also know whether it will take one or both boxes, and it knows that I know this.

At first, of course, it doesn’t know it­self whether it will take one or both boxes, since it hasn’t com­pleted run­ning its code yet. So it says to it­self, “Either I will take only one box or both boxes. If I take only one box, the pro­gram­mer will have known this, so I will get 1,000,000. If I take both boxes, the pro­gram­mer will have known this, so I will get 1,000. It is bet­ter to get 1,000,000 than 1,000. So I choose to take only one box.”

If some­one tries to con­fuse the AI by say­ing, “if you take both, you can’t get less,” the AI will re­spond, “I can’t take both with­out differ­ent code, and if I had that code, the pro­gram­mer would have known that and would have put less in the box, so I would get less.”

Or in other words: it is quite pos­si­ble to make a de­ci­sion, like the AI above, even if ev­ery­thing is fixed. For you do not yet know in what way ev­ery­thing is fixed, so you must make a choice, even though which one you will make is already de­ter­mined. Or if you found out that your fu­ture is com­pletely de­ter­mined, would you go and jump off a cliff, since this could not hap­pen un­less it were in­evitable any­way?
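This comment's setup can be turned into a toy program. It is only a sketch under the comment's own assumptions (the agent is deterministic and side-effect-free, so "prediction" is just running its code in advance; all names are mine):

```python
def predict(agent):
    # The programmer can "predict" a deterministic, side-effect-free
    # agent simply by running its code ahead of time.
    return agent()

def one_boxer():
    return "one"

def two_boxer():
    return "both"

def play(agent):
    prediction = predict(agent)                      # fixed before the choice
    box_b = 1_000_000 if prediction == "one" else 0
    choice = agent()                                 # the "real" decision
    return box_b + (1_000 if choice == "both" else 0)
```

Here the one-boxing program walks away with \$1,000,000 and the two-boxing program with \$1,000, even though everything is fixed in advance.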

• The pos­si­bil­ity of time in­con­sis­tency is very well es­tab­lished among game the­o­rists, and is con­sid­ered a prob­lem of the game one is play­ing, rather than a failure to an­a­lyze the game well. So it seems you are dis­agree­ing with most all game the­o­rists in eco­nomics as well as most de­ci­sion the­o­rists in philos­o­phy. Maybe per­haps they are right and you are wrong?

• I would be interested in knowing if your opinion would change if the "predictions" of the super-being were wrong 0.5% of the time, and some small number of people ended up with the \$1,001,000 and some ended up with nothing. Would you still 1-box it?

• If a bunch of peo­ple have played the game already, then you can calcu­late the av­er­age pay­off for a 1-boxer and that of a 2-boxer and pick the best one.
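That calculation is easy to mock up. A minimal Monte Carlo sketch (the 0.5% error rate is the figure proposed above; function and variable names are mine):

```python
import random

def average_payoff(choice, error=0.005, trials=200_000, seed=1):
    # Omega's prediction is wrong with probability `error`;
    # `choice` is "one" (take only box B) or "both".
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Prediction matches the actual choice unless an error occurs.
        predicted_one_box = (choice == "one") != (rng.random() < error)
        box_b = 1_000_000 if predicted_one_box else 0
        total += box_b + (1_000 if choice == "both" else 0)
    return total / trials
```

With these numbers, one-boxers average roughly \$995,000 and two-boxers roughly \$6,000.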

• Upvoted for this sen­tence:

“If it ever turns out that Bayes fails—re­ceives sys­tem­at­i­cally lower re­wards on some prob­lem, rel­a­tive to a su­pe­rior al­ter­na­tive, in virtue of its mere de­ci­sions—then Bayes has to go out the win­dow.”

This is such an im­por­tant con­cept.

I will say this declar­a­tively: The cor­rect choice is to take only box two. If you dis­agree, check your premises.

“But it is agreed even among causal de­ci­sion the­o­rists that if you have the power to pre­com­mit your­self to take one box, in New­comb’s Prob­lem, then you should do so. If you can pre­com­mit your­self be­fore Omega ex­am­ines you; then you are di­rectly caus­ing box B to be filled.”

Is this your ob­jec­tion? The prob­lem is, you don’t know if the su­per­in­tel­li­gent alien is bas­ing any­thing on “pre­com­mi­tal.” Maybe the su­per­in­tel­li­gent alien has some tech­nol­ogy or un­der­stand­ing that al­lows him to ac­tu­ally see the end re­sult of your fu­ture con­tem­pla­tion. Maybe he’s solved time travel and has seen what you pick.

Un­less you un­der­stand not only the alien’s mode of op­er­a­tion but also his method, you re­ally are just guess­ing at how he’ll de­cide what to put in box two. And your record on guesses is not as good as his.

There’s noth­ing mys­ti­cal about it. You do it be­cause it works. Not be­cause you know how it works.

• “If it ever turns out that Bayes fails—re­ceives sys­tem­at­i­cally lower re­wards on some prob­lem, rel­a­tive to a su­pe­rior al­ter­na­tive, in virtue of its mere de­ci­sions—then Bayes has to go out the win­dow.”

This is such an im­por­tant con­cept.

Yes, but like falsifi­a­bil­ity, dan­ger­ous. This also goes for ‘ra­tio­nal­ists win’, too.

‘We’ (Bayesi­ans) face the Duhem-Quine the­sis with a vengeance: we have of­ten found situ­a­tions where Bayes failed. And then we res­cued it (we think) by ei­ther com­ing up with novel the­ses (TDT) or care­fully an­a­lyz­ing the prob­lem or a re­lated prob­lem and say­ing that is the real an­swer and so Bayes works af­ter all (Jaynes again and again). Have we cor­rected our­selves or just added epicy­cles and spe­cial plead­ing? Should we just have tossed Bayes out the win­dow at that point ex­cept in the limited ar­eas we already proved it to be op­ti­mal or use­ful?

This can’t re­ally be an­swered.

• I liked the quote not be­cause of any no­tion that Bayes will or should “go out the win­dow,” but be­cause, com­ing from a de­vout (can I use that word?) Bayesian, it’s akin to a math­e­mat­i­cian say­ing that if 2+2 ceases to be 4, that equa­tion goes out the win­dow. I just like what this says about one’s episte­mol­ogy—we don’t claim to know with dog­matic cer­tainty, but in vary­ing de­grees of cer­tainty, which, to bring things full cir­cle, is what Bayes seems to be all about (at least to me, a novice).

More con­cisely, I like the quote be­cause it draws a line. We can rail against the crazy strict Em­piri­cism that de­nies ra­tio­nal­ity, but we won’t hold to a ra­tio­nal­ity so de­voutly that it be­comes faith.

• be­cause, com­ing from a de­vout (can I use that word?) Bayesian, it’s akin to a math­e­mat­i­cian say­ing that if 2+2 ceases to be 4, that equa­tion goes out the win­dow.

Duhem-Quine is just as much a prob­lem there; from Lud­wig Wittgen­stein, Re­marks on the Foun­da­tions of Math­e­mat­ics:

“If a con­tra­dic­tion were now ac­tu­ally found in ar­ith­metic – that would only prove that an ar­ith­metic with such a con­tra­dic­tion in it could ren­der very good ser­vice; and it would be bet­ter for us to mod­ify our con­cept of the cer­tainty re­quired, than to say it would re­ally not yet have been a proper ar­ith­metic.”

In­deed.

To gen­er­al­ize, when we run into skep­ti­cal ar­gu­ments em­ploy­ing the above cir­cu­lar­ity or fun­da­men­tal un­cer­tain­ties, I think of Kripke:

“A skep­ti­cal solu­tion of a philo­soph­i­cal prob­lem be­gins… by con­ced­ing that the skep­tic’s nega­tive as­ser­tions are unan­swer­able. Nev­er­the­less our or­di­nary prac­tice or be­lief is jus­tified be­cause—con­trary ap­pear­ances notwith­stand­ing—it need not re­quire the jus­tifi­ca­tion the scep­tic has shown to be un­ten­able. And much of the value of the scep­ti­cal ar­gu­ment con­sists pre­cisely in the fact that he has shown that an or­di­nary prac­tice, if it is to be defended at all, can­not be defended in a cer­tain way.”

• An amus­ing n=3 sur­vey of math­e­mat­ics un­der­grads at Trinity Cam­bridge:

1) Refused to answer.
2) It depends on how reliable Omega is / but you can't (shouldn't) really quantify ethics anyway / this situation is unreasonable.
3) Obviously two-box; one-boxing is insane.

Number 3 said he would program an AI to one-box. And when I pointed out that his brain was built of quarks just like the AI, he responded that in that case free will didn't exist and choice was impossible.

• Mr Eliezer, I think you’ve missed a few points here. How­ever, I’ve prob­a­bly missed more. I apol­o­gise for er­rors in ad­vance.

1. To start with, I speculate that any system of decision making consistently gives the wrong results on some specific problem. The whole point of decision theory is finding principles which usually end up with a better result. As such, you can always formulate a situation in which it gives the wrong answer: maybe one of the facts you thought you knew was incorrect, and led you astray. (At the very least, Omega may decide to reward only those who have never heard of a particular brand of decision theory.)

It's like with file compression. In bitmaps, there are frequently large areas of similar colour. Knowing this, we can design a system that encodes such areas in less space. However, if we then try to compress a random bitmap, it will take more space than before the compression. Same thing with human minds: they work simply and relatively efficiently, but there's a whole field dedicated to finding flaws in their methods. If you use causal decision theory, you sacrifice your ability at games against superhuman creatures that can predict the future, in return for better decision making when that isn't the case. That seems like a reasonably fair trade-off to me. Any theory which gets this one right opens itself to either getting another one wrong, or being more complex and thus harder for a human to use correctly.
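The compression point is easy to demonstrate with Python's zlib (a sketch; the 10,000-byte sizes are arbitrary):

```python
import os
import zlib

# A "bitmap" that is one flat colour compresses enormously;
# random bytes have no structure to exploit, so the compressed
# form ends up slightly *larger* (header and framing overhead).
flat = b"\x00" * 10_000
noise = os.urandom(10_000)

flat_size = len(zlib.compress(flat))     # a few dozen bytes
noise_size = len(zlib.compress(noise))   # a bit over 10,000 bytes
```

Any compressor that shrinks some inputs must expand others; a decision theory tuned to ordinary causal problems pays its price on Omega-style ones.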

2. The scientific method and what I know of rationality make the initial assumption that your belief does not affect how the world works. "If a phenomenon feels mysterious, that is a fact about our state of knowledge, not a fact about the phenomenon itself." etc. However, this isn't something which we can actually know.

Some Christians believe that if you pray over someone with faith, they will be immediately healed. If that is true, rationalists are at a disadvantage, because they aren't as good at self-delusion or doublethink as the untrained. They might never end up finding out that truth. I know that religion is the mind-killer too; I'm just using the most common example of the supremely effective standard method being unable to deal with an idea. It's necessarily incomplete.

3. I don't agree with you that "reason" means "choosing what ends up with the most reward". You're mixing up means and ends. Arguing against a method of decision making because it comes up with the wrong answer to a specific case is like complaining that mp3 compression does a lousy job of compressing silence. I don't think that reason can be the only tool used, just one of them.

In­ci­den­tally, I would to­tally only take the \$1000 box, and claim that Omega told me I had won im­mor­tal­ity, to con­fuse all de­ci­sion the­o­rists in­volved.

• See chap­ters 1-9 of this doc­u­ment for a more de­tailed treat­ment of the ar­gu­ment.

• This link is 404ing. Any­one have a copy of this?

• The cur­rent ver­sion is here. (It’s Eliezer Yud­kowsky (2010). Time­less De­ci­sion The­ory.)

• Isn't this the exact opposite argument from the one that was made in Dust Specks vs 50 Years of Torture?

Cor­rect me if I’m wrong, but the ar­gu­ment in this post seems to be “Don’t cling to a sup­pos­edly-perfect ‘causal de­ci­sion the­ory’ if it would make you lose grace­fully, take the ac­tion that makes you WIN.”

And the ar­gu­ment for prefer­ring 50 Years of Tor­ture over 3^^^3 Dust Specks is that “The moral the­ory is perfect. It must be clung to, even when the re­sult is a ma­jor loss.”

How can both of these be true?

(And yes, I am defin­ing “prefer­ring 50 Years of Tor­ture over 3^^^3 Dust Specks” as an un­miti­gated loss. A moral the­ory that re­turns a re­sult that al­most ev­ery moral per­son al­ive would view as ab­hor­rent has at least one flaw if it could pro­duce such a ma­jor loss.)

• I agree that “ra­tio­nal­ity” should be the thing that makes you win but the New­comb para­dox seems kind of con­trived.

If there is a more pow­er­ful en­tity throw­ing good util­ities at nor­mally dumb de­ci­sions and bad util­ities at nor­mally good de­ci­sions then you can make any dumb thing look ge­nius be­cause you are un­der differ­ent rules than the world we live in at pre­sent.

I would ask Alpha for help and do what he tells me to do. Alpha is an AI that is also never wrong when it comes to predicting the future, just like Omega. Alpha would examine Omega and me and extrapolate Omega's extrapolated decision. If there is a million in box B I pick both; otherwise, just B.

Looks like Omega will be wrong ei­ther way, or will I be wrong? Or will the uni­verse crash?

• Yes, this is re­ally an is­sue of whether your choice causes Omega’s ac­tion or not. The only way for Omega to be a perfect pre­dic­tor is for your choice to ac­tu­ally cause Omega’s ac­tion. (For ex­am­ple, Omega ‘sees the fu­ture’ and acts based on your choice). If your choice causes Omega’s ac­tion, then choos­ing B is the ra­tio­nal de­ci­sion, as it causes the box to have the mil­lion.

If your choice does not cause Omega's action, then choosing both boxes is the winning approach. In this case, Omega is merely giving big awards to some people and small awards to others.

If your choice has some percentage chance of causing Omega's action, then the problem becomes one of risk management. What is your chance of getting the big award if you choose B, compared with the utility of the two choices?

I agree with what Tom posted. The only paradox here is that the problem both states that your choice causes Omega's action (because it supposedly predicts perfectly), and also says that your action does not cause Omega's action (because the decision is already made). Thus, whether box B alone or both boxes is the correct choice depends on which of these two contradictory statements you end up believing.
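The risk-management framing can be made concrete under one simplifying assumption (mine, not the commenter's): Omega's prediction matches your actual choice with the same probability p either way. Then:

```python
def ev_one_box(p):
    # p: probability Omega's prediction matches your actual choice
    return p * 1_000_000

def ev_two_box(p):
    # You always keep the $1,000; box B is full only on a mispredict.
    return 1_000 + (1 - p) * 1_000_000

# Setting the two equal: p * 1e6 = 1_000 + (1 - p) * 1e6
# gives p = 0.5005.  Any predictor even slightly better than a
# coin flip already makes one-boxing the higher-expectation choice.
break_even = 0.5005
```

The surprisingly low break-even point is why the 100-for-100 track record dominates the argument.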

• Let me restate: Two boxes appear. If you touch box A, the contents of box B are vaporized. If you attempt to open box B, box A and its contents are vaporized. Contents as previously specified. We could probably build these now.

Experimentally, how do we distinguish this from the description in the main thread? Why are we taking Omega seriously, when if the discussion dealt with the number of angels dancing on the head of a pin the derision would be palpable? The experimental data point to taking box B. Even if Omega is observed delivering the boxes, and making the specified claims regarding their contents, why are these claims taken on faith as being an accurate description of the problem?

• The entire issue of causal versus inferential decision theory, and of the seemingly magical powers of the chooser in the Newcomb problem, are serious distractions here, as Eliezer has the same issue in an ordinary commitment situation, e.g., punishment. I suggest starting this conversation over from such an ordinary simple example.

• What we have here is an in­abil­ity to rec­og­nize that causal­ity no longer flows only from ‘past’ to ‘fu­ture’.

If we’re given a box that could con­tain \$1,000 or noth­ing, we calcu­late the ex­pected value of the su­per­po­si­tion of these two pos­si­bil­ities. We don’t ac­tu­ally ex­pect that there’s a su­per­po­si­tion within the box—we sim­ply adopt a tech­nique to help com­pen­sate for what we do not know. From our ig­no­rant per­spec­tive, ei­ther case could be real, al­though in ac­tu­al­ity ei­ther the box has the money or it does not.

This is similar. The amount of money in the box de­pends on what choice we make. The fact that the place­ment of money into the box hap­pened in the past is ir­rele­vant, be­cause we’ve already pre­sumed that the rele­vant cause-and-effect re­la­tion­ship works back­wards in time.

Eliezer states that the past is fixed. Well, it may be fixed in some ab­solute sense (al­though that is a com­pli­cated topic), but from our ig­no­rant per­spec­tive we have to con­sider what ap­pears to us to be the pos­si­ble al­ter­na­tives. That means that we must con­sider the money in the boxes to be un­cer­tain. Choos­ing causes Omega to put a par­tic­u­lar amount of money in the box. That this hap­pened in the past is ir­rele­vant, be­cause the causal de­pen­dence points into the past in­stead of the fu­ture.

Even if we ig­nore ac­tual time travel, we must con­sider the amount of money pre­sent to be un­cer­tain un­til we choose, which then de­ter­mines how much is there—in the sense of our tech­nique, from our limited per­spec­tive.

If we ac­cept that Omega is re­ally as ac­cu­rate as it ap­pears to be—which is not a small thing to ac­cept, cer­tainly—and we want to max­i­mize money, then the cor­rect choice is B.

• Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

Well, there’s a num­ber of an­swers I could give to this:

*) After you’ve spent some time work­ing in the frame­work of a de­ci­sion the­ory where dy­namic in­con­sis­ten­cies nat­u­rally Don’t Hap­pen—not be­cause there’s an ex­tra clause for­bid­ding them, but be­cause the sim­ple foun­da­tions just don’t give rise to them—then an in­tertem­po­ral prefer­ence re­ver­sal starts look­ing like just an­other prefer­ence re­ver­sal.

*) I de­vel­oped my de­ci­sion the­ory us­ing math­e­mat­i­cal tech­nol­ogy, like Pearl’s causal graphs, that wasn’t around when causal de­ci­sion the­ory was in­vented. (CDT takes coun­ter­fac­tual dis­tri­bu­tions as fixed givens, but I have to com­pute them from ob­ser­va­tion some­how.) So it’s not sur­pris­ing if I think I can do bet­ter.

*) We’re not talk­ing about a patch­work of self-mod­ifi­ca­tions. An AI can eas­ily gen­er­ally mod­ify its fu­ture self once-and-for-all to do what its past self would have wished on fu­ture prob­lems even if the past self did not ex­plic­itly con­sider them. Why would I bother to con­sider the gen­eral frame­work of clas­si­cal causal de­ci­sion the­ory when I don’t ex­pect the AI to work in­side that gen­eral frame­work for longer than 50 mil­lisec­onds?

*) I did work out what an ini­tially causal-de­ci­sion-the­o­rist AI would mod­ify it­self to, if it booted up on July 11, 2018, and it looks some­thing like this: “Be­have like a non­clas­si­cal-de­ci­sion-the­o­rist if you are con­fronting a New­comblike prob­lem that was de­ter­mined by ‘causally’ in­ter­act­ing with you af­ter July 11, 2018, and oth­er­wise be­have like a clas­si­cal causal de­ci­sion the­o­rist.” Roughly, self-mod­ify­ing ca­pa­bil­ity in a clas­si­cal causal de­ci­sion the­o­rist doesn’t fix the prob­lem that gives rise to the in­tertem­po­ral prefer­ence re­ver­sals, it just makes one tem­po­ral self win out over all the oth­ers.

*) Imag­ine time spread out be­fore you like a 4D crys­tal. Now imag­ine point­ing to one point in that crys­tal, and say­ing, “The ra­tio­nal de­ci­sion given in­for­ma­tion X, and util­ity func­tion Y, is A”, then point­ing to an­other point in the crys­tal and say­ing “The ra­tio­nal de­ci­sion given in­for­ma­tion X, and util­ity func­tion Y, is B”. Of course you have to be care­ful that all con­di­tions re­ally are ex­actly iden­ti­cal—the agent has not learned any­thing over the course of time that changes X, the agent is not self­ish with tem­po­ral deixis which changes Y. But if all these con­di­tions are fulfilled, I don’t see why an in­tertem­po­ral in­con­sis­tency should be any less dis­turb­ing than an in­ter­spa­tial in­con­sis­tency. You can’t have 2 + 2 = 4 in Dal­las and 2 + 2 = 3 in Min­neapo­lis.

*) What hap­pens if I want to use a com­pu­ta­tion dis­tributed over a large enough vol­ume that there are light­speed de­lays and no ob­jec­tive space of si­mul­tane­ity? Do the pieces of the pro­gram start fight­ing each other?

*) Clas­si­cal causal de­ci­sion the­ory is just not op­ti­mized for the pur­pose I need a de­ci­sion the­ory for, any more than a toaster is likely to work well as a lawn­mower. They did not have my de­sign re­quire­ments in mind.

*) I don’t have to put up with dy­namic in­con­sis­ten­cies. Why should I?

• How does the box know? I could open B with the in­tent of open­ing only B or I could open B with the in­tent of then open­ing A. Per­haps Omega has locked the boxes such that they only open when you shout your choice to the sky. That would beat my preferred strat­egy of open­ing B be­fore de­cid­ing which to choose. I open boxes with­out choos­ing to take them all the time.

Are our common notions about boxes catching us here? In my experience, opening a box rarely makes nearby objects disintegrate. It is physically impossible to “leave \$1000 on the table,” because box A will disintegrate if you do not choose it. I also have no experience with trans-galactic superintelligences, and their ability to make time-traveling super-boxes is already covered by the discussion above. I think of boxes as things that either are full or are not, independent of my intentions, but I also think of them as things that do not disintegrate based on my intentions.

Tak­ing both is equiv­a­lent to just tak­ing A. Res­tate the prob­lem that way: take A and get \$1000 or take B and get \$1,000,000. Which would you pre­fer?

I think the prob­lem be­comes more amus­ing if box A does not dis­in­te­grate. They are just two card­board boxes, one of which is open and visi­bly has \$1000 in it. You don’t shout your in­ten­tion to the sky, you just take what­ever boxes you like. The rea­son­able thing to do is open box B; if it is empty, take box A too; if it is full of money, heck, take box A too. They’re boxes, they can’t stop you. But that logic makes you a two-boxer, so if Omega an­ti­ci­pates it, and Omega does, B will be empty. You definitely need to pre-com­mit to tak­ing only B. As­sume you have, and you open B, and B has \$1,000,000. You win! Now what do you do? A is just sit­ting there with \$1000 in it. You already have your mil­lion. You even took it out of the box, in case the box dis­in­te­grates. Do you liter­ally walk away from \$1000, on the be­lief that Omega has some hid­den trick to retroac­tively make B empty? The rule was not that the money would go away if you took both, the rule is that B would be empty. B was not empty. A is still there. You already won for be­ing a one-boxer, does any­thing stop you from be­ing a two-boxer and win­ning the bonus \$1000?

• I think Anony­mous, Un­known and Eliezer have been very helpful so far. Fol­low­ing on from them, here is my take:

There are many ways Omega could be doing the prediction/placement, and it may well matter exactly how the problem is set up. For example, you might be deterministic and he is precalculating your choice (much like we might be able to do with an insect or computer program), or he might be using a quantum suicide method, (quantum) randomizing whether the million goes in and then destroying the world iff you pick the wrong option (this will lead to us observing him being correct 100/100 times, assuming a many-worlds interpretation of QM). Or he could have just got lucky with the last 100 people he tried it on.

If it is the deterministic option, then what do the counterfactuals about choosing the other box even mean? My approach is to say that ‘You could choose X’ means that if you had desired to choose X, then you would have. This is a standard way of understanding ‘could’ in a deterministic universe. Then the answer depends on how we suppose the world to be different to give you counterfactual desires. If we do it with a miracle near the moment of choice (history is the same, but then your desires change non-physically), then you ought to two-box, as Omega can’t have predicted this. If we do it with an earlier miracle, or with a change to the initial conditions of the universe (the Tannsjo interpretation of counterfactuals), then you ought to one-box, as Omega would have predicted your choice. Thus, if we are understanding Omega as extrapolating your deterministic thinking, then the answer will depend on how we understand the counterfactuals. One-boxers and Two-boxers would be people who interpret the natural counterfactual in the example in different (and equally valid) ways.

If we un­der­stand it as Omega us­ing a quan­tum suicide method, then the ob­jec­tively right choice de­pends on his ini­tial prob­a­bil­ities of putting the mil­lion in the box. If he does it with a 50% chance, then take just one box. There is a 50% chance the world will end ei­ther choice, but this way, in the case where it doesn’t, you will have a mil­lion rather than a thou­sand. If, how­ever, he uses a 99% chance of putting noth­ing in the box, then one-box­ing has a 99% chance of de­stroy­ing the world which dom­i­nates the value of the ex­tra money, so in­stead two-box, take the thou­sand and live.
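The branch-counting above can be made concrete with a toy calculation (a sketch only; the function name and the \$1,000,000 / \$1,000 payoffs are illustrative, under the assumption that Omega destroys the world whenever its guess is wrong):

```python
def outcome(p_fill, choice):
    """Return (probability of surviving, payoff in surviving branches)
    under the quantum-suicide reading: Omega fills box B with
    probability p_fill and destroys the world when its guess is wrong.

    choice is "one" (take only B) or "two" (take both boxes).
    """
    if choice == "one":
        # You survive only in branches where B was filled.
        return (p_fill, 1_000_000)
    # You survive only in branches where B was empty.
    return (1 - p_fill, 1_000)

# 50/50 fill: survival odds are equal either way, so take the million.
print(outcome(0.5, "one"))  # (0.5, 1000000)
print(outcome(0.5, "two"))  # (0.5, 1000)
```

With a 1% fill probability instead, one-boxing leaves surviving branches only 1% of the time, matching the comment’s conclusion that in that case you should two-box, take the thousand, and live.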

If he just got lucky a hun­dred times, then you are best off two-box­ing.

If he time trav­els, then it de­pends on the na­ture of time-travel...

Thus the an­swer de­pends on key de­tails not told to us at the out­set. Some peo­ple ac­cuse all philo­soph­i­cal ex­am­ples (like the trol­ley prob­lems) of not giv­ing enough in­for­ma­tion, but in those cases it is fairly ob­vi­ous how we are ex­pected to fill in the de­tails. This is not true here. I don’t think the New­comb prob­lem has a sin­gle cor­rect an­swer. The value of it is to show us the differ­ent pos­si­bil­ities that could lead to the situ­a­tion as speci­fied and to see how they give differ­ent an­swers, hope­fully illu­mi­nat­ing the topic of free-will, coun­ter­fac­tu­als and pre­dic­tion.

• The para­dox is de­signed to give your de­ci­sion the prac­ti­cal effect of caus­ing Box B to con­tain the money or not, with­out ac­tu­ally la­bel­ing this effect “cau­sa­tion.” But I think that if Box B acts as though its con­tents are caused by your choice, then you should treat it as though they were. So I don’t think the puz­zle is re­ally some­thing deep; rather, it is a word game about what it means to cause some­thing.

Per­haps it would be use­ful to think about how Omega might be do­ing its pre­dic­tion. For ex­am­ple, it might have the abil­ity to travel into the fu­ture and ob­serve your ac­tion be­fore it hap­pens. In this case what you do is di­rectly af­fect­ing what the box con­tains, and the prob­lem’s state­ment that what­ever you choose won’t af­fect the con­tents of the box is just wrong.

Or maybe it has a copy of the en­tire state of your brain, and can simu­late you in a soft­ware sand­box in­side its own mind long enough to see what you will do. In this case it makes sense to think of the box as not be­ing empty or full un­til you’ve made your choice, if you are the copy in the sand­box. If you aren’t the copy in the sand­box then you’d be bet­ter off choos­ing both boxes, but the way the prob­lem’s set up you can’t tell this. You can still try to max­i­mize fu­ture wealth. My ar­ith­metic says that choos­ing Box B is the best strat­egy in this case. (Mixed strate­gies, where you hope that the sand­box ver­sion of your­self will ran­domly choose Box B alone and the out­side one will choose both, are dom­i­nated by choos­ing Box B. Also I as­sume that if you are in the sand­box, you want to max­i­mize the wealth of the out­side agent. I think this is rea­son­able be­cause it seems like there is noth­ing else to care about, but per­haps some­one will dis­agree.)
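That arithmetic can be sketched as follows (a toy model I am supplying, not from the comment itself: assume the sandboxed copy samples the same mixed strategy independently, so box B is filled exactly when the copy one-boxes):

```python
def expected_payoff(m):
    """Expected winnings for the outside agent when the shared policy
    one-boxes with probability m. The sandboxed copy draws from the
    same policy independently, so box B is filled with probability m."""
    p_fill = m
    e_one = p_fill * 1_000_000                          # take only B
    e_two = p_fill * 1_001_000 + (1 - p_fill) * 1_000   # take both
    return m * e_one + (1 - m) * e_two

# The expectation simplifies to 999_000 * m + 1_000: linear and
# increasing in m, so every mixed strategy is dominated by pure
# one-boxing (m = 1).
print(expected_payoff(0.0))  # 1000.0
print(expected_payoff(0.5))  # 500500.0
print(expected_payoff(1.0))  # 1000000.0
```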

You could in­ter­pret Omega differ­ently than in these sto­ries, al­though I think my first point above that you should think of your choice as caus­ing Omega to put money in the box, or not, is rea­son­able. I would say that the fact that Omega put the money in the box chronolog­i­cally be­fore you make the de­ci­sion is ir­rele­vant. I think un­cer­tainty about an event that has already hap­pened, but that hasn’t been re­vealed to you, is ba­si­cally the same thing as un­cer­tainty about some­thing that hasn’t hap­pened yet, and it should be mod­eled the same way.

• Paul, it sounds like you didn’t un­der­stand. A chess play­ing com­puter pro­gram is com­pletely de­ter­minis­tic, and yet it has to con­sider al­ter­na­tives in or­der to make its move. So also we could be de­ter­minis­tic and we would still have to con­sider all the pos­si­bil­ities and their benefits be­fore mak­ing a move.

So it makes sense to ask whether you would jump off a cliff if you found out that the fu­ture is de­ter­mined. You would find out that the fu­ture is de­ter­mined with­out know­ing ex­actly which fu­ture is de­ter­mined, just like the chess pro­gram, and so you would have to con­sider the benefits of var­i­ous pos­si­bil­ities, de­spite the fact that there is only one pos­si­bil­ity, just like there is re­ally only one pos­si­bil­ity for the chess pro­gram.

So when you considered the various “possibilities”, would “jumping off a cliff” evaluate as equal to “going on with life”, or would the latter evaluate as better? I suspect you would go on with life, just like a chess program moves its queen to avoid being taken by a pawn, despite the fact that it was totally determined to do this.

• I practice historical European swordsmanship, and those Musashi quotes have a certain resonance for me. Here is another (modern) saying common in my group:

If it’s stupid, but it works, then it ain’t stupid.

• You previously asked why you couldn’t find similar quotes from European sources—I believe this is mainly a language barrier: the English were not nearly the swordsmen that the French, Italians, Spanish, and Germans were (though they were pretty mean with their fists). You should be able to find many quotes in those other languages.

• If I know that the situ­a­tion has re­solved it­self in a man­ner con­sis­tent with the hy­poth­e­sis that Omega has suc­cess­fully pre­dicted peo­ple’s ac­tions many times over, I have a high ex­pec­ta­tion that it will do so again.

In that case, what I will find in the boxes is not in­de­pen­dent of my choice, but de­pen­dent on it. By choos­ing to take two boxes, I cause there to be only \$1,000 there. By choos­ing to take only one, I cause there to be \$1,000,000. I can cre­ate ei­ther con­di­tion by choos­ing one way or an­other. If I can se­lect be­tween the pos­si­bil­ities, I pre­fer the one with the mil­lion dol­lars.

Since in­duc­tion ap­plied to the known facts sug­gests that I can effec­tively de­ter­mine the out­come by mak­ing a de­ci­sion, I will se­lect the out­come that I pre­fer, and choose to take only box B.

Why ex­actly is that ir­ra­tional, again?

• Prediction <-> our choice, if we use the 100/100 record as equivalent to complete predictive accuracy.

The “weird thing go­ing on here” is that one value is set (that’s what “he has already flown away” does), yet we are be­ing told that we can change the other value. You see these re­ac­tions:

1) No, we can’t tog­gle the other value, ac­tu­ally. Choice is not re­ally in the premise, or is break­ing the premise.

2) We can tog­gle the choice value, and it will set the pre­dic­tive value ac­cord­ingly. The prior value of the pre­dic­tion does not ex­ist or is not rele­vant.

We have already equated “B wins” with “pre­dic­tion value = B” wlog. If we fur­ther­more have equated “choice value = B” with “pre­dic­tion value = B” wlog, we have two per­mis­si­ble ar­rays of val­ues: all A, or all B. Now our knowl­edge is re­stricted to choice value. We can choose A or B. Since the “hid­den” val­ues are known to be iden­ti­cal to the visi­ble value, we should pick the visi­ble value in ac­cor­dance with what we want for a given other value.

Other thoughts:

-Lo­cally, it ap­pears that you can­not “miss out” be­cause within a value set, your choice value is the only pos­si­ble one in iden­tity with the other val­ues.

-This is a strange prob­lem, be­cause gen­er­ally para­dox pro­vokes these kinds of re­sponses. In this case, how­ever, fix­ing a value does not cause a con­tra­dic­tion both ways. If you ac­cept the premise and my premises above, there should be no threat of com­pli­ca­tions from Omega or any­thing else.

-If 1 and 2 really are the only reactions, and 2 -> onebox, then any twoboxers must believe 1. But this is absurd. So whence the twoboxers?

• @Hal Finney:

Certainly the box is either full or empty. But the only way to get the money in the hidden box is to precommit to taking only that one box. Not pretend to precommit, really precommit. If you try to take the \$1,000, well then I guess you really hadn’t precommitted after all. I might vacillate, I might even be unable to make such a rigid precommitment with myself (though I suspect I am), but it seems hard to argue that taking only one box is not the correct choice.

I’m not entirely certain that acting rationally in this situation doesn’t require an element of doublethink, but that’s a topic for another post.

• It’s a great puz­zle. I guess this thread will de­gen­er­ate into ar­gu­ments pro and con. I used to think I’d take one box, but I read Joyce’s book and that changed my mind.

For the take-one-box­ers:

Do you be­lieve, as you sit there with the two boxes in front of you, that their con­tents are fixed? That there is a “fact of the mat­ter” as to whether box B is empty or not? Or is box B in a sort of in­ter­me­di­ate state, halfway be­tween empty and full? If so, do you gen­er­ally con­sider that things mo­men­tar­ily out of sight may liter­ally change their phys­i­cal states into some­thing in­de­ter­mi­nate?

If you re­ject that kind of in­de­ter­mi­nacy, what do you imag­ine hap­pen­ing, if you vac­illate and con­sider tak­ing both boxes? Do you pic­ture box B liter­ally be­com­ing empty and full as you change your opinion back and forth?

If not, if you think box B is definitely ei­ther full or empty and there is no un­usual phys­i­cal state de­scribing the con­tents of that box, then would you agree that noth­ing you do now can change the con­tents of the box? And if so, then tak­ing the ad­di­tional box can­not re­duce what you get in box B.

• Na-na-na-na-na-na, I am so sorry you only got \$1000!

Me, I’m gonna re­place my mac­book pro, buy an apart­ment and a car and take a two week va­ca­tion in the Ba­hamas, and put the rest in sav­ings!

## Suckah!

Point: ar­gu­ments don’t mat­ter, win­ning does.

• Oops. I had replied to this un­til I saw its par­ent was nearly 3 years old. So as I don’t (quite) waste the typ­ing:

Do you be­lieve, as you sit there with the two boxes in front of you, that their con­tents are fixed?

Yes.

That there is a “fact of the mat­ter” as to whether box B is empty or not?

Yes.

Or is box B in a sort of in­ter­me­di­ate state, halfway be­tween empty and full?

No.

If so, do you gen­er­ally con­sider that things mo­men­tar­ily out of sight may liter­ally change their phys­i­cal states into some­thing in­de­ter­mi­nate?

No.

Do you pic­ture box B liter­ally be­com­ing empty and full as you change your opinion back and forth?

If not, if you think box B is definitely ei­ther full or empty and there is no un­usual phys­i­cal state de­scribing the con­tents of that box, then would you agree that noth­ing you do now can change the con­tents of the box?

Yes.

And if so, then tak­ing the ad­di­tional box can­not re­duce what you get in box B.

No, it can’t. (But it already did.)

If I take both boxes how much money do I get? \$1,000

If I take one box how much money do I get? \$10,000,000 (or what­ever it was in­stan­ti­ated to.)

It seems that my questions were more useful than yours. Perhaps Joyce befuddled you? It could be that he missed something. (Apart from the counterfactual \$9,999,000.)

I re­sponded to all your ques­tions with the an­swers you in­tended to make the point that I don’t be­lieve those re­sponses are at all in­com­pat­i­ble with mak­ing the de­ci­sion that earns you lots and lots of money.


• I would play lotto: if I win more than \$10M, I take the black box and leave. Otherwise I’d look in the black box: if it is full, I also take the small one. If not, I leave with just the empty black box. As this strategy is inconsistent, a time-traveling Omega would either not choose me for his experiment or let me win for sure (assuming time works in similar ways as in HPMOR). If I get nothing, it would prove Omega wrong (and tell me quite a bit about how Omega (and time) works). If his prediction was correct, though, I win \$11,000,000, which is way better than either ‘standard’ variant.

• While that sounds clever at first glance:

• We’re not ac­tu­ally as­sum­ing a time-trav­el­ing Omega.

• Even if we were, he would just not choose you for the game. You’d get \$0, which is worse than causal de­ci­sion the­ory.

• It feels like de­ci­sion the­ory is sub­ject to the halt­ing prob­lem. Sketch­ing some rough thoughts.

Consider your particular decision theory as a black-box function, or set of rules, F, which takes the description of a situation P and outputs yes or no; one of those answers wins, the other loses.

F(P)

You want a decision theory, some set of rules F to follow, which wins in all situations.

But for all F it’s possible to construct a situation P, “The winning situation is !F(P)”, feeding F into itself (or a simplified equivalent).

No mat­ter what set of rules you in­clude in your de­ci­sion the­ory it can­not win in all cases. Ever.
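A minimal sketch of that construction (the names are mine, purely illustrative: the adversarial situation simply defines its winning answer to be the opposite of whatever F outputs on it, mirroring the halting-problem diagonalization):

```python
def make_adversarial_situation(F):
    """Build a situation P whose winning answer is defined as !F(P)."""
    def P():
        # The situation consults the decision rule about itself and
        # declares the opposite answer to be the winning one.
        return not F(P)
    return P

def wins(F, P):
    """F wins on P iff its output matches P's winning answer."""
    return F(P) == P()

def always_one_box(P):  # any fixed rule will do
    return True

P = make_adversarial_situation(always_one_box)
print(wins(always_one_box, P))  # False: F(P) can never equal !F(P)
```

The same construction defeats any other rule you substitute for `always_one_box`, which is the point: no single F wins everywhere.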

• The win­ning situ­a­tion is !F(P)

That doesn’t have any­thing to do with the halt­ing prob­lem, it looks like a close rel­a­tive of the Bar­ber para­dox.

• It has some­thing to do with the halt­ing prob­lem. The usual way of demon­strat­ing that no pro­gram can solve the halt­ing prob­lem is to sup­pose you’ve got one that does and use it to carry out a con­struc­tion a bit like the one Hun­gryHobo is ges­tur­ing to­wards, where F ar­ranges to halt iff the halt­ing-tester says it doesn’t.

• It’s the same pattern as the simple proof of the halting problem: feeding your program into itself as part of the parameters, replacing “infinite loop” with “lose” and “halt” with “win”.

The barber paradox is just a simple “set of all sets which do not contain themselves” thing, which has nothing to do with what I wrote.

My point was that your set of rules is equivalent to a program which you follow to try to reach the “winning” outcome, hence it’s pretty easy to see that no matter what rules you choose for your version of decision theory, it’s simple to construct a scenario where your rules cannot provide the “winning” answer.

• The barber paradox is just a simple “set of all sets which do not contain themselves” thing, which has nothing to do with what I wrote.

Hm, maybe not the bar­ber. I was think­ing of how and when you define what is a “win”.

Let’s do a toy ex­am­ple where P is a limited dis­crete set, say { door1, door2, door3 }. If we know what the doors lead to, and we know what a “win” is, we can make the rules be a sim­ple lookup table. It works perfectly fine.

You can break it in two ways. One way is to re­define a “win” (what­ever you pick for door1 we de­clare to be !win). Another is to change the set P.

Say, we add door4 to the set. The lookup table says “I don’t know” and that is, ac­tu­ally, a vi­able an­swer. If you want to dis­al­low that, we have to move into the realm of mod­els and gen­er­al­iza­tions. And in that realm ask­ing that your func­tion F(P) pro­duces the op­ti­mum (“win”) for what­ever P could be is, I think, too much to ask for. It can work for math­e­mat­i­cal ab­strac­tions, but if we are talk­ing about a de­ci­sion the­ory that is ap­pli­ca­ble to the real world, sorry, I don’t think “op­ti­mal all the time, no ex­cep­tions” is a re­al­is­tic goal or crite­rion.

The issue is, basically, what you allow to be in set P. If it’s sufficiently restricted, F(P) can guarantee wins; if it is not, it cannot.
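The lookup-table point can be shown in a few lines (door names and choices are invented for the example): over a fixed, enumerated P with a fixed definition of “win”, F can literally be a table, and it degrades to “I don’t know” the moment P grows:

```python
# Decision rule as a lookup table over an enumerated situation set.
CHOICES = {"door1": "open", "door2": "pass", "door3": "open"}

def F(situation):
    """Return the tabled choice, or None ("I don't know") for
    situations outside the enumerated set."""
    return CHOICES.get(situation)

print(F("door2"))  # pass
print(F("door4"))  # None: door4 was added after the table was built
```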

• I agree with you that “op­ti­mal all the time, no ex­cep­tions” is not a re­al­is­tic goal or crite­rion.

In­deed I be­lieve it’s prov­ably im­pos­si­ble even with­out need­ing to add the fuzzi­ness and con­fu­sion of real life into the mix. Even if we limit our­selves to sim­ple bounded sys­tems.

Which kind of puts a hole in EY’s the­sis that it should be pos­si­ble to have a de­ci­sion the­ory which always wins.

• Eliezer has con­ceded that it is im­pos­si­ble in prin­ci­ple to have a de­ci­sion the­ory which always wins. He says he wants one that will always win ex­cept when an ad­ver­sary is de­liber­ately mak­ing it lose. In other words, he hopes that your sce­nario is suffi­ciently com­pli­cated that it wouldn’t hap­pen in re­al­ity un­less some­one ar­ranges things to cause the de­ci­sion the­ory to lose.

• Even if we limit our­selves to sim­ple bounded sys­tems.

If the “sim­ple bounded sys­tems” are, ba­si­cally, enu­mer­able and the defi­ni­tion of “win” is fixed, F(P) can be a sim­ple lookup table which does always win.

It’s the same thing as say­ing that given a dataset I can always con­struct a model with zero er­ror for mem­bers of this dataset. That does not mean that the model will perform well on out-of-sam­ple data.

I am also not sure to which de­gree EY in­tended this state­ment to be a “hard”, literal claim.

• Ra­tional agents should WIN.

This re­minds me of these great new US Army ads: https://​​youtu.be/​​jz3e2_CyOi8

• Maybe I’m miss­ing some­thing (I’m new to Bayes), but I hon­estly don’t see how any of this is ac­tu­ally a prob­lem. I may just be re­peat­ing Yud­kowsky’s point, but… Omega is a su­per­in­tel­li­gence, who is right in ev­ery known pre­dic­tion. This means, es­sen­tially, that he looks at you and de­cides what you’ll do, and he’s right 100 out of 100 times. So far, a perfect rate. He’s prob­a­bly not go­ing to mess up on you. If you’re not try­ing to look at this with CDT, the an­swer is ob­vi­ous: take box B. Omega knows you’ll do that and you’ll get the mil­lion. It’s not about the re­sult chang­ing af­ter the boxes are put down, it’s about pre­dic­tions about a per­son.

• This should not be taken as an au­thor­i­ta­tive re­sponse. I’m an­swer­ing as much to get my own un­der­stand­ing checked, as to an­swer your ques­tion:

Omega doesn’t ex­ist. How we re­spond to the spe­cific case of Omega set­ting up boxes is pretty ir­rele­vant. The ques­tion we ac­tu­ally care about is what gen­eral prin­ci­ple we can use to de­cide New­comb’s prob­lem, and other de­ci­sion-the­o­ret­i­cally-analo­gous prob­lems. It’s one thing to say that one-box­ing is the cor­rect choice; it is an­other thing to for­mu­late a co­her­ent prin­ci­ple which out­puts that choice in this case, with­out de­ranged be­hav­ior in some other case.

If we’re look­ing at the prob­lem with­out CDT, we want to figure out and for­mal­ize what we are look­ing at the prob­lem with.

• Ahh. Thank you, that ac­tu­ally solved my con­fu­sion. I was think­ing about solv­ing the prob­lem, not how to solve the prob­lem. I shall have to look through my re­sponses to other thought ex­per­i­ments now.

• I sus­pect that this is very sim­ple. Similar to the tree in the for­est prob­lem that Eliezer wrote about, if you ask about con­crete vari­a­tions of this ques­tion, the right choice is ob­vi­ous.

One ques­tion is what to do when the boxes are in front of you.

• If it is the case that you know with 100% cer­tainty that the con­tents of box B will not change, then you should two-box.

• If it is the case that Omega could change the con­tents of the box af­ter he pre­sents them to you, then you should one-box.

• If it is the case that your pre­sent de­ci­sion im­pacts the past, then you should one-box, be­cause by one-box­ing, you’d change your past mind-state, which would change the de­ci­sion of Omega. How­ever, I don’t think that physics works like this. I’m as­sum­ing that there is a point in time where what you thought in the past is what you thought in the past, and that those thoughts are what Omega based his de­ci­sion on, and what you think and de­cide af­ter Omega made his de­ci­sion isn’t in­fluenc­ing your past mind-states, and thus isn’t in­fluenc­ing the de­ci­sion that Omega made. But this is re­ally a ques­tion about physics though, not de­ci­sion the­ory. When you ask the ques­tion with the con­di­tion that physics works a cer­tain way, the de­ci­sion the­ory part is easy.

Another ques­tion is what to do be­fore Omega makes his de­ci­sion.

• It seems plausible that Omega could read your mind. So then, you should try to make Omega think that you will one-box. If you’re capable of doing this and it works, then great! If not, you didn’t lose anything by trying, and you gave yourself the chance of possibly succeeding.

• If it is the case that you know with 100% cer­tainty that the con­tents of box B will not change, then you should two-box.

That doesn’t follow. The contents of B don’t change in the sense that someone looking at the box ahead of time with X-ray vision would see the same thing, but the contents “change” in the sense that your decision is predicted by Omega, so different choices result in different box contents. It would be a mistake to think of the contents of the boxes as something that can be held constant while only your choice varies.

(In fact, if Omega can pre­dict your choice, you re­ally aren’t able to choose at all.)

• I think my third bullet point addresses your comment. You seem to be saying that by choosing to two-box, you’re influencing the past in such a way that’ll change what Omega put in the box. I’m saying that there are two possibilities:
1) your choice im­pacts the past
2) your choice doesn’t im­pact the past.

If 1) is true, then you should one-box. If 2) is true, then you should two box. I hon­estly don’t have too strong an opinion re­gard­ing whether 1) or whether 2) is the way the world works. But I think that whether 1) or 2) is true is a ques­tion of physics, rather than a ques­tion of de­ci­sion the­ory.

• You seem to be con­fus­ing the effect with the cause; whether you will choose to one-box or two-box de­pends on your prior state of mind (per­son­al­ity/​knowl­edge of var­i­ous de­ci­sion the­o­ries/​mood/​etc), and it is that prior state of mind which also de­ter­mines where Omega leaves its money.

The choice doesn’t “in­fluence the past” at all; rather, your brain in­fluences both your and Omega’s fu­ture choices.

• Con­sider this se­quence of events: you had your prior mind-state, then Omega made his choice, and then you make your choice. You seem to be say­ing that your choice is already made up from your prior mind-state, and there is no de­ci­sion to be made af­ter Omega pre­sents you with the situ­a­tion. This is a pos­si­bil­ity.

I’m say­ing that an­other pos­si­bil­ity is that you do have a choice at that point. And if you have a choice, there are two sub­se­quent op­tions: this choice you make will im­pact the past, or it won’t. If it does, then you should one-box. But if it doesn’t im­pact the past (and if you in­deed can be mak­ing a choice at this point), then you should two-box.

• Just saw this in the com­ment box, so I don’t know the con­text, but isn’t this based on the con­fused no­tion of “free will” em­ployed by … am­a­teur the­olo­gians mostly, I think?

For example—and please, tell me if I’m barking up the wrong tree entirely, it’s quite possible—let’s get rid of Omega and replace him with, say, Hannibal Lecter.

He has got­ten to know you quite well, and has spe­cific knowl­edge of how you be­have in situ­a­tions like this af­ter you’ve con­sid­ered the fact that you know he knows you know he knows etc etc.

Is it ra­tio­nal to two-box in this situ­a­tion, be­cause you have “free will” and thus there’s no way he could know what you’re go­ing to do with­out a time ma­chine?

• I very well might be wrong about how re­al­ity works. I’m just say­ing that if it hap­pens to work in the way I de­scribe, the de­ci­sion would be ob­vi­ous. And fur­ther­more, if you spec­ify the way in which re­al­ity works, the de­ci­sion in this situ­a­tion is always ob­vi­ous. The de­bate seems to be more about the way re­al­ity works.

Regarding the Hannibal Lecter situation you propose, I don’t understand it well enough to say, but I think I address all the variations of this question above.

• My point is that hu­mans are em­i­nently non­ran­dom; to the ex­tent that a smart hu­man-level in­tel­li­gence could prob­a­bly fill in for Omega.

I think there’s an ar­ti­cle here some­where about how free will and de­ter­minism are com­pat­i­ble … I’ll look around for it now...

EDIT:

Another ques­tion is what to do be­fore Omega makes his de­ci­sion.

• It seems plausible that Omega could read your mind. So then, you should try to make Omega think that you will one-box. If you’re capable of doing this and it works, then great! If not, you didn’t lose anything by trying, and you gave yourself the chance of possibly succeeding.

If Omega is smart enough, the only way to make it think you will one-box is by being the sort of agent that one-boxes in this situation, regardless of why. So you should one-box because you know that, because that means you’re the sort of agent that one-boxes if they know that. That’s the standard LW position, anyway.

(Free will stuff forth­com­ing.)

• I keep saying that if you specify the physics/reality, the decision to make is obvious. People keep replying by basically saying, “but physics/reality works this way, so this is the answer”. And then I keep replying, “Maybe you’re right. I don’t know how it works. All I know is that the argument is over physics/reality.”

Do you agree with this? If not, where do you disagree?

• Their point (which may or may not be based on a mi­s­un­der­stand­ing of what you’re talk­ing about) is that one of your op­tions (“free will”) does not cor­re­spond to a pos­si­ble set of the laws of physics—it’s self-con­tra­dic­tory.

I think this is the rele­vant page. Key quote:

Peo­ple who live in re­duc­tion­ist uni­verses can­not con­cretely en­vi­sion non-re­duc­tion­ist uni­verses. They can pro­nounce the syl­la­bles “non-re­duc­tion­ist” but they can’t imag­ine it.

• And if you are smart enough, you should decide what to do by trying to predict what Omega would do. Omega’s attempt to predict your actions may end up becoming undecidable if you’re really smart enough that you can predict Omega.

Or to put it an­other way, the stipu­la­tion that Omega can pre­dict your ac­tions limits how smart you can be and what strate­gies you can use.
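A toy illustration of that constraint: if the agent’s own strategy is “simulate Omega simulating me,” the mutual simulation never bottoms out. This is only a sketch of the regress, not a claim about how any real predictor works; the depth cutoff is an arbitrary stand-in for running out of compute.

```python
def omega_predict(agent, depth=0):
    # Omega predicts the agent by simulating it.
    return agent(depth + 1)

def smart_agent(depth=0):
    # A "too smart" agent decides by simulating Omega's prediction of
    # itself, so each simulation spawns another, without end.
    if depth > 100:
        raise RecursionError("mutual simulation does not terminate")
    return omega_predict(smart_agent, depth)

try:
    omega_predict(smart_agent)
    terminated = True
except RecursionError:
    terminated = False

assert not terminated  # the prediction never resolves
```

So stipulating that Omega’s prediction succeeds implicitly rules out strategies like this one.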

• Well, I guess that’s true—pre­sum­ably the rea­son the less-in­tu­itive “Omega” is used in the offi­cial ver­sion. Omega is, by defi­ni­tion, smarter than you—re­gard­less of how smart you per­son­ally are.

• This is true, but gen­er­ally the ques­tion “what should you do” means “what is the op­ti­mal thing to do”. It’s odd to have a prob­lem that stipu­lates that you can­not find the op­ti­mal thing to do and asks what is the next most op­ti­mum thing you should do in­stead.

• You seem to be say­ing that your choice is already made up from your prior mind-state, and there is no de­ci­sion to be made af­ter Omega pre­sents you with the situ­a­tion.

Not exactly; just because Omega knows what you will do beforehand with 1 − epsilon certainty doesn’t mean you don’t have a choice, just that you will do what you’ll choose to do.

You still make your decision, and just like every other decision you’ve ever made in your life, it would be based on your goals, values, intuitions, biases, emotions, and memories. The only difference is that someone else has already taken all of those things into account and made a projection beforehand. The decision is still real, and you’re still the one who makes it; it’s just that Omega has a faster clock rate and could figure out what that decision would likely be beforehand, using the same initial conditions and laws of physics.

• I think I agree with your de­scrip­tion of how choice works. Re­gard­ing the de­ci­sion you should make, I can’t think of any­thing to say that I didn’t say be­fore. If the ques­tion speci­fies how re­al­ity/​physics works, the de­ci­sion is ob­vi­ous.

• Is it also your po­si­tion that I have any way of know­ing whether my choice is already made up from my prior mind-state, or not?

• I don’t know whether you’ll have any way of know­ing if your choice was made up already. I wish I knew more physics and had a bet­ter opinion on the way re­al­ity works, but with my un­der­stand­ing, I can’t say.

My approach is to say, “If reality works this way, then you should do this. If it works that way, then you should do that.”

Re­gard­ing your ques­tion, I’m not sure that it mat­ters. If ‘yes’, then you don’t have a de­ci­sion to make. If ‘no’, then I think it de­pends on the stuff I talked about in above com­ments.

If your choice is not made up from your prior mind state, then Omega would not be able to pre­dict your ac­tions from it. How­ever, it is a premise of the sce­nario that he can. There­fore your choice is made up from your prior mind state.

• If your choice is not made up from your prior mind state, then Omega would not be able to pre­dict your ac­tions from it.

Not nec­es­sar­ily. We don’t know how Omega makes his pre­dic­tions.

But re­gard­less, I think my fun­da­men­tal point still stands: the de­bate is over physics/​re­al­ity, not de­ci­sion the­ory. If the ques­tion speci­fied how physics/​re­al­ity works, the de­ci­sion the­ory part would be easy.

• Indeed. To make it clearer, consider a prior mind state that says “when presented with this, I’ll flip a coin to decide (or look at some other random variable).” In this situation, Omega can, at best, predict your choice with 50/50 odds. Whether Omega is even a coherent idea depends a great deal on your model of choices.

• If given prior mind-state S1 and a blue room I choose A, and given S1 and a pink room I choose B, S1 does not de­ter­mine whether I choose A or B, but Omega (know­ing S1 and the color of the room in which I’ll be offered the choice) can pre­dict whether I choose A or B.
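The point that mind-state and environment jointly determine the choice can be put as a toy function; `S1` and the room colors are just the hypotheticals from the comment above.

```python
def choice(mind_state: str, room_color: str) -> str:
    """A deterministic choice as a function of mind-state AND environment."""
    if mind_state == "S1":
        # The same prior mind-state yields different choices in
        # different rooms, so S1 alone underdetermines the choice.
        return "A" if room_color == "blue" else "B"
    return "A"

# Omega, knowing both arguments, still predicts perfectly.
assert choice("S1", "blue") == "A"
assert choice("S1", "pink") == "B"
```

The mind-state underdetermines the choice, yet nothing random is involved; prediction just needs the full argument list.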

• I’m not sure if anyone’s noticed this, but how do you know that you’re not a simulation of yourself inside Omega? If he is superintelligent, he would compute your decision by simulating you, and you and your simulation will be indistinguishable.

This is fairly obviously a Prisoner’s Dilemma (PD) against said simulation—if you cooperate in PD, you should one-box.

I personally am not sure, although if I had to decide I’d probably one-box.

• It cer­tainly seems like a sim­ple re­s­olu­tion ex­ists...

As a rationalist, there should only ever be one choice you make. It should be the ideal choice. If you are a perfectly rational person, you will only ever make the ideal choice. You are, at the very least, deterministic. If you can make the ideal choice, so can someone else. That means that if someone knows your exact situation (trivial in the Newcomb paradox, as the super intelligent agent is causing your situation), they can predict exactly what you will do, even without being perfectly rational themselves. If you know they are predicting you, and will act in a certain way accordingly, the rational solution is simply to follow through on whichever prediction is most profitable, as if they could actually see the future to make such a prediction correctly. Since you’re deterministic, that you will do this is predictable, and thus the prediction is self-fulfilling.
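The self-fulfilling step can be made concrete with a toy fixed-point check. The payoff numbers come from the problem statement; the agent policy is a sketch of “follow through on the most profitable prediction.”

```python
payoff = {"one-box": 1_000_000, "two-box": 1_000}

def agent(predicted_action: str) -> str:
    """Deterministic policy: act out whichever prediction pays the most.

    Since "one-box" is the most profitable prediction, the agent takes
    only box B no matter what was actually predicted.
    """
    return max(payoff, key=payoff.get)

# A prediction is self-fulfilling when simulating the agent on it
# returns that same prediction.
fixed_points = [p for p in payoff if agent(p) == p]
assert fixed_points == ["one-box"]
```

The predictor only needs to find a prediction the agent’s own policy confirms; for this policy, “one-box” is the unique such fixed point.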

• Wel­come to Less Wrong!

As a ra­tio­nal­ist, there should only ever be one choice you make.

Why do you think so?

• As a ra­tio­nal­ist, there should only ever be one choice you make.

I think so too.

if some­one knows your ex­act situ­a­tion (triv­ial in the New­comb para­dox, as the su­per in­tel­li­gent agent is caus­ing your situ­a­tion)

Per­haps we’ve all heard a slightly differ­ent word­ing of the para­dox (or more), but I don’t see what cau­sa­tion has to do with it.

• Per­haps we’ve all heard a slightly differ­ent word­ing of the para­dox (or more), but I don’t see what cau­sa­tion has to do with it.

He knows what your en­vi­ron­men­tal cir­cum­stances are be­cause he put you in them. That is, he ob­vi­ously knows that you are go­ing to be en­coun­ter­ing a New­comblike prob­lem be­cause he just gave it to you. (ie. No deep tech­ni­cal mean­ing, just the ob­vi­ous.)

• Maybe I’m being dense. Omega needs to know more than just that you are going to encounter the problem; even Omega’s scheduler and publicist know that!

Omega knows the ex­act situ­a­tion, in­clud­ing how an iden­ti­cal model of you would act/​has acted, be­cause that is stipu­lated, but it does not fol­low triv­ially from Omega’s caus­ing your situ­a­tion.

• I’m kind of sur­prised at how com­pli­cated ev­ery­one is mak­ing this, be­cause to me the Bayesian an­swer jumped out as soon as I finished read­ing your defi­ni­tion of the prob­lem, even be­fore the first “ar­gu­ment” be­tween one and two box­ers. And it’s about five sen­tences long:

Don’t choose an amount of money. Choose an expected amount of money—the dollar value multiplied by its probability. One-box gets you more than 1,000,000 × 0.99. Two-box gets you less than 1,000 × 1 + 1,000,000 × 0.01. One-box has superior expected returns. Probability theory doesn’t usually encounter situations in which your decision can affect the prior probabilities, but it’s no mystery what to do when that situation arises—the same thing as always: maximize that utility function.
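The arithmetic in the comment above, with Omega’s accuracy as an explicit parameter (the 0.99 figure is illustrative):

```python
def expected_value(one_box: bool, accuracy: float = 0.99) -> float:
    """Expected payoff against a predictor of the given accuracy."""
    if one_box:
        # Box B is full iff Omega predicted one-boxing.
        return accuracy * 1_000_000
    # Two-boxing always yields the $1,000, plus $1M on a misprediction.
    return 1_000 + (1 - accuracy) * 1_000_000

assert expected_value(one_box=True) > expected_value(one_box=False)
```

One-boxing comes out ahead for any accuracy above about 50.05%, which is why the observed 100-for-100 record settles the expected-value comparison so decisively.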

Of course, while I can be proud of myself for spotting that right away, I can’t be too proud, because I know I was helped a lot by the fact that my mind was in a “thinking about Eliezer Yudkowsky” mode already, a mode it’s not necessarily in by default and might not be in when I am presented with a dilemma (unless I make a conscious effort to put it there, which I guess now I stand a better chance of doing). I was expecting a Bayesian solution to the problem and spotted it even though it wasn’t even the point of the example. I’ve seen this problem before, after all, without the context of its being brought up by you, and I certainly didn’t come up with that solution at the time.

• ...if you build an AI that two-boxes on New­comb’s Prob­lem, it will self-mod­ify to one-box on New­comb’s Prob­lem, if the AI con­sid­ers in ad­vance that it might face such a situ­a­tion. Agents with free ac­cess to their own source code have ac­cess to a cheap method of pre­com­mit­ment.

...

But what does an agent with a dis­po­si­tion gen­er­ally-well-suited to New­comblike prob­lems look like? Can this be for­mally speci­fied?

...

Ra­tional agents should WIN.

It seems to me that if all that is true, and you want to build a Friendly AI, then the ra­tio­nal thing to do here is build it and let it solve all prob­lems like these. That way, you win, at least in the time-man­age­ment sense. Well, you might lose if you en­coun­tered Omega be­fore the FAI was up and run­ning, but that seems un­likely. Am I miss­ing some­thing here?

It will also have to pre­com­mit to mere hu­mans who can’t read its source code and can’t pre­dict the fu­ture, so solv­ing the prob­lem in the case where you meet Omega doesn’t solve the prob­lem in gen­eral.

• Causal de­ci­sion the­o­rists don’t self-mod­ify to time­less de­ci­sion the­o­rists. If you get the de­ci­sion the­ory wrong, you can’t rely on it re­pairing it­self.

• You said:

Causal de­ci­sion the­o­rists don’t self-mod­ify to time­less de­ci­sion the­o­rists. If you get the de­ci­sion the­ory wrong, you can’t rely on it re­pairing it­self.

but you also said:

...if you build an AI that two-boxes on New­comb’s Prob­lem, it will self-mod­ify to one-box on New­comb’s Prob­lem, if the AI con­sid­ers in ad­vance that it might face such a situ­a­tion.

I can en­vi­sion sev­eral pos­si­bil­ities:

• Per­haps you changed your mind and presently dis­agree with one of the above two state­ments.

• Per­haps you didn’t mean a causal AI in the sec­ond quote. In that case I have no idea what you meant.

• Per­haps New­comb’s prob­lem is the wrong ex­am­ple, and there’s some other ex­am­ple mo­ti­vat­ing TDT that a self-mod­ify­ing causal agent would deal with in­cor­rectly.

• Per­haps you have a model of causal de­ci­sion the­ory that makes self-mod­ifi­ca­tion im­pos­si­ble in prin­ci­ple. That would make your first state­ment above true, in a use­less sort of way, so I hope you didn’t mean that.

Would you like to clar­ify?

• Causal de­ci­sion the­o­rists self-mod­ify to one-box on New­comb’s Prob­lem with Omegas that looked at their source code af­ter the self-mod­ifi­ca­tion took place; i.e., if the causal de­ci­sion the­o­rist self-mod­ifies at 7am, it will self-mod­ify to one-box with Omegas that looked at the code af­ter 7am and two-box oth­er­wise. This is not only ugly but also has worse im­pli­ca­tions for e.g. meet­ing an alien AI who wants to co­op­er­ate with you, or worse, an alien AI that is try­ing to black­mail you.
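The time-indexed policy described here can be sketched as a toy rule. The 7am figure is from the comment; nothing below models a real agent.

```python
def post_modification_policy(omega_read_time: float,
                             modification_time: float = 7.0) -> str:
    """What a self-modified CDT agent does, indexed by when Omega read
    its source code.

    CDT only credits effects it can cause: self-modifying at 7am can
    still influence predictions made from code read after 7am, but a
    prediction made from earlier code is causally fixed.
    """
    if omega_read_time >= modification_time:
        return "one-box"   # prediction is causally downstream of the change
    return "two-box"       # prediction already fixed; old CDT reasoning stands

assert post_modification_policy(8.0) == "one-box"
assert post_modification_policy(6.0) == "two-box"
```

The ugliness is visible in the signature itself: the resulting policy depends on an arbitrary timestamp rather than on the structure of the problem.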

Bad de­ci­sion the­o­ries don’t nec­es­sar­ily self-re­pair cor­rectly.

And in gen­eral, ev­ery time you throw up your hands in the air and say, “I don’t know how to solve this prob­lem, nor do I un­der­stand the ex­act struc­ture of the calcu­la­tion my com­puter pro­gram will perform in the course of solv­ing this prob­lem, nor can I state a math­e­mat­i­cally pre­cise meta-ques­tion, but I’m go­ing to rely on the AI solv­ing it for me ’cause it’s sup­posed to be su­per-smart,” you may very pos­si­bly be about to screw up re­ally damned hard. I mean, that’s what Eliezer-1999 thought you could say about “moral­ity”.

• Okay, thanks for con­firm­ing that New­comb’s prob­lem is a rele­vant mo­ti­vat­ing ex­am­ple here.

“I don’t know how to solve this prob­lem, nor do I un­der­stand the ex­act struc­ture of the calcu­la­tion my com­puter pro­gram will perform in the course of solv­ing this prob­lem, nor can I state a math­e­mat­i­cally pre­cise meta-ques­tion, but I’m go­ing to rely on the AI solv­ing it for me ’cause it’s sup­posed to be su­per-smart,”

I’m not say­ing that. I’m say­ing that self-mod­ifi­ca­tion solves the prob­lem, as­sum­ing the CDT agent moves first, and that it seems sim­ple enough that we can check that a not-very-smart AI solves it cor­rectly on toy ex­am­ples. If I get around to at­tempt­ing that, I’ll post to LessWrong.

As­sum­ing the CDT agent moves first seems rea­son­able. I have no clue whether or when Omega is go­ing to show up, so I feel no need to sec­ond-guess the AI about that sched­ule.

(Quot­ing out of or­der)

This is not only ugly...

As you know, we can define a causal de­ci­sion the­ory agent in one line of math. I don’t know a way to do that for TDT. Do you? If TDT could be con­cisely de­scribed, I’d agree that it’s the less ugly al­ter­na­tive.

but also has worse im­pli­ca­tions for e.g. meet­ing an alien AI who wants to co­op­er­ate with you, or worse, an alien AI that is try­ing to black­mail you.

I’m failing to sus­pend dis­be­lief here. Do you have mo­ti­vat­ing ex­am­ples for TDT that seem likely to hap­pen be­fore Kurzweil’s sched­ule for the Sin­gu­lar­ity causes us to ei­ther win or lose the game?

• As you know, we can define a causal de­ci­sion the­ory agent in one line of math.

If you ap­pre­ci­ate sim­plic­ity/​el­e­gance, I sug­gest look­ing into UDT. UDT says that when you’re mak­ing a choice, you’re de­cid­ing the out­put of a par­tic­u­lar com­pu­ta­tion, and the con­se­quences of any given choice are just the log­i­cal con­se­quences of that com­pu­ta­tion hav­ing that out­put.

CDT in con­trast doesn’t an­swer the ques­tion “what am I ac­tu­ally de­cid­ing when I make a de­ci­sion?” nor does it an­swer “what are the con­se­quences of any par­tic­u­lar choice?” even in prin­ci­ple. CDT can only be de­scribed in one line of math be­cause the an­swer to the lat­ter ques­tion has to be pro­vided to it via an ex­ter­nal pa­ram­e­ter.

• Thanks, I’ll have a look at UDT.

CDT can only be de­scribed in one line of math be­cause the an­swer to the lat­ter ques­tion has to be pro­vided to it via an ex­ter­nal pa­ram­e­ter.

I cer­tainly agree there.

• If TDT could be con­cisely de­scribed, I’d agree that it’s the less ugly al­ter­na­tive.

Maybe this one: “Argmax[A in Actions] in Sum[O in Outcomes] (Utility(O) * P(this computation yields A []-> O | rest of universe))”

From this post.

• but also has worse im­pli­ca­tions for e.g. meet­ing an alien AI who wants to co­op­er­ate with you, or worse, an alien AI that is try­ing to black­mail you.

I’m failing to sus­pend dis­be­lief here. Do you have mo­ti­vat­ing ex­am­ples for TDT that seem likely to hap­pen be­fore Kurzweil’s sched­ule for the Sin­gu­lar­ity causes us to ei­ther win or lose the game?

I’m rea­son­ably sure Eliezer meant im­pli­ca­tions for the would-be friendly AI meet­ing alien AIs. That could hap­pen at any time in the re­main­ing life span of the uni­verse.

• Causal de­ci­sion the­o­rists don’t self-mod­ify to time­less de­ci­sion the­o­rists.

Why not? A causal de­ci­sion the­o­rist can have an ac­cu­rate ab­stract un­der­stand­ing of both TDT and CDT and can calcu­late the ex­pected util­ity of ap­ply­ing ei­ther. If TDT pro­duces a bet­ter ex­pected out­come in gen­eral then it seems like self mod­ify­ing to be­come a TDT agent is the cor­rect de­ci­sion to make. Is there some re­stric­tion or in­junc­tion as­sumed to be in place with re­spect to de­ci­sion al­gorithm im­ple­men­ta­tion?

Thinking about it for a few minutes: it would seem that the CDT agent will reliably update away from CDT, but that the new algorithm will be neither CDT nor TDT (and not UDT either). It will be able to cooperate with agents when there has been some sort of causal entanglement between the modified source code and the other agent, but not able to cooperate with complete strangers. The resultant decision algorithm is enough of an attractor that it deserves a name of its own. Does it have one?

• Doesn’t have a name as far as I know. But I’m not sure it de­serves one; would CDT re­ally be a prob­a­ble out­put any­where be­sides a ver­bal the­ory ad­vo­cated by hu­man philoso­phers in our own Everett branch? Maybe, now that I think about it, but even so, does it mat­ter?

A causal de­ci­sion the­o­rist can have an ac­cu­rate ab­stract un­der­stand­ing of both TDT and CDT and can calcu­late the ex­pected util­ity of ap­ply­ing ei­ther.

But it will calcu­late that ex­pected value us­ing CDT!ex­pec­ta­tion, mean­ing that it won’t see how self-mod­ify­ing to be a time­less de­ci­sion the­o­rist could pos­si­bly af­fect what’s already in the box, etcetera.

• Doesn’t have a name as far as I know. But I’m not sure it de­serves one; would CDT re­ally be a prob­a­ble out­put any­where be­sides a ver­bal the­ory ad­vo­cated by hu­man philoso­phers in our own Everett branch?

Yes, because there are lemmas you can prove about (some) decision theory problems which imply that CDT and UDT give the same output. For example, CDT works if there exists a total ordering over inputs given to the strategy, common to all execution histories, such that the world program invokes the strategy only with increasing, non-repeating inputs on that ordering. There are (relatively) easy algorithms for these cases. CDT in general is then a matter of applying a theorem when one of its preconditions doesn’t hold, which is one of the most common math mistakes ever.

• Is that re­ally so bad, if it takes the state of the world at the point be­fore it self-mod­ifies as an un­change­able given, and self-mod­ifies to a de­ci­sion the­ory that only con­sid­ers states from that point on as change­able by its de­ci­sion the­ory? For one thing, doesn’t that avoid Roko’s basilisk?

• Is that re­ally so bad, if it takes the state of the world at the point be­fore it self-mod­ifies as an un­change­able given, and self-mod­ifies to a de­ci­sion the­ory that only con­sid­ers states from that point on as change­able by its de­ci­sion the­ory?

If you do that, you’d be vuln­er­a­ble to ex­tor­tion from any other AIs that hap­pen to be cre­ated ear­lier in time and can prove their source code.

• I’m in­clined to think that in most sce­nar­ios the first AGI wins any­way. And leav­ing solv­ing de­ci­sion the­ory to the AGI could mean you get to build it ear­lier.

• I’m in­clined to think that in most sce­nar­ios the first AGI wins any­way.

I was think­ing of meet­ing alien AIs, post-Sin­gu­lar­ity.

And leav­ing solv­ing de­ci­sion the­ory to the AGI could mean you get to build it ear­lier.

Huh? I thought we were sup­posed to be the good guys here? ;-)

But seriously, “sacrifice safety for speed” is the “defect” option in the game of “let’s build AGI”. I’m not sure how to get the C/C outcome (or rather C/C/C/…), but it seems too early to start talking about defecting already.

Be­sides, CDT is not well defined enough that you can im­ple­ment it even if you wanted to. I think if you were forced to im­ple­ment a “good enough” de­ci­sion the­ory and hope for the best, you’d pick UDT at this point. (UDT is also miss­ing a big chunk from its speci­fi­ca­tions, namely the “math in­tu­ition mod­ule” but I think that prob­lem has to be solved any­way. It’s hard to see how an AGI can get very far with­out be­ing able to deal with log­i­cal/​math­e­mat­i­cal un­cer­tainty.)

• I was think­ing of meet­ing alien AIs, post-Sin­gu­lar­ity.

What pre-sin­gu­lar­ity ac­tions are you wor­ried about them tak­ing?

Huh? I thought we were sup­posed to be the good guys here? ;-)

What I was think­ing was that a CDT-seeded AI might ac­tu­ally be safer pre­cisely be­cause it won’t try to change pre-Sin­gu­lar­ity events, and if it’s first the new de­ci­sion the­ory will be in place in time for any post-Sin­gu­lar­ity events.

Be­sides, CDT is not well defined enough that you can im­ple­ment it even if you wanted to.

That’s sur­pris­ing to me—what should I read in or­der to un­der­stand this point bet­ter? EDIT: strike that, you an­swer that above.

• What pre-sin­gu­lar­ity ac­tions are you wor­ried about them tak­ing?

They could mod­ify them­selves so that if they ever en­counter a CDT-de­scended AI they’ll start a war (even if it means mu­tual de­struc­tion) un­less the CDT-de­scended AI gives them 99% of its re­sources.

• They could mod­ify them­selves so that if they ever en­counter a CDT-de­scended AI they’ll start a war (even if it means mu­tual de­struc­tion) un­less the CDT-de­scended AI gives them 99% of its re­sources.

They could also modify themselves to make the analogous threat if they encounter a UDT-descended AI, or a descendant of an AI designed by Tim Freeman, or a descendant of an AI designed by Wei Dai, or a descendant of an AI designed using ideas mentioned on LessWrong. I would hope that any of those AIs would hand over 99% of their resources if the extortionist could prove its source code and prove that war would be worse. I assume you’re saying that CDT is special in this regard. How is it special?

(Thanks for the poin­ter to the James Joyce book, I’ll have a look at it.)

• I as­sume you’re say­ing that CDT is spe­cial in this re­gard. How is it spe­cial?

If the alien AI com­putes the ex­pected util­ity of “prov­ably mod­ify my­self to start a war against CDT-AI un­less it gives me 99% of its re­sources”, it’s cer­tain to get a high value, whereas if it com­putes the ex­pected util­ity of “prov­ably mod­ify my­self to start a war against UDT-AI un­less it gives me 99% of its re­sources” it might pos­si­bly get a low value (not sure be­cause UDT isn’t fully speci­fied), be­cause the UDT-AI, when choos­ing what to do when faced with this kind of threat, would take into ac­count the log­i­cal cor­re­la­tion be­tween its de­ci­sion and the alien AI’s pre­dic­tion of its de­ci­sion.
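A toy payoff computation for that argument. All the numbers are hypothetical, and “war” is assumed worse for both sides than paying; the point is only which target the extortionist computes the commitment to be profitable against.

```python
def extortionist_value(target: str, demand: float = 99.0,
                       war_value: float = -50.0) -> float:
    """Value to the extortionist of provably committing to war-unless-paid.

    A CDT target, taking the commitment as an already-fixed fact, pays up,
    since paying beats war from that point forward.  A UDT target evaluates
    the policy "refuse threats", which the extortionist's prediction of it
    tracks, so the commitment is never worth making against it.
    """
    if target == "CDT":
        return demand        # the threat succeeds; 99% of resources handed over
    elif target == "UDT":
        return war_value     # the target refuses; the threat backfires
    raise ValueError(f"unknown target type: {target}")

# The commitment is computed as profitable only against the CDT-descended AI.
assert extortionist_value("CDT") > 0 > extortionist_value("UDT")
```

So the asymmetry lies in what the extortionist expects before committing, not in what either target could do after the fact.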

• ...if it com­putes the ex­pected util­ity of “prov­ably mod­ify my­self to start a war against UDT-AI un­less it gives me 99% of its re­sources” it might pos­si­bly get a low value (not sure be­cause UDT isn’t fully speci­fied), be­cause the UDT-AI, when choos­ing what to do when faced with this kind of threat, would take into ac­count the log­i­cal cor­re­la­tion be­tween its de­ci­sion and the alien AI’s pre­dic­tion of its de­ci­sion.

Well, that’s plau­si­ble. I’ll have to work through some UDT ex­am­ples to un­der­stand fully.

What model do you have of how en­tity X can prove to en­tity Y that X is run­ning spe­cific source code?

The proof that I can imag­ine is en­tity Y gives some se­cure hard­ware Z to X, and then X al­lows Z to ob­serve the pro­cess of X self-mod­ify­ing to run the speci­fied source code, and then X gives the se­cure hard­ware back to Y. Both X and Y can ob­serve the cre­ation of Z, so Y can know that it’s se­cure and X can know that it’s a pas­sive ob­server rather than a bomb or some­thing.

This model breaks the sce­nario, since a CDT play­ing the role of Y could self-mod­ify any time be­fore it hands over Z and play the game com­pe­tently.

Now, if there’s some way for X to cre­ate proofs of X’s source code that will be con­vinc­ing to Y with­out giv­ing ad­vance no­tice to Y, I can imag­ine a prob­lem for Y here. Does any­one know how to do that?

(I ac­knowl­edge that if no­body knows how to do that, that means we don’t know how to do that, not that it can’t be done.)

Hmm, this ex­plains my aver­sion to know­ing the de­tails of what other peo­ple are think­ing. It can put me at a dis­ad­van­tage in ne­go­ti­a­tions un­less I am able to lie con­vinc­ingly and say I do not know.

• I think I’ll stop here for now, because you already seem intrigued enough to want to learn about UDT in detail. I’m guessing that once you do, you won’t be so motivated to think up reasons why CDT isn’t really so bad. :) Let me know if that turns out not to be the case, though.

• What model do you have of how en­tity X can prove to en­tity Y that X is run­ning spe­cific source code?

On sec­ond thought, I should an­swer this ques­tion be­cause it’s of in­de­pen­dent in­ter­est. If Y is suffi­ciently pow­er­ful, it may be able to de­duce the laws of physics and the ini­tial con­di­tions of the uni­verse, and then ob­tain X’s source code by simu­lat­ing the uni­verse up to when X is cre­ated. Note that Y may do this not be­cause it wants to know X’s source code in some an­thro­po­mor­phic sense, but sim­ply due to how its de­ci­sion-mak­ing al­gorithm works.

• If Y is suffi­ciently pow­er­ful, it may be able to de­duce the laws of physics and the ini­tial con­di­tions of the uni­verse, and then ob­tain X’s source code by simu­lat­ing the uni­verse up to when X is cre­ated.

Unless there have been some specific assumptions made about the universe, that will not work. Simulating the entire universe does not tell Y which part of the universe it inhabits. It will give Y a set of possible parts of the universe which match Y’s observations. While the simulation strategy will allow the best possible prediction about what X’s source code is, given what Y already knows, it does not give Y evidence that it didn’t already have.

• You’re right, the model as­sumes that we live in a uni­verse such that su­per­in­tel­li­gent AIs would “nat­u­rally” have enough ev­i­dence to in­fer the source code of other AIs. (That seems quite plau­si­ble, al­though by no means cer­tain, to me.) Also, since this is a thread about the rel­a­tive mer­its of CDT, I should point out that there are some games in which CDT seems to win rel­a­tive to TDT or UDT, which is a puz­zle that is still open.

• Also, since this is a thread about the rel­a­tive mer­its of CDT, I should point out that there are some games in which CDT seems to win rel­a­tive to TDT or UDT, which is a puz­zle that is still open.

It’s an in­ter­est­ing prob­lem, but my im­pres­sion when read­ing was some­what similar to that of Eliezer in the replies. At the core it is the ques­tion of “How do you deal with con­structs made by other agents?” I don’t think TDT has any par­tic­u­lar weak­ness there.

• If Y is suffi­ciently pow­er­ful, it may be able to de­duce the laws of physics and the ini­tial con­di­tions of the uni­verse, and then ob­tain X’s source code by simu­lat­ing the uni­verse up to when X is cre­ated.

Quan­tum me­chan­ics seems to be pretty clear that true ran­dom num­ber gen­er­a­tors are available, and prob­a­bly hap­pen nat­u­rally. I don’t un­der­stand why you con­sider that sce­nario prob­a­ble enough to be worth talk­ing about.

• It’s hard to see how an AGI can get very far with­out be­ing able to deal with log­i­cal/​math­e­mat­i­cal un­cer­tainty.

Do you have an in­tu­ition as to how it would do this with­out con­tra­dict­ing it­self? I tried to ask a similar ques­tion but got it wrong in the first draft and afaict did not re­ceive an an­swer to the rele­vant part.

I just want to know if my own in­tu­ition fails in the ob­vi­ous way.

• Be­sides, CDT is not well defined enough that you can im­ple­ment it even if you wanted to. I think if you were forced to im­ple­ment a “good enough” de­ci­sion the­ory and hope for the best, you’d pick UDT at this point.

Really? That’s sur­pris­ing. My as­sump­tion had been that CDT would be much sim­pler to im­ple­ment—but just give un­de­sir­able out­comes in whole classes of cir­cum­stance.

• CDT uses a “causal prob­a­bil­ity func­tion” to eval­u­ate the ex­pected util­ities of var­i­ous choices, where this causal prob­a­bil­ity func­tion is differ­ent from the epistemic prob­a­bil­ity func­tion you use to up­date be­liefs. (In EDT they are one and the same.) There is no agree­ment amongst CDT the­o­rists how to for­mu­late this func­tion, and I’m not aware of any spe­cific pro­posal that can be straight­for­wardly im­ple­mented. For more de­tails see James Joyce’s The foun­da­tions of causal de­ci­sion the­ory.
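The structural point, that CDT and EDT share the same expected-utility maximization and differ only in which probability function is plugged in, can be sketched like this. The Newcomb payoffs are from the problem statement; the 0.99 and 50/50 figures are illustrative stand-ins for the two probability functions.

```python
outcomes = ("full", "empty")
utility = {"one-box": {"full": 1_000_000, "empty": 0},
           "two-box": {"full": 1_001_000, "empty": 1_000}}

def evidential_p(outcome, action):
    # EDT: ordinary conditional probability; your action is strong
    # evidence about what an accurate predictor put in the box.
    matches = (outcome == "full") == (action == "one-box")
    return 0.99 if matches else 0.01

def causal_p(outcome, action):
    # A causal probability function: the box is already fixed, so the
    # action doesn't move the probability (a 50/50 prior, for illustration).
    return 0.5

def expected_utility(action, p):
    return sum(p(o, action) * utility[action][o] for o in outcomes)

# Same maximization, different probability function, opposite verdicts.
assert expected_utility("one-box", evidential_p) > expected_utility("two-box", evidential_p)
assert expected_utility("two-box", causal_p) > expected_utility("one-box", causal_p)
```

The hard, unsettled part is exactly the piece glossed over here: a principled general rule for constructing `causal_p`, which is why CDT is not implementable from its one-line definition alone.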

• There is no agree­ment amongst CDT the­o­rists how to for­mu­late this func­tion, and I’m not aware of any spe­cific pro­posal that can be straight­for­wardly im­ple­mented.

I un­der­stand AIXI rea­son­ably well and had as­sumed it was a spe­cific im­ple­men­ta­tion of CDT, per­haps with some tweaks so the re­ward val­ues are gen­er­ated in­ter­nally in­stead of be­ing ob­served in the en­vi­ron­ment. Per­haps AIXI isn’t close to an im­ple­men­ta­tion of CDT, per­haps it’s per­ceived as not spe­cific or straight­for­ward enough, or per­haps it’s not counted as an im­ple­men­ta­tion. Why isn’t AIXI a coun­terex­am­ple?

• Why isn’t AIXI a coun­terex­am­ple?

You may be right that AIXI can be thought of as an in­stance of CDT. Hut­ter him­self cites “se­quen­tial de­ci­sion the­ory” from a 1957 pa­per which cer­tainly pre­dates CDT, but CDT is gen­eral enough that SDT could prob­a­bly fit into its for­mal­ism. (Like EDT can be con­sid­ered an in­stance of CDT with the causal prob­a­bil­ity func­tion set to be the same as the epistemic prob­a­bil­ity func­tion.) I guess I hadn’t con­sid­ered AIXI as a se­ri­ous can­di­date due to its other ma­jor prob­lems.

• http://www.alife.co.uk/essays/on_aixi/

Four prob­lems are listed there.

The first one is the claim that AIXI wouldn’t have a proper un­der­stand­ing of its body be­cause its thoughts are defined math­e­mat­i­cally. This is just wrong, IMO; my re­fu­ta­tion, for a ma­chine that’s similar enough to AIXI for this is­sue to work the same, is here. No­body has en­gaged me in se­ri­ous con­ver­sa­tion about that, so I don’t know how well it will stand up. (If I’m right on this, then I’ve seen Eliezer, Tim Tyler, and you make the same er­ror. What other false con­sen­suses do we have?)

The sec­ond one is fixed if we do the tweak I men­tioned in the grand­par­ent of this com­ment.

If you take the fix de­scribed above for the sec­ond one, what’s left of the third one is the claim that in­stan­ta­neous hu­man (or AI) ex­pe­rience is too nu­anced to fit in a sin­gle cell of a Tur­ing ma­chine. Ac­cord­ing to the origi­nal pa­per, page 8, the sym­bols on the re­ward tape are drawn from an alpha­bet R of ar­bi­trary but fixed size. All you need is a very large alpha­bet and this one goes away.

I agree with the facts as­serted in Tyler’s fourth prob­lem, but I do not agree that it is a prob­lem. He’s say­ing that Kol­mogorov com­plex­ity is ill-defined be­cause the pro­gram­ming lan­guage used is un­defined. I agree that ra­tio­nal agents might dis­agree on pri­ors be­cause they’re us­ing differ­ent pro­gram­ming lan­guages to rep­re­sent their ex­pla­na­tions. In gen­eral, a prob­lem may have mul­ti­ple solu­tions. Prac­ti­cal solu­tions to the prob­lems we’re faced with will re­quire mak­ing in­defen­si­ble ar­bi­trary choices of one po­ten­tial solu­tion over an­other. Pick­ing the pro­gram­ming lan­guage for pri­ors is go­ing to be one of those choices.

• The first one is the claim that AIXI wouldn’t have a proper un­der­stand­ing of its body be­cause its thoughts are defined math­e­mat­i­cally. This is just wrong, IMO; my re­fu­ta­tion, for a ma­chine that’s similar enough to AIXI for this is­sue to work the same, is here.

I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments, where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional code that injects E’s inputs into where AIXI’s physical output ports are located in the universe (that is, overrides the universe’s natural evolution using E’s inputs), and extracts E’s outputs from where AIXI’s physical input ports are located.

What hap­pens when AIXI con­sid­ers an ac­tion that de­stroys its phys­i­cal body in the uni­verse com­puted by P? As long as the in­put/​out­put ports are not also de­stroyed, AIXI would ex­pect that the en­vi­ron­ment E (with its “su­per­nat­u­ral” in­jec­tion/​ex­trac­tion code) will con­tinue to re­ceive its out­puts and provide it with in­puts.
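The injection/extraction picture above can be made concrete with a toy sketch. This is illustrative only: `ToyUniverse`, `Environment`, and the string percepts are my own stand-ins, not anything from Hutter's formalism.

```python
# Toy sketch of the "injection/extraction" environment described above.
# E wraps a universe program P and splices the agent's I/O in and out, so
# destroying the agent's in-universe body never severs the I/O channel.

class ToyUniverse:
    """Stand-in for the short program P computing the universe."""
    def __init__(self):
        self.body_intact = True

    def step(self, injected_action):
        # The injected action overrides whatever the in-universe body
        # would have done (the "supernatural" injection).
        if injected_action == "destroy_body":
            self.body_intact = False
        # Extraction: read a percept off the agent's physical input ports.
        return "percept(body=%s)" % self.body_intact

class Environment:
    """E = P plus the injection/extraction code."""
    def __init__(self):
        self.universe = ToyUniverse()

    def interact(self, agent_output):
        # The channel works whether or not the body still exists, which is
        # why this model predicts no harm from self-destruction.
        return self.universe.step(agent_output)

env = Environment()
print(env.interact("noop"))          # percept(body=True)
print(env.interact("destroy_body"))  # percept(body=False)
print(env.interact("noop"))          # percept(body=False): inputs keep coming
```

The last line is the point at issue: the body is gone, but the environment still feeds the agent percepts through the extraction code.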

Does that make sense?

• (Re­spond­ing out of or­der)

Does that make sense?

Yes, but it makes some un­rea­son­able as­sump­tions.

Con­sider an AIXI that arises in a sim­ple uni­verse, i.e., one com­puted by a short pro­gram P.

An im­ple­men­ta­tion of AIXI would be fairly com­plex. If P is too sim­ple, then AIXI could not re­ally have a body in the uni­verse, so it would be cor­rect in guess­ing that some ir­reg­u­lar­ity in the laws of physics was caus­ing its be­hav­iors to be spliced into the be­hav­ior of the world.

However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Solomonoff induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.

• Tim, I think you’re prob­a­bly not get­ting my point about the dis­tinc­tion be­tween our con­cept of a com­putable uni­verse, and AIXI’s for­mal con­cept of a com­putable en­vi­ron­ment. AIXI re­quires that the en­vi­ron­ment be a TM whose in­puts match AIXI’s past out­puts and whose out­puts match AIXI’s past in­puts. A can­di­date en­vi­ron­ment must have the ad­di­tional code to in­ject/​ex­tract those in­puts/​out­puts and place them on the in­put/​out­put tapes, or AIXI will ex­clude it from its ex­pected util­ity calcu­la­tions.

• The can­di­date en­vi­ron­ment must have the ad­di­tional code to in­ject/​ex­tract those in­puts/​out­puts and place them on the in­put/​out­put tapes, or AIXI will ex­clude it from its ex­pected util­ity calcu­la­tions.

I agree that the can­di­date en­vi­ron­ment will need to have code to han­dle the in­puts. How­ever, if the can­di­date en­vi­ron­ment can com­pute the out­puts on its own, with­out need­ing to be given the AI’s out­puts, the can­di­date en­vi­ron­ment does not need code to in­ject the AI’s out­puts into it.

Even if the AI can only par­tially pre­dict its own be­hav­ior based on the be­hav­ior of the hard­ware it ob­serves in the world, it can use that in­for­ma­tion to more effi­ciently en­code its out­puts in the can­di­date en­vi­ron­ment, so it can have some un­der­stand­ing of its po­si­tion in the world even with­out be­ing able to perfectly pre­dict its own be­hav­ior from first prin­ci­ples.

If the AI man­ages to de­stroy it­self, it will ex­pect its out­puts to be dis­con­nected from the world and have no con­se­quences, since any­thing else would vi­o­late its ex­pec­ta­tions about the laws of physics.

This back-and-forth ap­pears to be use­less. I should prob­a­bly do some Python ex­per­i­ments and we then can change this from a de­bate to a pro­gram­ming prob­lem, which would be much more pleas­ant.

• How­ever, if the can­di­date en­vi­ron­ment can com­pute the out­puts on its own, with­out need­ing to be given the AI’s out­puts, the can­di­date en­vi­ron­ment does not need code to in­ject the AI’s out­puts into it.

If a can­di­date en­vi­ron­ment has no spe­cial code to in­ject AIXI’s out­puts, then when AIXI com­putes ex­pected util­ities, it will find that all ac­tions have equal util­ity in that en­vi­ron­ment, so that en­vi­ron­ment will play no role in its de­ci­sions.
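A minimal numeric sketch of that point, with made-up prior weights and reward functions: an environment whose rewards ignore the action adds the same constant to every action's score, so it can never change which action is selected.

```python
# Toy expected-utility calculation over candidate environments.
# An environment with no injection code gives rewards that are
# independent of the agent's action, so it contributes a constant
# to every action's expected utility.

def expected_utility(action, environments):
    # environments: list of (prior_weight, reward_function) pairs
    return sum(w * reward(action) for w, reward in environments)

reads_action   = (0.6, lambda a: 10 if a == "one_box" else 1)
ignores_action = (0.4, lambda a: 7)  # no injection code: action-independent

envs = [reads_action, ignores_action]
best = max(["one_box", "two_box"], key=lambda a: expected_utility(a, envs))
print(best)  # the 0.4 * 7 term is identical for both actions
```

Only the environment that actually reads the action determines the argmax; the other plays no role in the decision, exactly as claimed.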

I should prob­a­bly do some Python ex­per­i­ments and we then can change this from a de­bate to a pro­gram­ming prob­lem, which would be much more pleas­ant.

Ok, but try not to de­stroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong pos­si­bil­ity that you’ll end up think­ing “why did I waste my time defend­ing CDT/​AIXI?”

• FYI, gen­er­at­ing re­ward val­ues in­ter­nally—in­stead of them be­ing ob­served in the en­vi­ron­ment—makes no differ­ence what­so­ever to the wire­head prob­lem.

AIXI digging into its brains with its own mining claws is quite plausible. It won’t reason as you suggest, since it has no idea that it is instantiated in the real world. So, its exploratory mining claws may plunge in. Hopefully it will get suitably negatively reinforced for that, though much will depend on which part of its brain it causes damage to. It could find that ripping out its own inhibition circuits is very rewarding.

A larger set of sym­bols for re­wards makes no differ­ence—since the re­ward sig­nal is a scalar. If you com­pare with an an­i­mal, that has mil­lions of pain sen­sors that op­er­ate in par­allel. The an­i­mal is onto some­thing there—some­thing to do with a-pri­ori knowl­edge about the com­mon causes of pain. Hav­ing lots of pain sen­sors has pos­i­tive as­pects—e.g. it saves you ex­per­i­ment­ing to figure out what hurts.

As for the refer­ence ma­chine is­sue, I do say: “This prob­lem is also not very se­ri­ous.”

Not very se­ri­ous un­less you are mak­ing claims about your agent be­ing “the most in­tel­li­gent un­bi­ased agent pos­si­ble”. Then this kind of thing starts to make a differ­ence...

• A larger set of sym­bols for re­wards makes no differ­ence—since the re­ward sig­nal is a scalar. If you com­pare with an an­i­mal, that has mil­lions of pain sen­sors that op­er­ate in par­allel. The an­i­mal is onto some­thing there—some­thing to do with a-pri­ori knowl­edge about the com­mon causes of pain. Hav­ing lots of pain sen­sors has pos­i­tive as­pects—e.g. it saves you ex­per­i­ment­ing to figure out what hurts.

You can en­code 16 64 bit in­te­gers in a 1024 bit in­te­ger. The scalar/​par­allel dis­tinc­tion is bo­gus.

(Edit: I originally wrote “5 32 bit integers” when I meant “2**5 32 bit integers”. Changed to “16 64 bit integers” because “32 32 bit integers” looked too much like a typo.)
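The packing claim is easy to check numerically. `pack` and `unpack` are hypothetical helpers of my own; this only shows that the information fits in one scalar, not that AIXI's summed-reward machinery could use it, which is the objection raised in the replies.

```python
# Sketch: pack 16 reward components, each a 64-bit unsigned integer,
# into a single 1024-bit scalar, and recover them again.

def pack(components):
    """Combine 16 64-bit values into one 1024-bit integer."""
    assert len(components) == 16
    packed = 0
    for c in components:
        assert 0 <= c < 2**64
        packed = (packed << 64) | c
    return packed

def unpack(packed):
    """Recover the 16 64-bit values, most significant first."""
    components = []
    for _ in range(16):
        components.append(packed & (2**64 - 1))
        packed >>= 64
    return list(reversed(components))

rewards = [3, 2**63, 0, 41] + [0] * 12
assert unpack(pack(rewards)) == rewards
assert pack(rewards) < 2**1024
```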

Not very se­ri­ous un­less you are mak­ing claims about your agent be­ing “the most in­tel­li­gent un­bi­ased agent pos­si­ble”. Then this kind of thing starts to make a differ­ence...

Straw­man ar­gu­ment. The only claim made is that it’s the most in­tel­li­gent up to a con­stant fac­tor, and a bunch of other con­di­tions are thrown in. When Hut­ter’s in­volved, you can bet that some of the con­stant fac­tors are large com­pared to the size of the uni­verse.

• You can en­code 5 32 bit in­te­gers in a 1024 bit in­te­ger. The scalar/​par­allel dis­tinc­tion is bo­gus.

Er, not if you are adding the re­wards to­gether and max­imis­ing the re­sults, you can’t! That is ex­actly what hap­pens to the re­wards used by AIXI.

Not very se­ri­ous un­less you are mak­ing claims about your agent be­ing “the most in­tel­li­gent un­bi­ased agent pos­si­ble”. Then this kind of thing starts to make a differ­ence...

Straw­man ar­gu­ment. The only claim made is that it’s the most in­tel­li­gent up to a con­stant fac­tor, and a bunch of other con­di­tions are thrown in.

Ac­tu­ally Hut­ter says this sort of thing all over the place (I was quot­ing him above) - and it seems pretty ir­ri­tat­ing and mis­lead­ing to me. I’m not say­ing the claims he makes in the fine print are wrong, but rather that the mar­ket­ing head­lines are mis­lead­ing.

• You can en­code 5 32 bit in­te­gers in a 1024 bit in­te­ger. The scalar/​par­allel dis­tinc­tion is bo­gus.

Er, not if you are adding the re­wards to­gether and max­imis­ing the re­sults, you can’t! That is ex­actly what hap­pens to the re­wards used by AIXI.

You’re right there; I’m confusing AIXI with another design I’ve been working with in a similar idiom. For AIXI to work, you have to combine together all the environmental stuff and compute a utility, make the code for doing the combining part of the environment (not the AI), and then use that resulting utility as the input to AIXI.

• For more de­tails see James Joyce’s The foun­da­tions of causal de­ci­sion the­ory.

Thankyou for the refer­ence, and the ex­pla­na­tion.

I am prompted to ask my­self a ques­tion analo­gous to the one Eliezer re­cently asked:

Doesn’t have a name as far as I know. But I’m not sure it de­serves one; would CDT re­ally be a prob­a­ble out­put any­where be­sides a ver­bal the­ory ad­vo­cated by hu­man philoso­phers in our own Everett branch? Maybe, now that I think about it, but even so, does it mat­ter?

Is it worth my while exploring the details of CDT formalization beyond just the page you linked to? There seems to be some advantage to understanding the details and conventions of how such concepts are described. At the same time, revising CDT thinking in too much detail may eliminate some entirely justifiable confusion as to why anyone would think it is a good idea! “Causal Expected Utility”? “Causal Tendencies”? What the? I only care about what will get me the best outcome!

• Is it worth my while ex­plor­ing the de­tails of CDT for­mal­iza­tion be­yond just the page you linked to?

Prob­a­bly not. I only learned it by ac­ci­dent my­self. I had come up with a proto-UDT that was mo­ti­vated purely by an­thropic rea­son­ing para­doxes (as op­posed to New­comb-type prob­lems like CDT and TDT), and wanted to learn how ex­ist­ing de­ci­sion the­o­ries were for­mal­ized so I could do some­thing similar. James Joyce’s book was the most promi­nent such book available at the time.

ETA: Sorry, I think the above is prob­a­bly not en­tirely clear or helpful. It’s a bit hard for me to put my­self in your po­si­tion and try to figure out what may or may not be worth­while for you. The fact is that Joyce’s book is the de­ci­sion the­ory book I read, and quite pos­si­bly it in­fluenced me more than I re­al­ize, or is more use­ful for un­der­stand­ing the mo­ti­va­tion for or the for­mu­la­tion of UDT than I think. It couldn’t hurt to grab a copy of it and read a few chap­ters to see how use­ful it is to you.

• Thanks for the edit/​up­date. For refer­ence it may be worth­while to make such ad­di­tions as a new com­ment, ei­ther as a re­ply to your­self or the par­ent. It was only by chance that I spot­ted the new part!

• I was think­ing of meet­ing alien AIs, post-Sin­gu­lar­ity.

What pre-sin­gu­lar­ity ac­tions are you wor­ried about them tak­ing?

Huh? I thought we were sup­posed to be the good guys here? ;-)

What I was think­ing was that a CDT-seeded AI might ac­tu­ally be safer pre­cisely be­cause it won’t try to change pre-Sin­gu­lar­ity events, and if it’s first the new de­ci­sion the­ory will be in place in time for any post-Sin­gu­lar­ity events.

Be­sides, CDT is not well defined enough that you can im­ple­ment it even if you wanted to.

That’s sur­pris­ing to me—what should I read in or­der to un­der­stand this point bet­ter?

• But I’m not sure it de­serves one; would CDT re­ally be a prob­a­ble out­put any­where be­sides a ver­bal the­ory ad­vo­cated by hu­man philoso­phers in our own Everett branch? Maybe, now that I think about it, but even so, does it mat­ter?

Yes, for rea­sons of game the­ory and of prac­ti­cal sin­gu­lar­ity strat­egy.

Game the­ory, be­cause things in Everett branches that are ‘clos­est’ to us might be the ones it’s most im­por­tant to be able to in­ter­act with, since they’re eas­ier to simu­late and their prefer­ences are more likely to have in­ter­est­ing over­lap with ours. Know­ing very roughly what to ex­pect from our neigh­bors is use­ful.

And singularity strategy, because if you can show that architectures like AIXI-tl have some non-negligible chance of converging to whatever an FAI would have converged to, as far as actual policies go, then that is a very important thing to know; especially if a non-uFAI existential risk starts to look imminent (but the game theory in that case is crazy). It is not probable, but there’s a hell of a lot of structural uncertainty, and Omohundro’s AI drives are still pretty informal. I am still not absolutely sure I know how a self-modifying superintelligence would interpret or reflect on its utility function or terms therein (or how it would reflect on its implicit policy for interpreting or reflecting on utility functions or terms therein). The apparent rigidity of Goedel machines might constitute a disproof in theory (though I’m not sure about that), but when some of the terms are sequences of letters like “makeHumansHappy” or formally manipulable correlated markers of human happiness, then I don’t know how the syntax gets turned into semantics (or fails entirely to get turned into semantics, as the case may well be).

But it will calcu­late that ex­pected value us­ing CDT!ex­pec­ta­tion, mean­ing that it won’t see how self-mod­ify­ing to be a time­less de­ci­sion the­o­rist could pos­si­bly af­fect what’s already in the box, etcetera.

This im­plies that the ac­tu­ally-im­ple­mented-CDT agent has a sin­gle level of ab­strac­tion/​gran­u­lar­ity at like the naive re­al­ist phys­i­cal level at which it’s prov­ing things about causal re­la­tion­ships. Like, it can’t/​shouldn’t prove causal re­la­tion­ships at the level of string the­ory, and yet it’s still con­fi­dent that its ac­tions are caus­ing things de­spite that struc­tural un­cer­tainty, and yet de­spite the sym­me­try it for some rea­son can­not pos­si­bly see how switch­ing a few tran­sis­tors or chang­ing its de­ci­sion policy might af­fect things via re­la­tion­ships that are ul­ti­mately causal but cur­rently un­known for rea­sons of bound­ed­ness and not spec­u­la­tive meta­physics. It’s plau­si­ble, but I think let­ting a uni­ver­sal hy­poth­e­sis space or maybe even just Goedelian limi­ta­tions en­ter the de­ci­sion calcu­lus at any point is go­ing to make such rigidity un­likely. (This is re­lated to how a non-hy­per­com­pu­ta­tion-driven de­ci­sion the­ory in gen­eral might rea­son about the pos­si­bil­ity of hy­per­com­pu­ta­tion, or the risk of self-di­ag­o­nal­iza­tion, I think.)

• But it will calcu­late that ex­pected value us­ing CDT!ex­pec­ta­tion, mean­ing that it won’t see how self-mod­ify­ing to be a time­less de­ci­sion the­o­rist could pos­si­bly af­fect what’s already in the box, etcetera.

The CDT is mak­ing a de­ci­sion about whether to self-mod­ify even be­fore it meets the alien, based on its ex­pec­ta­tion of meet­ing the alien. How does CDT!ex­pec­ta­tion differ from Eliezer!ex­pec­ta­tion be­fore we meet the alien?

• Doesn’t have a name as far as I know. But I’m not sure it de­serves one; would CDT re­ally be a prob­a­ble out­put any­where be­sides a ver­bal the­ory ad­vo­cated by hu­man philoso­phers in our own Everett branch? Maybe, now that I think about it, but even so, does it mat­ter?

It is use­ful to sep­a­rate in one’s mind the differ­ence be­tween on one hand be­ing able to One Box and co­op­er­ate in PD with agents that you know well (shared source code) and on the other hand not firing on Baby Eaters af­ter they have already cho­sen not to fire on you. This is es­pe­cially the case when first grap­pling the sub­ject. (Could you con­firm, by the way, that Akon’s de­ci­sion in that par­tic­u­lar para­graph or two is ap­prox­i­mately what TDT would sug­gest?)

The above is par­tic­u­larly rele­vant be­cause the “have ac­cess to each other’s source code” is such a use­ful in­tu­ition pump when grap­pling with or ex­plain­ing the solu­tions to many of the rele­vant de­ci­sion prob­lems. It is use­ful to be able to draw a line on just how far the source code metaphor can take you.

There is also some­thing dis­taste­ful about mak­ing com­par­i­sons to a de­ci­sion the­ory that isn’t even im­plic­itly sta­ble un­der self mod­ifi­ca­tion. A CDT agent will change to CDT++ un­less there is an ad­di­tional flaw in the agent be­yond the poor de­ci­sion mak­ing strat­egy. If I cre­ate a CDT agent, give it time to think and then give it New­comb’s prob­lem it will One Box (and also no longer be a CDT agent). It is the er­rors in the agent that still re­main af­ter that time that need TDT or UDT to fix.

But it will calcu­late that ex­pected value us­ing CDT!ex­pec­ta­tion, mean­ing that it won’t see how self-mod­ify­ing to be a time­less de­ci­sion the­o­rist could pos­si­bly af­fect what’s already in the box, etcetera.

*nod* This is just the ‘new rules start­ing now’ op­tion. What the CDT agent does when it wakes up in an empty, bor­ing room and does some in­tro­spec­tion.

• Surely the im­por­tant thing is that it will self-mod­ify to what­ever de­ci­sion the­ory has the best con­se­quences?

The new al­gorithm will not ex­actly be TDT, be­cause it won’t try to change de­ci­sions that have already been made the way TDT does. In par­tic­u­lar this means that there’s no risk from Roko’s basilisk.

Dis­claimer: I’m not very con­fi­dent of any­thing I say about de­ci­sion the­ory.

• Eliezer says el­se­where that cur­rent de­ci­sion the­ory doesn’t let us prove a self-mod­ify­ing AI would choose to keep the goals we pro­gram into it. He wants to de­velop a proof be­fore even start­ing work on the AI.

• It’s easy to con­trive situ­a­tions where a self-mod­ify­ing AI would choose not to keep the goals pro­grammed into it, even with­out pre­com­mit­ment is­sues. Just con­trive the cir­cum­stances so it gets paid to change. Un­less there’s some­thing wrong with the ar­gu­ment there, TDT etc. won’t be enough to en­sure that the goals are kept.

• What would Newcomb’s problem look like in the physical world, taking quantum physics into account? Specifically, would Omega need to know quantum physics in order to predict my decision on “to one box or not to one box”?

To sim­plify the pic­ture, imag­ine that Omega has a vari­able with it that can be ei­ther in the state A+B or B and which is ex­pected to cor­re­late with my de­ci­sion and there­fore serves to “pre­dict” me. Omega runs some phys­i­cal pro­cess to ar­rive at the con­tents of this vari­able. I’m as­sum­ing that “to pre­dict” means “to simu­late”—i.e. Omega can pre­dict me by run­ning a simu­la­tion of me (say us­ing a uni­ver­sal quan­tum Tur­ing ma­chine) though that is not nec­es­sar­ily the only way to do so. Given that we’re in a quan­tum world, would Omega ac­tu­ally need to simu­late me in or­der to en­sure a cor­re­la­tion be­tween its vari­able and my choice, po­ten­tially in an­other galaxy, of whether to pick A+B or B?

Say |Oab> and |Ob> are the two eigen­states of Omega’s vari­able (w.r.t. some op­er­a­tor it has) and the box sys­tem in front of me similarly has two eigen­states |Cab> and |Cb> (“C” for “choice”) and my “ac­tion” is sim­ply a choice of mea­sur­ing the box sys­tem in the state |Cab> or in the state |Cb> and not a mix­ture of them.

If Omega sets up an EPR-like en­tan­gle­ment be­tween its vari­able and the box sys­tem of the form m|Oab>|Cab> + n|Ob>|Cb>, and then chooses to mea­sure a mixed state of its vari­able, say, |Oab>+|Ob>, it can bifur­cate the uni­verse. Then, if I mea­sure |Cab> (i.e. choose A+B), I end up in the same uni­verse as the one in which Omega mea­sured its vari­able to be |Oab> and if I choose |Cb>, I end up in the same uni­verse as the one in which Omega mea­sured its vari­able to be |Ob>. There­fore, if our two sys­tems are en­tan­gled this way, Omega wouldn’t need to take any trou­ble to simu­late me at all in or­der to en­sure its rep­u­ta­tion of be­ing a perfect pre­dic­tor!

That is only as far as Omega’s reputation for being a perfect predictor is concerned. But hold on for a moment there. In this setup, the box system’s state is not disconnected from that of Omega’s predictor variable even if Omega has left the galaxy, and yet Omega cannot causally influence its “contents”. In my thinking, this is an argument against the stance of the “causal decision theorists” that whatever the contents of the box, it is “fixed” and therefore I maximize my utility by picking A+B. This is now an argument for the one boxers: observing that Omega has shown a solid history of being right (i.e. Omega’s internal variable has always correlated with the choices of all the people before), they can form the simplest (?) explanation that Omega could be using quantum entanglement (edit: EPR-like entanglement) to effect the correlation, and therefore choose to one box so that they end up in the universe with a million bucks instead of the one with a thousand.

So, my fi­nal ques­tion to peo­ple here is this—does knowl­edge of quan­tum physics re­solve New­comb’s prob­lem in favour of the one box­ers? If not, the ar­gu­ments cer­tainly would be in­ter­est­ing to read :)

edit: To clar­ify the ar­gu­ment against the causal de­ci­sion the­o­rists, “B is ei­ther empty or has a mil­lion bucks” is not true. It could be in a su­per­po­si­tion of the two that is en­tan­gled with Omega’s vari­able. There­fore the stan­dard causal ar­gu­ment for pick­ing A+B doesn’t hold any more.

• The origi­nal de­scrip­tion of the prob­lem doesn’t men­tion if you know of Omega’s strat­egy for de­cid­ing what to place in box B, or their suc­cess his­tory in pre­dict­ing this out­come—which is ob­vi­ously a very im­por­tant fac­tor.

If you know these things, then the only ra­tio­nal choice, ob­vi­ously and by a huge mar­gin, is to pick only box B.

If you don’t know any­thing other than box B may or may not con­tain a mil­lion dol­lars, and you have no rea­sons to be­lieve that it’s un­likely, like in the lot­tery, then the only ra­tio­nal de­ci­sion is to take both. This also seems to be com­pletely ob­vi­ous and un­am­bigu­ous.

But since this com­mu­nity has spent a while de­bat­ing this, I con­clude that there’s a good chance I have missed some­thing im­por­tant. What is it?

• It looks like you just restated the “para­dox”—us­ing one ar­gu­ment, it is “ob­vi­ous” to pick B and us­ing an­other ar­gu­ment, it is “ob­vi­ous” to pick both.

Also, in gen­eral, do try to avoid say­ing some­thing is “ob­vi­ous”. It usu­ally throws a lot of com­plex­ity and po­ten­tial faults into a black box and wors­ens your chances of un­cov­er­ing those faults by in­timi­dat­ing peo­ple.

• Upon read­ing this, I im­me­di­ately went,

“Well, Gen­eral Rel­a­tivity in­cludes solu­tions that have closed timelike curves, and I cer­tainly am not in any po­si­tion to rule out the pos­si­bil­ity of com­mu­ni­ca­tion by such. So I have no ac­tual rea­son to rule out the pos­si­bil­ity that which strat­egy I choose will, af­ter I make my de­ci­sion, be com­mu­ni­cated to Omega in my past and then the boxes filled ac­cord­ingly. So I bet­ter one-box in or­der to choose the closed timelike loop where Omega fills the box.”

I un­der­stand, look­ing at Wikipe­dia, that in Noz­ick’s for­mu­la­tion he sim­ply de­clared that the box won’t be filled based on the ac­tual de­ci­sion. Fine. How would he go about prov­ing that to some­one ac­tu­ally faced with the sce­nario? Ra­tional peo­ple do not risk a mil­lion dol­lars based on an un­prov­able state­ment by a philoso­pher. Same with claims that, for ex­am­ple, Omega didn’t set up the boxes so that two-box­ing ac­tu­ally re­sults in the an­nihila­tion of the con­tents of box B. Or that Omega doesn’t tele­port the money in B some­how af­ter the de­cider makes the de­ci­sion to one-box. Those dec­la­ra­tions may have a truth value of 1 for pur­poses of a per­son out­side ob­serv­ing the sce­nario, but un­less em­piri­cally testable within the sce­nario, can­not be val­ued as ap­prox­i­mat­ing 1 by the per­son mak­ing the de­ci­sion.

Every “given” that the de­ci­sion-maker can’t ver­ify is a “given” that is not us­able for mak­ing the de­ci­sion. The whole ar­gu­ment for two-box­ing de­pends on a bound­ary vi­o­la­tion; that the knowl­edge known by the reader but which can­not be known to the char­ac­ter in the sce­nario can some­how be used by the char­ac­ter in the sce­nario to make a de­ci­sion.

• “the dom­i­nant con­sen­sus in mod­ern de­ci­sion the­ory is that one should two-box...there’s a com­mon at­ti­tude that ver­bal ar­gu­ments for one-box­ing are easy to come by, what’s hard is de­vel­op­ing a good de­ci­sion the­ory that one-boxes”

This may be more a statement about the relevance and utility of decision theory itself as a field (or lack thereof) than about the difficulty of the problem, but it is at least philosophically intriguing.

From a phys­i­cal and com­pu­ta­tional per­spec­tive, there is no para­dox, and one need not in­voke back­wards causal­ity, ‘pre-com­mit­ment”, or cre­ate a new ‘de­ci­sion the­ory’.

The chain of phys­i­cal causal­ity just has a branch:

M0 -> O(D) -> B

M0 -> M1 -> M2 -> ... -> MN -> D

and O(D) = D

Where M0, M1, M2 .. . MN are the agent’s mind states, D is the agent’s de­ci­sion, O is Omega’s pre­dic­tion of the de­ci­sion, and B is the con­tent of box B.

Your decision does not physically cause the contents of box B to change. Your decision itself, however, is caused by your past state of mind, and this prior state is also the cause of the box’s current contents (via the power of Omega’s predictor). So your decision and the box’s contents are causally linked, entangled if you will.

From your per­spec­tive, the box’s con­tents are un­known. Your fi­nal de­ci­sion is also un­known to you, un­de­cided, un­til the mo­ment you make that de­ci­sion by open­ing the box. Mak­ing the de­ci­sion it­self re­veals this in­for­ma­tion about your mind his­tory to you, along with the con­tents of the box.

One way of think­ing about it is that this prob­lem is an illus­tra­tion of the dic­tum that any mind or com­pu­ta­tional sys­tem can never fully pre­dict it­self from within.

Note that in the con­text of ac­tual AI in com­puter sci­ence, this type of re­flec­tive search (con­sid­er­ing a po­ten­tial de­ci­sion, then agent B’s con­se­quent de­ci­sion, your next de­ci­sion, and so on, ex­plor­ing a de­ci­sion tree) is pretty ba­sic stuff. In this case the Omega agent es­sen­tially has an in­finite branch­ing depth, but the de­ci­sion at each point is pretty sim­ple—be­cause Omega always gets the ‘last move’.

You may start as a ‘one boxer’, thinking that after the scan, you can now outwit Omega by ‘self-modifying’ into a ‘two-boxer’ (which really can be just as simple as changing your internal register), but Omega already predicted this move… and your next reactive move of flipping back to a ‘one-boxer’… and the next, on and on to infinity… until you finally run out of time and the register is sampled. You can continue chaining M’s to infinity, but you can’t change the fact that MN -> D and O(D) = D.
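As a toy rendering of the chain M0 -> O(D) -> B, here is the "Omega gets the last move" point in code, assuming (purely for illustration) that Omega predicts by running the same decision procedure that later produces D:

```python
# Toy model: Omega's prediction O(D) is computed by running the agent's
# decision procedure, so the box contents and the decision share a
# common cause, and O(D) = D by construction.

def omega_fill_boxes(decision_procedure):
    predicted = decision_procedure()          # O(D): simulate the agent
    box_b = 1_000_000 if predicted == "one_box" else 0
    return {"A": 1_000, "B": box_b}

def payoff(decision_procedure):
    boxes = omega_fill_boxes(decision_procedure)  # contents fixed first
    choice = decision_procedure()                 # D: the actual decision
    if choice == "one_box":
        return boxes["B"]
    return boxes["A"] + boxes["B"]

print(payoff(lambda: "one_box"))  # 1000000
print(payoff(lambda: "two_box"))  # 1000
```

Whatever the procedure finally outputs is what Omega already sampled, which is the sense in which no amount of register-flipping helps.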

Part of the con­fu­sion ex­pe­rienced by the causal de­ci­sion camp may stem from the sub­jec­tivity of the solu­tion.

The optimal decision for some abstract algorithm, divorced from Omega’s predictive brainscan, will of course choose to two-box, simply because its decision is not causally linked to the box’s contents.

But your N-box reg­ister is linked to the box’s con­tents, so you should set it to 1.

• I wanted to consider some truly silly solution. Since taking only box A is out (and I can’t find a good reason for choosing box A, other than a vague argument based in irrationality, along the lines that I’d rather not know if omniscience exists…), I came up with this instead. I won’t apologize for all the math-economics, but it might get dense.

Omega has been cor­rect 100 times be­fore, right? Fully in­tend­ing to take both boxes, I’ll go to each of the 100 other peo­ple. There’re 4 cat­e­gories of peo­ple. Let’s as­sume they aren’t bound by psy­chol­ogy and they’re risk-neu­tral, but they are bound by their be­liefs.

1. Two-boxers who defend their decision do so on grounds of “no backwards causality” (uh, what’s the smart-people term for that?). They don’t believe in Omega’s omniscience. There’re Q1 of these.

2. Two-box­ers who re­gret their de­ci­sion also con­cede to Omega’s near-perfect om­ni­science. There’re Q2 of these.

3. One-box­ers who’re happy also con­cede to Omega’s near-perfect om­ni­science. There’re Q3 of these.

4. One-box­ers who re­gret fore­go­ing \$1000. They don’t be­lieve in Omega’s om­ni­science. There’re Q4 of these.

I’ll offer groups 2 and 3 (believers that I’ll only get \$1000) a bet: I split my 1000 between them, in proportion to their bets, if they’re right. If they believe in Omega’s perfect predictive powers, they think there’s a 0% chance of me winning. Therefore, it’s a good bet for them. Expected profit = 1000/weight - 0*(all their money) > 0

Groups 1 and 4 are trickier. They think Omega has a probability P of being wrong about me. I’ll ask them to bet X = 1001000*P/((1-P)*weight) - eps, where weight is a positive number > 1 that’s a function of how many people donated how much. Explicitly defining weight(Q1, Q4, various money caps) is a medium-difficulty exercise for a beginning calculus student. If you insist, I’ll model it, but it will take me more time than I’d already spent on this. So, for a person in one of these groups, expected profit = -X*(1-P) + 1001000*P/weight = eps*(1-P) > 0!
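A quick numeric check of this bet, with illustrative values for P, weight, and eps; the algebra gives the bettor an expected profit of eps*(1-P), which is positive whenever eps is.

```python
# Numeric check of the bet offered to groups 1 and 4 above.
# P is their estimate that Omega is wrong about me; they stake X and win
# the 1001000/weight share with probability P. Values are illustrative.

P, weight, eps = 0.001, 2.0, 0.5

X = 1001000 * P / ((1 - P) * weight) - eps        # their stake

# Their expected profit: lose X with probability 1-P, win the
# 1001000/weight share with probability P.
expected_profit = -X * (1 - P) + 1001000 * P / weight
print(round(expected_profit, 6))  # 0.4995, i.e. eps * (1 - P)
```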

So what do I have now? (Should I pray to Bayes that my in­tu­ition be con­firmed?) There’re two pos­si­ble out­comes of tak­ing both boxes.

1. Both are full. I give the 1001000 to groups 1 and 4, and collect Q2*1000 + Q3*1000000 from groups 2 and 3, which is at least 1001000 if Q3 > 0 AND Q2 > 0, and more if Q2 > 1 or Q3 > 1. This outcome has potential for tremendous profit. Call this number PIE >> 1001000.

2. Only A is full. I split my 1000 between groups 2 and 3, and collect X1*Q1 + X4*Q4 from groups 1 and 4. What are X1 and X4 again? X, the amount of money groups 1 and 4 bet, is unique for each group. I called group 1’s X X1, and group 4’s X4.

I need to find the conditions when X1*Q1 + X4*Q4 > 1000. So suppose I undermaximized my profit, and completely ignored the poor group 1 (their 1000 won’t make much difference either way). Then X = X4 becomes much simpler, X = 1001000*P/((1-P)*Q4) - eps, and the payoff I get is -Q4*eps + 1001000*P/(1-P). P = 0.001 and Q4*eps < \$2 guarantee X1*Q1 + X4*Q4 > X4*Q4 > 1000.

That’s all well and good, but if P is low (under 0.5), I’m getting less than 1001000. What can I do? Hedge again! I would actually go to people of groups 1 and 4 again, except it’s getting too confusing, so let’s introduce a “bank” that has the same mentality as the people of groups 1 and 4 (that there’s a chance P that Omega will be wrong about me). Remember PIE? The bank estimates my chances of getting PIE at P. Let’s say if I don’t get PIE, I get 1000 (which is the lowest possible profit for outcome 2; otherwise it’s not worth making that bet). I ask the following sum from the bank: PIE*P + 1000*(1-P) - eps. The bank makes a profit of eps > 0. Since PIE is a large number, my profit at the end is approximately PIE*P + 1000*(1-P) > 1001000.

Note that I’d been try­ing to find the LOWER bound on this gam­bit. Ac­tu­ally plug­ging in num­bers for P and Q’s eas­ily yielded prof­its in the 5 mil to 50 mil range.
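Here's a quick numerical sanity check of the simplified case above. The concrete numbers (P, Q4, eps) are made-up illustrative values, and `bettor_stake` is just a name I've invented for the X formula:

```python
# Numerical check of the simplified hedge (group 1 ignored, outcome 2).
# P, Q4 and eps are illustrative values, not part of the original scenario.

def bettor_stake(P, Q4, eps):
    # The stake asked of each group-4 bettor: X = 1001000*P/((1-P)*Q4) - eps
    return 1001000 * P / ((1 - P) * Q4) - eps

P = 0.001    # bettors' probability that Omega mispredicted me
Q4 = 10      # number of group-4 bettors
eps = 0.1    # small sweetener that makes the bet attractive to them

X = bettor_stake(P, Q4, eps)

# My payoff in outcome 2 (only box A full): I collect every stake.
my_payoff = Q4 * X    # = 1001000*P/(1-P) - Q4*eps, about $1001
print(my_payoff > 1000)   # True: more than the $1000 I split away

# Each bettor's expectation: lose X with probability 1-P, win a 1/Q4
# share of the 1001000 with probability P; works out to eps*(1-P) > 0.
bettor_ev = -X * (1 - P) + 1001000 * P / Q4
print(bettor_ev > 0)      # True: they accept the bet
```

So with these numbers both sides expect to come out ahead, which is what makes the hedge work.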

• You’re es­sen­tially en­gag­ing in ar­bi­trage, tak­ing ad­van­tage of the differ­ence in the prob­a­bil­ities as­signed to both boxes be­ing full by differ­ent peo­ple. Which is one rea­son ra­tio­nal peo­ple never as­sign 0 prob­a­bil­ity to any­thing.

You could just as well go to some one-box­ers (who “be­lieve P(both full) = 0”) and offer them a \$1 bet 10000000:1 in your fa­vor that both boxes will be full; then offer the two-box­ers what­ever bet they will take “that only one box is full” that will give you more than \$1 profit if you win. Thus, ei­ther way, you make a profit, and you can make how­ever much you like just by in­creas­ing the stakes.

This still doesn't actually solve Newcomb's problem, though. I'd call it more of a cautionary tale against being absolutely certain.

(In­ci­den­tally, since you’re go­ing into this “fully in­tend­ing” to take both boxes, I’d ex­pect both one box­ers and two box­ers to agree on the ex­tremely low prob­a­bil­ity Omega is go­ing to have filled both boxes.)
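The two-sided bet is easy to make concrete. A sketch with made-up stakes (the 10000000:1 odds are from the comment above; everything else is illustrative):

```python
# Arbitrage against a "certain" one-boxer and a two-boxer.
# All stakes besides the 10000000:1 odds are made up for illustration.

one_boxer_odds = 10_000_000  # one-boxer stakes $10M against my $1 that both boxes are full
my_stake_1 = 1               # my side of the first bet

two_boxer_stake = 3          # two-boxer's stake "that only one box is full"
my_stake_2 = 1               # my side of the second bet

# If both boxes turn out full: win bet 1, lose bet 2.
profit_if_both_full = my_stake_1 * one_boxer_odds - my_stake_2
# If only one box is full: lose bet 1, win bet 2.
profit_if_one_full = two_boxer_stake - my_stake_1

print(profit_if_both_full)  # 9999999
print(profit_if_one_full)   # 2
```

Either outcome nets a positive profit, and the profit scales with whatever stakes the "certain" bettors will accept.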

• Yes, nshep­perd, my as­sump­tion is that P << 0.5, some­thing in the 0.0001 to 0.01 range.

Besides, arbitrage would still be possible if some people estimated P=0.01 and others P=0.0001; the solution would just be messier than anything I'd want to do casually. And if I were unconstrained in the bets I could make (I'd tried to work with a cap before), making profits would be even easier.

I wasn’t ex­actly try­ing to solve the prob­lem, only to find a “naively ra­tio­nal” workaround (us­ing the same naive ra­tio­nal­ity that leads pris­on­ers to rat each other out in PD).

When you’re say­ing that this doesn’t solve New­comb’s prob­lem, what do you ex­pect the solu­tion to ac­tu­ally en­tail?

• Yes, ar­bi­trage is pos­si­ble pretty much when­ever peo­ple’s prob­a­bil­ities dis­agree to any sig­nifi­cant de­gree. Set­ting P = 0 just lets you take it to ab­surd lev­els (eg. put up no stake at all, and it’s still a “fair bet”).

When you’re say­ing that this doesn’t solve New­comb’s prob­lem, what do you ex­pect the solu­tion to ac­tu­ally en­tail?

Max­i­miz­ing the money found upon open­ing the box(es) you have se­lected.

If you like, re­place the money with cures for can­cer with differ­ing prob­a­bil­ities of work­ing, or ma­chines with differ­ing prob­a­bil­ities of be­ing a halt­ing or­a­cle, or some­thing else you can’t get by ex­ploit­ing other hu­mans.

• Which is one rea­son ra­tio­nal peo­ple never as­sign 0 prob­a­bil­ity to any­thing.

I don’t know, I feel pretty con­fi­dent as­sign­ing P(A&!A)=0 :P

• Do you as­sign 0 prob­a­bil­ity to the hy­poth­e­sis that there ex­ists some­thing which you be­lieve to be math­e­mat­i­cally true which is not?

• No, P(I’m wrong about some­thing math­e­mat­i­cal) is 1-ep­silon. P(I’m wrong about this math­e­mat­i­cal thing) is of­ten low- like 2%, and some­times ac­tu­ally 0, like when dis­cussing the in­ter­sec­tion of a set and its com­ple­ment. It’s defined to be the empty set- there’s no way that it can fail to be the empty set. I may not have com­plete con­fi­dence in the rest of set the­ory, and I may not ex­pect that the com­ple­ment of a set (or the set it­self) is always well-defined, but when I limit my­self to prob­a­bil­ity mea­sures over rea­son­able spaces then I’m con­tent.

• So, for some par­tic­u­lar as­pects of math, you have cer­tainty 1-ep­silon, where ep­silon is ex­actly zero?

What you are re­ally do­ing is mak­ing the claim “Given that what I know about math­e­mat­ics is cor­rect, then the in­ter­sec­tion of a set and its com­ple­ment is the empty set.”

• I was in­ter­pret­ing “some­thing” as “at least one thing.” Al­most surely my un­der­stand­ing of math­e­mat­ics as a whole is in­cor­rect some­where, but there are a hand­ful of math­e­mat­i­cal state­ments that I be­lieve with com­plete meta­phys­i­cal cer­ti­tude.

What you are re­ally do­ing is mak­ing the claim “Given that what I know about math­e­mat­ics is cor­rect, then the in­ter­sec­tion of a set and its com­ple­ment is the empty set.”

“Cor­rect” is an un­clear word, here. Sup­pose I start off with a hand­ful of ax­ioms. What is the prob­a­bil­ity that one of the ax­ioms is true /​ cor­rect? In the con­text of that sys­tem, 1, since it’s the start­ing point. Now, the ax­ioms might not be use­ful or rele­vant to re­al­ity, and the ax­ioms may con­flict and thus the sys­tem isn’t in­ter­nally con­sis­tent (i.e. state­ments hav­ing prob­a­bil­ity 0 and 1 si­mul­ta­neously). And so the ge­ome­ter who is only 1-ep­silon sure that Eu­clid’s ax­ioms de­scribe the real world will be able to up­date grace­fully when pre­sented with ev­i­dence that real space is curved, even though they re­tain the same con­fi­dence in their Eu­clidean proofs (as they ap­ply to ab­stract con­cepts).

Ba­si­cally, I only agree with this post when it comes to state­ments about which un­cer­tainty is rea­son­able. If you re­quire 1-ep­silon cer­tainty for any­thing, even P(A|A), then you break the math of prob­a­bil­ity.

• The map is not the ter­ri­tory. “A&!A” would mean some fact about the world be­ing both true and false, rather than any­one’s be­liefs about that fact.

• As­sign­ing zero or nonzero prob­a­bil­ity to that as­ser­tion is hav­ing a be­lief about it.

• Yes, the prob­a­bil­ity is a be­lief, but your pre­vi­ous ques­tion was about some­thing more like P(!A&P(A)=1), that is to say, an ab­solute be­lief be­ing in­con­sis­tent with the facts. Vaniver’s as­ser­tion was about the facts them­selves be­ing in­con­sis­tent with the facts, which would have a rather alarm­ing lack of im­pli­ca­tions.

• “Pretty con­fi­dent” is about as close to “ac­tu­ally 0″ as the moon is (which I don’t care to quan­tify :P).

• “Pretty con­fi­dent” is about as close to “ac­tu­ally 0″ as the moon is (which I don’t care to quan­tify :P).

“Pretty con­fi­dent” was also a rhetor­i­cal un­der­state­ment. :P

• One-box­ers who re­gret fore­go­ing \$1000. They don’t be­lieve in Omega’s om­ni­science. There’re Q4 of these.

How is there any­body in this group? Con­sid­er­ing that all of them have \$1,000,000, what con­vinced them to one-box in the first place such that they later changed their minds about it and re­gret­ted the de­ci­sion? (Like, I guess a one-boxer could say af­ter­wards “I bet that guy wasn’t re­ally om­ni­scient, I should have taken the other box too, then I’d have got­ten \$1,001,000 in­stead”, but why wouldn’t a per­son who thinks that way two-box to be­gin with?)

• True.

I only took that case into ac­count for com­plete­ness, to cover my bases against the crit­i­cism that “not all one-box­ers would be happy with their de­ci­sions.”

Naively, when you have a choice be­tween 1000000.01 and 1000000.02, it’s very easy to ar­gue that the lat­ter is the bet­ter op­tion. To ar­gue for the former, you would prob­a­bly cite the in­signifi­cance of that cent next to the rest of 1000000.01: that eps doesn’t mat­ter, or that an ex­tra penny in your pocket is in­con­ve­nient, or that you already have 1000000.01, so why do you need an­other 0.01?

• “Ver­bal ar­gu­ments for one-box­ing are easy to come by, what’s hard is de­vel­op­ing a good de­ci­sion the­ory that one-boxes”

First, the problem needs a couple ambiguities resolved, so we'll use three assumptions:

A) You are making this decision based on a deterministic, rational philosophy (no randomization, external factors, etc. can be used to make your decision on the box)

B) Omega is in fact infallible

C) Getting more money is the goal (i.e. we are excluding decision-makers which would prefer to get less money, and other such absurdities)

Chang­ing any of these re­sults in a differ­ent game (ei­ther one that de­pends on how Omega han­dles ran­dom strate­gies, or one which de­pends on how of­ten Omega is wrong—and we lack in­for­ma­tion on ei­ther)

Se­cond, I’m go­ing to re­frame the prob­lem a bit: Omega comes to you and has you write a de­ci­sion-mak­ing func­tion. He will eval­u­ate the func­tion, and pop­u­late Box B ac­cord­ing to his con­clu­sions on what the func­tion will re­sult in. The func­tion can be self-mod­ify­ing, but must com­plete in finite time. You are bound to the de­ci­sion made by the ac­tual ex­e­cu­tion of this func­tion.

I can’t think of any ar­gu­ment as to why this re­fram­ing would pro­duce differ­ent re­sults, given both As­sump­tions A and B as true. I feel this is a valid re­fram­ing be­cause, if we as­sume Omega is in fact in­fal­lible, I don’t see this as be­ing any differ­ent from him eval­u­at­ing the “ac­tual” de­ci­sion mak­ing func­tion that you would use in the situ­a­tion. Cer­tainly, you’re mak­ing a de­ci­sion that can be ex­pressed log­i­cally, and pre­sum­ably you have the abil­ity to think about the prob­lem and mod­ify your de­ci­sion based on that con­tem­pla­tion (i.e. you have a de­ci­sion-mak­ing func­tion, and it can be self-mod­ify­ing). If your de­ci­sion func­tion is some­how im­pos­si­ble to ren­der math­e­mat­i­cally, then I’d ar­gue that As­sump­tion A has been vi­o­lated and we are, once again, play­ing a differ­ent game. If your de­ci­sion func­tion doesn’t halt in finite time, then your pay­off is guaran­teed to be \$0, since you will never ac­tu­ally take ei­ther box >.>

Given this situ­a­tion, the AI sim­ply needs to do two things: Iden­tify that the prob­lem is New­com­bian and then iden­tify some func­tion X that pro­duces the max­i­mum ex­pected pay­off.

Iden­ti­fy­ing the prob­lem as New­com­bian should be triv­ial, since “aware­ness that this is a New­com­bian prob­lem” is a re­quire­ment of it be­ing a New­com­bian prob­lem (if Omega didn’t tell you what was in the boxes, it would be a differ­ent game, neh?)

Identifying the function X is well beyond my programming ability, but I will assert definitively that there is no function that produces a higher expected payoff than f(Always One-Box). If I am proven wrong, I dare say the person writing that proof will probably be able to cash in for a rather significant payoff :)

Keep in mind that the de­ci­sion func­tion can self-mod­ify, but Omega can also pre­dict this. The func­tion “com­mit to One-Box un­til Omega leaves, then switch to Two-Box be­cause it’ll pro­duce a higher gain now that Omega has made his pre­dic­tion” would, ob­vi­ously, have Omega con­clude you’ll be Two-Box­ing and leave you with \$0.
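To make the reframing concrete, here's a toy sketch (the names are mine, nothing canonical): under Assumption A the submitted decision function is pure and deterministic, so an infallible Omega can "predict" it simply by evaluating it before populating box B.

```python
# Toy model of the reframed game. Assumption A restricts players to pure,
# deterministic decision functions, so Omega can predict one by running it.

def omega_game(decide):
    """decide() -> 'one' or 'two'; returns the player's payoff."""
    prediction = decide()                 # Omega's infallible prediction
    box_b = 1_000_000 if prediction == 'one' else 0
    choice = decide()                     # the binding, actual execution
    return box_b + (1_000 if choice == 'two' else 0)

one_boxer = lambda: 'one'
two_boxer = lambda: 'two'

print(omega_game(one_boxer))   # 1000000
print(omega_game(two_boxer))   # 1000
```

Because the prediction and the binding run come from the same pure function, they always agree, which is why the "switch after Omega leaves" function buys nothing: any switching mechanism would need randomness or hidden external state, which Assumption A excludes.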

I hon­estly can­not find any­thing about this that would be overly difficult to pro­gram, as­sum­ing you already had an AI that could han­dle game the­ory prob­lems (I’m as­sum­ing said AI is very, very difficult, and is cer­tainly be­yond my abil­ity).

Given this reframing, f(Always One-Box) seems like a fairly trivial solution, and neither paradoxical nor terribly difficult to represent mathematically… I'm going to assume I'm missing something, since this doesn't seem to be the consensus conclusion at all, but since neither I nor my friend can figure out any faults, I'll go ahead and make this my first post on LessWrong and hope that it's not buried in obscurity due to this being a 2 year old thread :)

• Rather than trans­form­ing the prob­lem in the way you did, trans­form it so that you move first—Omega doesn’t put money in the boxes un­til you say which one(s) you want.

Given this re­fram­ing, f(Always One-Box) seems like a fairly triv­ial solu­tion, and nei­ther para­dox­i­cal nor ter­ribly difficult to rep­re­sent math­e­mat­i­cally...

As a de­ci­sion prob­lem, New­comb’s prob­lem is rather pointless, IMHO. As a thought ex­per­i­ment helping us to un­der­stand the as­sump­tions that are im­plicit in game the­ory, it could be rather use­ful. The thought ex­per­i­ment shows us that when a prob­lem state­ment speci­fies a par­tic­u­lar or­der of moves, what is re­ally be­ing speci­fied is a state of knowl­edge at de­ci­sion time. When a prob­lem speci­fies that Omega moves first that is im­plic­itly in con­tra­dic­tion to the claim that he knows what you will do when you move sec­ond. The im­plicit mes­sage is that Omega doesn’t know—the ex­plicit mes­sage is that he does. If the ex­plicit mes­sage is to be be­lieved, then change the move or­der to make the im­plicit mes­sage match the ex­plicit one.

How­ever, here, many peo­ple seem to pre­fer to pre­tend that New­comb prob­lems con­sti­tute a de­ci­sion the­ory prob­lem which re­quires clever solu­tion, rather than a bit of de­liber­ate con­fu­sion con­structed by vi­o­lat­ing the im­plicit rules of the prob­lem genre.

• There is a good chance I am miss­ing some­thing here, but from an eco­nomic per­spec­tive this seems triv­ial:

P(Om) is the prob­a­bil­ity the per­son as­signs Omega of be­ing able to ac­cu­rately pre­dict their de­ci­sion ahead of time.

A. P(Om) × \$1m is the expected return from opening one box.

B. (1 − P(Om)) × \$1m + \$1000 is the expected return of opening both boxes (the probability that Omega was wrong, times the million, plus the thousand.)

Since P(Om) is de­pen­dent on peo­ple’s in­di­vi­d­ual be­lief about Omega’s abil­ity to pre­dict their ac­tions it is not sur­pris­ing differ­ent peo­ple make differ­ent de­ci­sions and think they are be­ing ra­tio­nal—they are!

If A > B they choose one box, if B > A they choose both boxes.

This also shows why peo­ple will change their views if the amount in the visi­ble box is changed (to \$990,000 or \$10).

Ba­si­cally, in this in­stance, if you think the prob­a­bil­ity of Omega be­ing able to de­ter­mine your fu­ture ac­tion is greater than 0.5005 then you se­lect a sin­gle box, if less than that you se­lect both boxes. At P(Om)=0.5005 the ex­pected re­turn of both strate­gies is \$500,500.
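The comparison can be written out directly; this is just the arithmetic above, not a new model:

```python
# Expected returns as a function of P(Om), the subjective probability
# that Omega can accurately predict your decision.

def ev_one_box(p_om):
    return p_om * 1_000_000                    # A

def ev_two_box(p_om):
    return (1 - p_om) * 1_000_000 + 1_000      # B

# Indifference point: p*1e6 = (1-p)*1e6 + 1e3  =>  p = 1001000/2000000
threshold = 1_001_000 / 2_000_000
print(threshold)   # 0.5005

# At the threshold both strategies have the same EV ($500,500);
# above it one-boxing wins, below it two-boxing wins.
```

So two agents with different P(Om) can both be maximizing expected value while choosing differently.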

EDIT. I think I over­sim­plified B, but the point still stands. nhamann—I didn’t see your post be­fore writ­ing mine. I think the only differ­ence be­tween them is that I state that it is a per­sonal view of the prob­a­bil­ity of Omega be­ing able to pre­dict choices and you seem to want to use the ac­tual prob­a­bil­ity that he can.

• I’m not read­ing 127 com­ments, but as a new­comer who’s been in­vited to read this page, along with barely a dozen oth­ers, as an in­tro­duc­tion, I don’t want to leave this unan­swered, even though what I have to say has prob­a­bly already been said.

First of all, the an­swer to New­comb’s Prob­lem de­pends a lot on pre­cisely what the prob­lem is. I have seen ver­sions that posit time travel, and there­fore back­wards causal­ity. In that case, it’s quite rea­son­able to take only one box, be­cause your de­ci­sion to do so does have a causal effect on the amount in Box B. Pre­sum­ably causal de­ci­sion the­o­rists would agree.

How­ever, in any ver­sion of the prob­lem where there is no clear ev­i­dence of vi­o­la­tions of cur­rently known physics and where the money has been placed by Omega be­fore my de­ci­sions, I am a two-boxer. Yet I think that your post above must not be talk­ing about the same prob­lem that I am think­ing of, es­pe­cially at the end. Although you never said so, it seems to me that you must be talk­ing about a prob­lem which says “If you choose Box B, then it will have a mil­lion dol­lars; if you choose both boxes, then Box B will be empty.”. But that is sim­ply not what the facts will be if Omega has made the de­ci­sion in the past and cur­rently un­der­stood physics ap­plies. In the prob­lem as stated, Omega may make mis­takes in the fu­ture, and that makes all the differ­ence.

It’s pre­sump­tu­ous of me to as­sume that you’re talk­ing about a differ­ent prob­lem from the one that you stated, I know. But as I read the psy­cholog­i­cal states that you sug­gest that I might have —that I might wish that I con­sid­ered one-box­ing ra­tio­nal, for ex­am­ple—, they seem ut­terly in­sane. Why would I wish such a thing? What does it have to do with any­thing? The only thing that I can wish for is that Omega has pre­dicted that I will be a one-boxer, which has noth­ing to do with what I con­sider ra­tio­nal now.

The quo­ta­tion from Joyce ex­plains it well, up un­til the end, where poor phras­ing may have con­fused you. The last sen­tence should read:

When Rachel wishes she was Irene’s type she is wish­ing for Irene’s cir­cum­stances, not wish­ing to make Irene’s choice.

It is sim­ply not true that Rachel en­vies Irene’s choice. Rachel en­vies Irene’s situ­a­tion, the situ­a­tion where there is a mil­lion dol­lars in Box B. And if Rachel were in that situ­a­tion, then she would still take both boxes! (At least if I un­der­stand Joyce cor­rectly.)

Pos­si­bly one thing that dis­t­in­guishes me from one-box­ers, and maybe even most two-box­ers, is that I un­der­stand fun­da­men­tal physics rather thor­oughly and my prior has a very strong pre­sump­tion against back­wards causal­ity. The mere fact that Omega has made suc­cess­ful pre­dic­tions about New­comb’s Para­dox will never be enough to over­rule that. Even be­ing su­per­in­tel­li­gent and com­ing from an­other galaxy is not enough, al­though things change if Omega (known to be su­per­in­tel­li­gent and hon­est) claims to be a time-trav­el­ler. Per­haps for some one-box­ers, and even for some ir­ra­tional two-box­ers, Omega’s past suc­cess at pre­dic­tion is good ev­i­dence for back­wards causal­ity, but not for me.

So sup­pose that some­body puts two boxes down be­fore me, pre­sents con­vinc­ing ev­i­dence for the situ­a­tion as you stated it above (but no more), and goes away. Then I will sim­ply take all of the money that this per­son has given me: both boxes. Be­fore I open them, I will hope that they pre­dicted that I will choose only one. After I open them, if I find Box B empty, then I will wish that they had pre­dicted that I would choose only one. But I will not wish that I had cho­sen only one. And I cer­tainly will not hope, be­fore­hand, that I will choose only one and yet nev­er­the­less choose two; that would in­deed be ir­ra­tional!

• You are dis­posed to take two boxes. Omega can tell. (Per­haps by read­ing your com­ment. Heck, I can tell by read­ing your com­ment, and I’m not even a su­per­in­tel­li­gence.) Omega will there­fore not put a mil­lion dol­lars in Box B if it sets you a New­comb’s prob­lem, be­cause its de­ci­sion to do so de­pends on whether you are dis­posed to take both boxes or not, and you are.

I am dis­posed to take one box. Omega can tell. (Per­haps by read­ing this com­ment. I bet you can tell by read­ing my com­ment, and I also bet that you’re not a su­per­in­tel­li­gence.) Omega will there­fore put a mil­lion dol­lars in Box B if it sets me a New­comb’s prob­lem, be­cause its de­ci­sion to do so de­pends on whether I am dis­posed to take both boxes or not, and I’m not.

If we both get pairs of boxes to choose from, I will get a mil­lion dol­lars. You will get a thou­sand dol­lars. I will be mon­e­tar­ily bet­ter off than you.

But wait! You can fix this. All you have to do is be dis­posed to take just Box B. You can do this right now; there’s no rea­son to wait un­til Omega turns up. Omega does not care why you are so dis­posed, only that you are so dis­posed. You can mut­ter to your­self all you like about how silly the prob­lem is; as long as you wan­der off with just B un­der your arm, it will tend to be the case that you end the day a mil­lion­aire.

• Some­time ago I figured out a re­fu­ta­tion of this kind of rea­son­ing in Coun­ter­fac­tual Mug­ging, and it seems to ap­ply in New­comb’s Prob­lem too. It goes as fol­lows:

Imag­ine an­other god, Up­silon, that offers you a similar two-box setup—ex­cept to get the \$2M in the box B, you must be a one-boxer with re­gard to Up­silon and a two-boxer with re­gard to Omega. (Up­silon pre­dicts your coun­ter­fac­tual be­hav­ior if you’d met Omega in­stead.) Now you must choose your dis­po­si­tions wisely be­cause you can’t win money from both gods. The right dis­po­si­tion de­pends on your pri­ors for en­coun­ter­ing Omega or Up­silon, which is a “bead jar guess” be­cause both gods are very im­prob­a­ble. In other words, to win in such prob­lems, you can’t just look at each prob­lem in­di­vi­d­u­ally as it arises—you need to have the cor­rect prior/​pre­dis­po­si­tion over all pos­si­ble pre­dic­tors of your ac­tions, be­fore you ac­tu­ally meet any of them. Ob­tain­ing such a prior is difficult, so I don’t re­ally know what I’m pre­dis­posed to do in New­comb’s Prob­lem if I’m faced with it some­day.

• Omega lets me de­cide to take only one box af­ter meet­ing Omega, when I have already up­dated on the fact that Omega ex­ists, and so I have much bet­ter knowl­edge about which sort of god I’m likely to en­counter. Up­silon treats me on the ba­sis of a guess I would sub­junc­tively make with­out knowl­edge of Up­silon. It is there­fore not sur­pris­ing that I tend to do much bet­ter with Omega than with Up­silon, be­cause the rele­vant choices be­ing made by me are be­ing made with much bet­ter knowl­edge. To put it an­other way, when Omega offers me a New­comb’s Prob­lem, I will con­di­tion my choice on the known ex­is­tence of Omega, and all the Up­silon-like gods will tend to can­cel out into Pas­cal’s Wagers. If I run into an Up­silon-like god, then, I am not overly wor­ried about my poor perfor­mance—it’s like run­ning into the Chris­tian God, you’re screwed, but so what, you won’t ac­tu­ally run into one. Even the best ra­tio­nal agents can­not perform well on this sort of sub­junc­tive hy­poth­e­sis with­out much bet­ter knowl­edge while mak­ing the rele­vant choices than you are offer­ing them. For ev­ery ra­tio­nal agent who performs well with re­spect to Up­silon there is one who performs poorly with re­spect to anti-Up­silon.

On the other hand, beat­ing New­comb’s Prob­lem is easy, once you let go of the idea that to be “ra­tio­nal” means perform­ing a strange rit­ual cog­ni­tion in which you must only choose on the ba­sis of phys­i­cal con­se­quences and not on the ba­sis of cor­rect pre­dic­tions that other agents re­li­ably make about you, so that (if you choose us­ing this bizarre rit­ual) you go around re­gret­ting how ter­ribly “ra­tio­nal” you are be­cause of the cor­rect pre­dic­tions that oth­ers make about you. I sim­ply choose on the ba­sis of the cor­rect pre­dic­tions that oth­ers make about me, and so I do not re­gret be­ing ra­tio­nal.

And these ques­tions are highly rele­vant and re­al­is­tic, un­like Up­silon; in the fu­ture we can ex­pect there to be lots of ra­tio­nal agents that make good pre­dic­tions about each other.

• Omega lets me de­cide to take only one box af­ter meet­ing Omega, when I have already up­dated on the fact that Omega ex­ists, and so I have much bet­ter knowl­edge about which sort of god I’m likely to en­counter.

In what sense can you update? Updating is about following a plan, not about deciding on a plan. You already know that it's possible to observe anything; you don't learn anything new about the environment by observing any given thing. There could be a deep connection between updating and logical uncertainty that makes updating a good plan, but it's not obvious what it is.

• Huh? Up­dat­ing is just about up­dat­ing your map. (?) The next sen­tence I didn’t un­der­stand the rea­son­ing of, could you ex­pand?

• In­tu­itively, the no­tion of up­dat­ing a map of fixed re­al­ity makes sense, but in the con­text of de­ci­sion-mak­ing, for­mal­iza­tion in full gen­er­al­ity proves elu­sive, even un­nec­es­sary, so far.

By mak­ing a choice, you con­trol the truth value of cer­tain state­ments—state­ments about your de­ci­sion-mak­ing al­gorithm and about math­e­mat­i­cal ob­jects de­pend­ing on your al­gorithm. Only some of these math­e­mat­i­cal ob­jects are part of the “real world”. Ob­ser­va­tions af­fect what choices you make (“up­dat­ing is about fol­low­ing a plan”), but you must have de­cided be­fore­hand what con­se­quences you want to es­tab­lish (“[up­dat­ing is] not about de­cid­ing on a plan”). You could have de­cided be­fore­hand to care only about math­e­mat­i­cal struc­tures that are “real”, but what char­ac­ter­izes those struc­tures apart from the fact that you care about them?

• Pas­cal’s Wagers, huh. So your de­ci­sion the­ory re­quires a spe­cific prior?

• This is not a re­fu­ta­tion, be­cause what you de­scribe is not about the thought ex­per­i­ment. In the thought ex­per­i­ment, there are no Up­silons, and so noth­ing to worry about. It is if you face this sce­nario in real life, where you can’t be given guaran­tees about the ab­sence of Up­silons, that your rea­son­ing be­comes valid. But it doesn’t re­fute the rea­son­ing about the thought ex­per­i­ment where it’s pos­tu­lated that there are no Up­silons.

• Thanks for drop­ping the links here. FWIW, I agree with your ob­jec­tion. But at the very least, the peo­ple claiming they’re “one-box­ers” should also make the dis­tinc­tion you make.

Also, user Nisan tried to ar­gue that var­i­ous Up­silons and other fauna must bal­ance them­selves out if we use the uni­ver­sal prior. We even­tu­ally took this ar­gu­ment to email, but failed to move each other’s po­si­tions.

• Just didn’t want you con­fus­ing peo­ple or mis­rep­re­sent­ing my opinion, so made ev­ery­thing clear. :-)

• OK. I as­sume the usual (Omega and Up­silon are both re­li­able and sincere, I can re­li­ably dis­t­in­guish one from the other, etc.)

Then I can’t see how the game doesn’t re­duce to stan­dard New­comb, mod­ulo a sim­ple prob­a­bil­ity calcu­la­tion, mostly based on “when I en­counter one of them, what’s my prob­a­bil­ity of meet­ing the other dur­ing my life­time?” (plus var­i­ous “ac­tu­ar­ial” calcu­la­tions).

If I have no in­for­ma­tion about the prob­a­bil­ity of en­coun­ter­ing ei­ther, then my de­ci­sion may be in­cor­rect—but there’s noth­ing para­dox­i­cal or sur­pris­ing about this, it’s just a nor­mal, “bor­ing” ex­am­ple of an in­com­plete in­for­ma­tion prob­lem.

you need to have the cor­rect prior/​pre­dis­po­si­tion over all pos­si­ble pre­dic­tors of your ac­tions, be­fore you ac­tu­ally meet any of them.

I can't see why that is—again, assuming that the full problem is explained to you on encountering either Upsilon or Omega, both are truthful, etc. Why can I not perform the appropriate calculations and make an expectation-maximising decision even after Upsilon or Omega has left? Surely Omega or Upsilon can predict that I'm going to do just that and act accordingly, right?

• Yes, this is a stan­dard in­com­plete in­for­ma­tion prob­lem. Yes, you can do the calcu­la­tions at any con­ve­nient time, not nec­es­sar­ily be­fore meet­ing Omega. (Th­ese calcu­la­tions can’t use the in­for­ma­tion that Omega ex­ists, though.) No, it isn’t quite as sim­ple as you state: when you meet Omega, you have to calcu­late the coun­ter­fac­tual prob­a­bil­ity of you hav­ing met Up­silon in­stead, and so on.

• I’m pretty sure the logic is cor­rect. I do make silly math mis­takes some­times, but I’ve tested this one on Vladimir Nesov and he agrees. No com­ment from Eliezer yet (this sce­nario was first posted to de­ci­sion-the­ory-work­shop).

• It re­minds me vaguely of Pas­cal’s Wager, but my cached re­sponses there­unto are not trans­lat­ing in­for­ma­tively.

• Then I think the origi­nal New­comb’s Prob­lem should re­mind you of Pas­cal’s Wager just as much, and my sce­nario should be analo­gous to the re­fu­ta­tion thereof. (There­unto? :-)

• But wait! You can fix this. All you have to do is be dis­posed to take just Box B.

No, that’s not what I should do. What I should do is make Omega think that I am dis­posed to take just Box B. If I can suc­cess­fully make Omega think that I’ll take only Box B but still take both boxes, then I should. But since Omega is su­per­in­tel­li­gent, let’s take it as un­der­stood that the only way to make Omega think that I’ll take only Box B is to make it so that I’ll ac­tu­ally take Box B. Then that is what I should do.

But I have to do it now! (I don’t do it now only be­cause I don’t be­lieve that this situ­a­tion will ever hap­pen.) Once Omega has placed the boxes and left, if the known laws of physics ap­ply, then it’s too late!

If you take only Box B and get a mil­lion dol­lars, wouldn’t you re­gret hav­ing not also taken Box A? Not only would you have got­ten a thou­sand dol­lars more, you’d also have shown up that know-it-all su­per­in­tel­li­gent in­ter­galac­tic trav­el­ler too! That’s a chance that I’ll never have, since Omega will read my com­ment here and leave my Box B empty, but you might have that chance, and if so then I hope you’ll take it.

• It’s not re­ally too late then. Omega can pre­dict what you’ll do be­tween see­ing the boxes, and choos­ing which to take. If this is go­ing to in­clude a de­ci­sion to take one box, then Omega will put a mil­lion dol­lars in that box.

I will not re­gret tak­ing only one box. It strikes me as in­con­sis­tent to re­gret act­ing as the per­son I most wish to be, and it seems clear that the per­son I most wish to be will take only one box; there is no room for ap­proved re­gret.

• It’s not re­ally too late then.

If you say this, then you be­lieve in back­wards causal­ity (or a break­down of the very no­tion of causal­ity, as in Kevin’s com­ment be­low). I agree that if causal­ity doesn’t work, then I should take only Box B, but noth­ing in the prob­lem as I un­der­stand it from the origi­nal post im­plies any vi­o­la­tion of the known laws of physics.

If known physics ap­plies, then Omega can pre­dict all it likes, but my ac­tions af­ter it has placed the boxes can­not af­fect that pre­dic­tion. There is always the chance that it pre­dicts that I will take both boxes but I take only Box B. There is even the chance that it will pre­dict that I will take only Box B but I take both boxes. Noth­ing in the prob­lem state­ment rules that out. It would be differ­ent if that were ac­tu­ally im­pos­si­ble for some rea­son.

I will not re­gret tak­ing only one box.

I knew that you wouldn’t, of course, since you’re a one-boxer. And we two-box­ers will not re­gret tak­ing both boxes, even if we find Box B empty. Bet­ter \$1000 than noth­ing, we will think!

• If known physics ap­plies, then Omega can pre­dict all it likes, but my ac­tions af­ter it has placed the boxes can­not af­fect that pre­dic­tion. There is always the chance that it pre­dicts that I will take both boxes but I take only Box B. There is even the chance that it will pre­dict that I will take only Box B but I take both boxes. Noth­ing in the prob­lem state­ment rules that out. It would be differ­ent if that were ac­tu­ally im­pos­si­ble for some rea­son.

Ah, I see what the problem is. You have a confused notion of free will and of what it means to make a choice.

Making a choice between two options doesn't mean there is a real chance that you might take either option (there is always at least an infinitesimal chance, but that is always true, even for things that are not usefully described as a choice). It just means that the reason for your taking whatever option you take is most usefully attributed to you (and not e.g. gravity, the government, the person holding a gun to your head, etc.). In the end, though, it is (unless the choice is so close that random noise makes the difference) a fact about you that you will make the choice you will make. And it is in principle possible for others to discover this fact about you.

If it is a fact about you that you will one-box it is not pos­si­ble that you will two-box. If it is a fact about you that you will two-box it is not pos­si­ble that you will one-box. If it is a fact about you that you will leave the choice up to chance then Omega prob­a­bly doesn’t offer you to take part in the first place.

Now, when deciding what choice to make, it is usually most useful to pretend there is a real possibility of taking either option, since that generally causes facts about you that are more beneficial to you. And that you do that is just another fact about you, and influences the fact about which choice you make. Usually the fact of which choice you will make has no consequences before you make your choice, and so, when counterfactually considering the consequences of either choice, you can model the rest of the world as being the same in either case up to that point. But the fact of which choice you will make is just another fact like any other, and it is allowed, even if it usually doesn’t, to have consequences before that point in time. If it does, it is best, for the very same reason you pretend that either choice is a real possibility in the first place, to also model the rest of the world as different contingent on your choice. That doesn’t mean backwards causality. Modeling the world in this way is just another fact about you that generates good outcomes.

• Ali­corn:

It’s not re­ally too late then. Omega can pre­dict what you’ll do be­tween see­ing the boxes, and choos­ing which to take. If this is go­ing to in­clude a de­ci­sion to take one box, then Omega will put a mil­lion dol­lars in that box.

TobyBar­tels:

If you say this, then you be­lieve in back­wards causal­ity (or a break­down of the very no­tion of causal­ity, as in Kevin’s com­ment be­low). I agree that if causal­ity doesn’t work, then I should take only Box B, but noth­ing in the prob­lem as I un­der­stand it from the origi­nal post im­plies any vi­o­la­tion of the known laws of physics.

I re­mem­ber read­ing an ar­ti­cle about some­one who sincerely lacked re­spect for peo­ple who were ‘soft’ (not ex­act quote) on the death penalty … be­fore end­ing up on the jury of a death penalty case, and ul­ti­mately sup­port­ing life in prison in­stead. It is not in­con­ceiv­able that a suffi­ciently canny an­a­lyst (e.g. Omega) could de­duce that the pro­cess of be­ing picked would mo­ti­vate you to re­con­sider your stance. (Or, per­haps more likely, mo­ti­vate a pro­fessed one-boxer like me to re­con­sider mine.)

• If you say this, then you be­lieve in back­wards causal­ity (or a break­down of the very no­tion of causal­ity, as in Kevin’s com­ment be­low). I agree that if causal­ity doesn’t work, then I should take only Box B, but noth­ing in the prob­lem as I un­der­stand it from the origi­nal post im­plies any vi­o­la­tion of the known laws of physics.

Be­ware hid­den in­fer­ences. Ta­boo causal­ity.

• I don’t see what that link has to do with any­thing in my com­ment thread. (I haven’t read most of the other threads in re­ply to this post.)

I should ex­plain what I mean by ‘causal­ity’. I do not mean some meta­phys­i­cal ne­ces­sity, whereby ev­ery event (called an ‘effect’) is de­ter­mined (or at least in­fluenced in some asym­met­ric way) by other events (called its ‘causes’), which must be (or at least so far seem to be) prior to the effect in time, lead­ing to in­finite regress (ap­par­ently back to the Big Bang, which is some­how an ex­cep­tion). I do not mean any­thing that Aris­to­tle knew enough physics to un­der­stand in any but the vaguest way.

I mean the flow of macro­scopic en­tropy in a phys­i­cal sys­tem.

The best refer­ence that I know on the ar­row of time is Huw Price’s 1996 book Time’s Ar­row and Archimedes’ Point. But ac­tu­ally I didn’t un­der­stand how en­tropy flow leads to a phys­i­cal con­cept of causal­ity un­til sev­eral years af­ter I read that, so that might not ac­tu­ally help, and I’m hav­ing no luck find­ing the In­ter­net con­ver­sa­tion that made it click for me.

But ba­si­cally, I’m say­ing that, if known physics ap­plies, then P(there is money in Box B|all in­for­ma­tion available on a macro­scopic level when Omega placed the boxes) = P(there is money in Box B|all in­for­ma­tion … placed the boxes & I pick both boxes), even though P(I pick both boxes|all in­for­ma­tion … placed the boxes) < 1, be­cause macro­scopic en­tropy strictly in­creases be­tween the plac­ing of the boxes and the time that I fi­nally pick a box.

So I need to be given ev­i­dence that known physics does not ap­ply be­fore I pick only Box B, and a suc­cess­ful record of pre­dic­tions by Omega will not do that for me.

• The Psy­chopath But­ton: Paul is de­bat­ing whether to press the ‘kill all psy­chopaths’ but­ton. It would, he thinks, be much bet­ter to live in a world with no psy­chopaths. Un­for­tu­nately, Paul is quite con­fi­dent that only a psy­chopath would press such a but­ton. Paul very strongly prefers liv­ing in a world with psy­chopaths to dy­ing. Should Paul press the but­ton? (Set aside your the­o­ret­i­cal com­mit­ments and put your­self in Paul’s situ­a­tion. Would you press the but­ton? Would you take your­self to be ir­ra­tional for not do­ing so?)

New­comb’s Fire­bomb: There are two boxes be­fore you. Box A definitely con­tains \$1,000,000. Box B definitely con­tains \$1,000. You have two choices: take only box A (call this one-box­ing), or take both boxes (call this two-box­ing). You will sig­nal your choice by press­ing one of two but­tons. There is, as usual, an un­can­nily re­li­able pre­dic­tor on the scene. If the pre­dic­tor has pre­dicted that you will two-box, he has planted an in­cen­di­ary bomb in box A, wired to the two-box but­ton, so that press­ing the two-box but­ton will cause the bomb to deto­nate, burn­ing up the \$1,000,000. If the pre­dic­tor has pre­dicted that you will one-box, no bomb has been planted – noth­ing un­to­ward will hap­pen, whichever but­ton you press. The pre­dic­tor, again, is un­can­nily ac­cu­rate.

I would suggest looking at your implicit choice of counterfactuals and their role in your decision theory. Standard causal decision theory involves local violations of the laws of physics (you assign probabilities to the world being such that you’ll one-box, or such that you’ll two-box, and then ask what miracle magically altering your decision, without any connection to your psychological dispositions, etc., would deliver the highest utility). Standard causal decision theory is a normative principle for action, which says to do the action that would deliver the most utility if a certain kind of miracle happened. But you can get different versions of causal decision theory by substituting different sorts of miracles, e.g. you can say: “if I one-box, then I have a psychology that one-boxes, and likewise for two-boxing,” so you select the action such that a miracle giving you the disposition to do so earlier on would have been better. Yet another sort of counterfactual that can be hooked up to the causal decision theory framework would go: “there’s some mathematical fact about what decision (decisions, given Everett) my brain structure leads to in standard physics, and the predictor has access to this mathematical info, so I’ll select the action that would be best brought about by a miracle changing that mathematical fact.”

• Thanks for the replies, ev­ery­body!

This is a global re­sponse to sev­eral replies within my lit­tle thread here, so I’ve put it at nearly the top level. Hope­fully that works out OK.

I’m glad that FAWS brought up the prob­a­bil­is­tic ver­sion. That’s be­cause the greater the prob­a­bil­ity that Omega makes mis­takes, the more in­clined I am to take two boxes. I once read the claim that 70% of peo­ple, when told New­comb’s Para­dox in an ex­per­i­ment, claim to choose to take only one box. If this is ac­cu­rate, then Omega can achieve a 70% level of ac­cu­racy by pre­dict­ing that ev­ery­body is a one-boxer. Even if 70% is not ac­cu­rate, you can still make the para­dox work by ad­just­ing the dol­lar amounts, as long as the bias is great enough that Omega can be con­fi­dent that it will show up at all in the records of its past pre­dic­tions. (To be fair, the pro­por­tion of two-box­ers will prob­a­bly rise as Omega’s ac­cu­racy falls, and chang­ing the stakes should also af­fect peo­ple’s choices; there may not be a fixed point, al­though I ex­pect that there is.)

If, in addition to the problem as stated (but with only 70% probability of success), I know that Omega always predicts one-boxing, then (hopefully) everybody agrees that I should take both boxes. There needs to be some correlation between Omega’s predictions and the actual outcomes, not just a high proportion of past successes.
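This “always predict one-boxing” possibility can be made concrete with a toy calculation (a hypothetical sketch; the payoff function is illustrative, and the assumption that Box B is always full is the scam scenario, not the original problem statement):

```python
# Toy model: Omega "predicts" one-boxing for everyone, so Box B is always
# full, and a 70% accuracy record comes entirely from the base rate of
# one-boxers. Under this model the record carries no information about
# any correlation with your own choice, and two-boxing strictly wins.
def payoff(choice: str, box_b_full: bool) -> int:
    base = 1_000 if choice == "two" else 0
    return base + (1_000_000 if box_b_full else 0)

# Box B is always full under the always-predict-one-boxing strategy:
assert payoff("two", True) == 1_001_000   # two-boxers get the extra $1000
assert payoff("one", True) == 1_000_000
```

Nothing here contradicts the perfect-record version of the problem; it only shows why a merely-70% record is weak evidence.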

FAWS also writes:

You your­self claim to know what you would do in the box­ing experiment

Ac­tu­ally, I don’t re­ally want to make that claim. Although I’ve writ­ten things like ‘I would take both boxes’, I re­ally should have writ­ten ‘I should take both boxes’. I’m stat­ing a cor­rect de­ci­sion, not mak­ing a pre­dic­tion about my ac­tual ac­tions. Right now, I pre­dict about a 70% chance of two-box­ing given the situ­a­tion as stated in the origi­nal post, al­though I’ve never tried to calcu­late my es­ti­mates of prob­a­bil­ities, so who knows what that re­ally means. (H’m, 70% again? Nope, I don’t trust that cal­ibra­tion at all!)

FAWS writes el­se­where:

Making a choice between two options […] just means that the reason for your taking whatever option you take is most usefully attributed to you (and not e.g. gravity, government, the person holding a gun to your head etc.).

I don’t see what the gun has to do with it; this is a perfectly good prob­lem in de­ci­sion the­ory:

• Sup­pose that you have a but­ton that, if pressed, will trig­ger a bomb that kills two strangers on the other side of the world. I hold a gun to your head and threaten to shoot you if you don’t press the but­ton. Should you press it?

A per­son who presses the but­ton in that situ­a­tion can rea­son­ably say af­ter­wards ‘I had no choice! Toby held a gun to my head!’, but that doesn’t in­val­i­date the ques­tion. Such a per­son might even panic and make the ques­tion ir­rele­vant, but it’s still a good ques­tion.

If it is a fact about you that you will leave the choice up to chance, then Omega probably doesn’t offer you the game in the first place.

So that’s how Omega gets such a good record! (^_^)

Un­der­stand­ing the ques­tion re­ally is im­por­tant. I’ve been in­ter­pret­ing it some­thing along these lines: you in­ter­rupt your nor­mal thought pro­cesses to go through a com­plete eval­u­a­tion of the situ­a­tion be­fore you, then see what you do. (This is ex­actly what you can­not do if you panic in the gun prob­lem above.) So per­haps we can pre­dict with cer­tain ac­cu­racy that an ut­ter bi­got will take one course of ac­tion, but that is not what the bi­got should do, nor is it what they will do if they dis­card their prej­u­dices and de­cide afresh.

Now that I think about it, I see some prob­lems with this in­ter­pre­ta­tion, and also some re­fine­ments that might fix it. (The first thing to do is to make it less de­pen­dent on the spe­cific per­son mak­ing the de­ci­sion.) But I’ll skip the re­fine­ments. It’s enough to no­tice that Omega might very well pre­dict that a per­son will not take the time to think things through, so there is poor cor­re­la­tion be­tween what one should do and what Omega will pre­dict, even though the de­ci­sion is based on what the world would be like if one did take the time.

I still think that (mod­ulo re­fine­ments) this is a good in­ter­pre­ta­tion of what most peo­ple would mean if they tell a story and then ask ‘What should this per­son do?’. (I can try to defend that claim if any­body still wants me to af­ter they finish this com­ment.) In that case, I stand by my de­ci­sion that one should take both boxes, at least if there is no good ev­i­dence of new physics.

How­ever, I now re­al­ise that there is an­other in­ter­pre­ta­tion, which is more prac­ti­cal, how­ever much the or­di­nary per­son might not in­ter­pret things this way. That is: sit down and think through the whole situ­a­tion now, long be­fore you are ever faced with it in real life, and de­cide what to do. One ob­vi­ous benefit of this is that when I hold a gun to your head, you won’t panic, be­cause you will be pre­pared. More gen­er­ally, this is what we are all ac­tu­ally do­ing right now! So as we make these idle philo­soph­i­cal mus­ings, let’s be prac­ti­cal, and de­cide what we’ll do if Omega ever offers us this deal.

In this case, I agree that I will be bet­ter off (given the ex­tremely un­likely but pos­si­ble as­sump­tion that I am ever in this situ­a­tion) if I have de­cided now to take only Box B. As RobinZ points out, I might change my mind later, but that can’t be helped (and to a cer­tain ex­tent shouldn’t be helped, since it’s best if I take two boxes af­ter Omega pre­dicts that I’ll only take one, but we can’t judge that ex­tent if Omega is smarter than us, so re­ally there’s no benefit to hold­ing back at all).

If Omega is fal­lible, then the value of one-box­ing falls dras­ti­cally, and even ad­just­ing the amount of money doesn’t help in the end; once Omega’s pro­por­tion of past suc­cess matches the ob­served pro­por­tion in ex­per­i­ments (or what­ever our best guess of the ac­tual pro­por­tion of real peo­ple is), then I’m back to two-box­ing, since I ex­pect that Omega sim­ply always pre­dicts one-box­ing.

In hindsight, it’s obvious that the original post was about decision in this sense, since Eliezer was talking about an AI that modifies its decision procedures in anticipation of facing Omega in the future. Similarly, we humans modify our decision procedures by making commitments and letting ourselves invent rationalisations for them afterwards (although the problem with this is that it makes it hard to change our minds when we receive new information). So obviously Eliezer wants us to decide now (or at least well ahead of time) and use our leet Methods of Rationality to keep the rationalisations in check.

So I hereby de­cide that I will pick only one box. (You hear that, Omega!?) Since I am hon­est (and strongly doubt that Omega ex­ists), I’ll add that I may very well change my mind if this ever re­ally hap­pens, but that’s about what I would do, not what I should do. And in a cer­tain sense, I should change my mind … then. But in an­other sense, I should (and do!) choose to be a one-boxer now.

(Thanks also to Car­lShul­man, whom I haven’t quoted, but whose com­ment was a big help in draw­ing my at­ten­tion to the differ­ent senses of ‘should’, even though I didn’t re­ally adopt his anal­y­sis of them.)

• If Omega is fal­lible, then the value of one-box­ing falls dras­ti­cally, and even ad­just­ing the amount of money doesn’t help in the end;

As­sume Omega has a prob­a­bil­ity X of cor­rectly pre­dict­ing your de­ci­sion:

If you choose to two-box:

• X chance of get­ting \$1000

• (1-X) chance of get­ting \$1,001,000

If you choose to take box B only:

• X chance of get­ting \$1,000,000

• (1-X) chance of get­ting \$0

Your ex­pected util­ities for two-box­ing and one-box­ing are (re­spec­tively):

E2 = 1000X + (1-X)1001000
E1 = 1000000X

For E2 > E1, we must have 1000X + 1,001,000 − 1,001,000X − 1,000,000X > 0, or 1,001,000 > 2,000,000X, or

X < 0.5005

So as long as Omega can main­tain a greater than 50% ac­cu­racy, you should ex­pect to earn more money by one-box­ing. Since the solu­tion seems so sim­ple, and since I’m a to­tal novice at de­ci­sion the­ory, it’s pos­si­ble I’m miss­ing some­thing here, so please let me know.
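The break-even point above can be checked numerically (a minimal sketch, assuming the same symmetric accuracy X for one-boxers and two-boxers, as in the comment):

```python
# Expected dollar payoffs when Omega is correct with probability x,
# equally for one-boxers and two-boxers (the symmetric assumption above).
def e_two_box(x: float) -> float:
    return 1_000 * x + (1 - x) * 1_001_000

def e_one_box(x: float) -> float:
    return 1_000_000 * x

break_even = 1_001_000 / 2_000_000  # = 0.5005, from setting E2 = E1
assert abs(e_two_box(break_even) - e_one_box(break_even)) < 1e-6
assert e_one_box(0.6) > e_two_box(0.6)  # above break-even, one-box
assert e_two_box(0.4) > e_one_box(0.4)  # below break-even, two-box
```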

• Wait—we can’t as­sume that the prob­a­bil­ity of be­ing cor­rect is the same for two-box­ing and one-box­ing. Sup­pose Omega has a prob­a­bil­ity X of pre­dict­ing one when you choose one and Y of pre­dict­ing one when you choose two.

```
E1 = E(\$1 000 000) * X
E2 = E(\$1 000) + E(\$1 000 000) * Y
```

The spe­cial case you list cor­re­sponds to Y = 1 - X, but in the gen­eral case, we can de­rive that E1 > E2 implies

```
X > Y + E(\$1 000) / E(\$1 000 000)
```

If we as­sume lin­ear util­ity in wealth, this cor­re­sponds to a differ­ence of 0.001. If, al­ter­nately, we choose a me­dian net wealth of \$93 100 (the U.S. figure) and use log-wealth as the mea­sure of util­ity, the re­quired differ­ence in­creases to 0.004 or so. Either way, un­less you’re dead broke (e.g. net wealth \$1), you had bet­ter be ex­tremely con­fi­dent that you can fool the in­ter­roga­tor be­fore you two-box.
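Both figures can be reproduced with a short calculation (a sketch; it evaluates the required edge at Y = 0, and the \$93,100 median-wealth figure is the one quoted above):

```python
import math

WEALTH = 93_100  # the median U.S. net wealth figure quoted above

def required_edge(u) -> float:
    """Minimum X - Y (evaluated at Y = 0) for one-boxing to beat two-boxing,
    where X = P(predict one | you one-box), Y = P(predict one | you two-box)."""
    return (u(WEALTH + 1_000) - u(WEALTH)) / (u(WEALTH + 1_000_000) - u(WEALTH))

print(round(required_edge(lambda w: w), 4))  # 0.001  (linear utility)
print(round(required_edge(math.log), 4))     # 0.0043 (log utility, "0.004 or so")
```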

• So as long as Omega can main­tain a greater than 50% ac­cu­racy, you should ex­pect to earn more money by one-box­ing. Since the solu­tion seems so sim­ple, and since I’m a to­tal novice at de­ci­sion the­ory, it’s pos­si­ble I’m miss­ing some­thing here, so please let me know.

Your calculation is fine. What you’re missing is that Omega has a record of 70% accuracy because Omega always predicts that a person will one-box and 70% of people are one-boxers. So Omega always puts the million dollars in Box B, and I will always get \$1,001,000 if I’m one of the 30% of people who two-box.

At least, that is a pos­si­bil­ity, which your calcu­la­tion doesn’t take into ac­count. I need ev­i­dence of a cor­re­la­tion be­tween Omega’s pre­dic­tions and the par­ti­ci­pants’ ac­tual be­havi­our, not just ev­i­dence of cor­rect pre­dic­tions. My prior prob­a­bil­ity dis­tri­bu­tion for how of­ten peo­ple one-box isn’t even con­cen­trated very tightly around 70% (which is just a num­ber that I re­mem­ber read­ing once as the re­sult of one sur­vey), so any­thing short of a long run of pre­dic­tions with very high pro­por­tion of cor­rect ones will make me sus­pect that Omega is pul­ling a trick like this.

So the prob­lem is much cleaner as Eliezer states it, with a perfect record. (But if even that record is short, I won’t buy it.)

• Oops, I see that RobinZ already replied, and with calcu­la­tions. This shows that I should still re­move the word ‘dras­ti­cally’ from the bit that nhamann quoted.

• You un­der­es­ti­mate the mean­ing of su­per­in­tel­li­gence. One way of defin­ing a su­per­in­tel­li­gence that wins at New­comb with­out vi­o­lat­ing causal­ity, is to as­sume that the uni­verse is com­puter simu­la­tion like, such that it can be defined by a set of phys­i­cal laws and a very long string of ran­dom num­bers. If Omega knows the laws and ran­dom num­bers that define the uni­verse, shouldn’t Omega be able to pre­dict your ac­tions with 100% ac­cu­racy? And then wouldn’t you want to choose the ac­tion that re­sults in you win­ning a lot more money?

• So part of the defi­ni­tion of a su­per­in­tel­li­gence is that the uni­verse is like that and Omega knows all that? In other words, if I have con­vinc­ing ev­i­dence that Omega is su­per­in­tel­li­gent, then I must have con­vinc­ing ev­i­dence that the uni­verse is a com­puter simu­la­tion, etc? Then that changes things; just as the Se­cond Law of Ther­mo­dy­nam­ics doesn’t ap­ply to Maxwell’s De­mon, so the law of for­ward causal­ity (which is ac­tu­ally a con­se­quence of the Se­cond Law, un­der the as­sump­tion of no time travel) doesn’t ap­ply to a su­per­in­tel­li­gence. So yes, then I would pick only Box B.

This just goes to show how im­por­tant it is to un­der­stand ex­actly what the prob­lem states.

• The com­puter simu­la­tion as­sump­tion isn’t nec­es­sary, the only thing that mat­ters is that Omega is tran­scen­den­tally in­tel­li­gent, and it has all the tech­nol­ogy that you might imag­ine a post-Sin­gu­lar­ity in­tel­li­gence might have (we’re talk­ing Shock Level 4). So Omega scans your brain by us­ing some tech­nol­ogy that is effec­tively in­dis­t­in­guish­able from magic, and we’re left to as­sume that it can pre­dict, to a very high de­gree of ac­cu­racy, whether you’re the type of per­son who would take one or two boxes.

Omega doesn’t have to ac­tu­ally simu­late your un­der­ly­ing physics, it just needs a highly ac­cu­rate model, which seems rea­son­ably easy to achieve for a su­per­in­tel­li­gence.

• If its model is good enough that it vi­o­lates the Se­cond Law as we un­der­stand it, fine, I’ll pick only Box B, but I don’t see any­thing in the prob­lem state­ment that im­plies this. The only ev­i­dence that I’m given is that it’s made a run of perfect pre­dic­tions (of un­known length!), is smarter than us, and is from very far away. That’s not enough for new physics.

And just hav­ing a re­ally good simu­la­tion of my brain, of the sort that we could imag­ine do­ing us­ing known physics but just don’t have the tech­ni­cal ca­pac­ity for, is definitely not good enough. That makes the prob­a­bil­ity that I’ll act as pre­dicted very high, but I’ll still come out worse if, af­ter the boxes have been set, I’m un­lucky enough to only pick Box B any­way (or come out bet­ter if I’m lucky enough to pick both boxes any­way, if Omega pegs me for a one-boxer).

• If its model is good enough that it vi­o­lates the Se­cond Law as we un­der­stand it [...]

It doesn’t have to be even remotely close to good enough for that in this scenario. I’d bet a sufficiently good human psychologist could take Omega’s role and get it 90%+ right if he tests and interviews the people extensively first (without them knowing the purpose) and gets to exclude people he is unsure about. A superintelligent being should be far, far better at this.

You your­self claim to know what you would do in the box­ing ex­per­i­ment, and you are an agent limited by con­ven­tional physics. There is no phys­i­cal law that for­bids an­other agent from know­ing you as well as (or even bet­ter than) you know your­self.

You’ll have to explain why you think 99.99% (or whatever) is not good enough; a 0.01% chance to win \$1000 shouldn’t make up for a 99.99% chance of losing \$999,000.

• I re­ally don’t see what the prob­lem is. Clearly, the be­ing has “read your mind” and knows what you will do. If you are of the opinion to take both boxes, he knows that from his mind scan, and you are play­ing right into his hands.

Ob­vi­ously, your de­ci­sion can­not af­fect the out­come be­cause it’s already been de­cided what’s in the box, but your BRAIN af­fected what he put in the box.

It’s like me hand­ing you an opaque box and tel­ling you there is \$1 mil­lion in it if and only if you go and com­mit mur­der. Then, you open the box and find it empty. I then offer Han­ni­bal Lecter the same deal, he com­mits mur­der, and then opens the box and finds \$1 mil­lion. Amaz­ing? I don’t think so. I was sim­ply able to cre­ate an ac­cu­rate psy­cholog­i­cal pro­file of the two of you.

• The ques­tion is how to cre­ate a for­mal de­ci­sion al­gorithm that will be able to un­der­stand the prob­lem and give the right an­swer (with­out failing on other such tests). Of course you can solve it cor­rectly if you are not yet poi­soned by too much pre­sump­tu­ous philos­o­phy.

• I guess I’m miss­ing some­thing ob­vi­ous. The prob­lem seems very sim­ple, even for an AI.

The way the problem is usually defined (Omega really is omniscient, he’s not fooling you, etc.), there are only two solutions:

• You take the two boxes, and Omega had already predicted that, meaning that there is nothing in Box B—you win \$1000.

• You take Box B only, and Omega had already predicted that, meaning that there is \$1M in Box B—you win \$1M.

That’s it. Pe­riod. Noth­ing else. Nada. Rien. Nichts. Sod all. Th­ese are the only two pos­si­ble op­tions (again, as­sum­ing the hy­pothe­ses are true). The de­ci­sion to take box B only is a sim­ple out­come com­par­i­son. It is a perfectly ra­tio­nal de­ci­sion (if you ac­cept the premises of the game).

Now the way Eliezer states it is different from the usual formulation. In Eliezer’s version, you cannot be sure about Omega’s absolute accuracy. All you know is his previous record. That does complicate things, if only because you might be the victim of a scam (e.g. the well-known trick to convince someone that you can consistently predict the winning horse in a 2-horse race—simply start with 2^N people, always give a different prediction to each half of them, discard those to whom you gave the wrong one, etc.)
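The 2^N horse-race scam is worth spelling out, since it shows how a perfect record can be manufactured from nothing (a sketch; the function name and pool mechanics are illustrative):

```python
import random

def scam_survivors(n_races: int) -> int:
    """Start with 2**n_races marks; before each race tell half of the
    remaining pool 'horse A wins' and half 'horse B wins', then keep only
    the half that received the correct prediction."""
    pool = 2 ** n_races
    for _ in range(n_races):
        random.choice("AB")  # the actual outcome is irrelevant...
        pool //= 2           # ...because exactly half were told the winner
    return pool

# After N races, one mark has seen a perfect record of N "predictions":
assert scam_survivors(10) == 1
```

A short perfect record is therefore weak evidence by itself; only a record too long to have been manufactured this way should move you.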

At any rate, the other two out­comes that were im­pos­si­ble in the pre­vi­ous ver­sion (in­volv­ing mis-pre­dic­tion by Omega) are now pos­si­ble, with a cer­tain prob­a­bil­ity that you need to some­how as­cer­tain. That may be difficult, but I don’t see any log­i­cal para­dox.

For ex­am­ple, if this hap­pened in the real world, you might rea­son that the prob­a­bil­ity that you are be­ing scammed is over­whelming in re­gard to the prob­a­bil­ity of ex­is­tence of a truly om­ni­scient pre­dic­tor. This is a rea­son­able in­fer­ence from the fact that we hear about scams ev­ery day, but no­body has ever re­ported such an om­ni­scient pre­dic­tor. So you would take both boxes and en­joy your ex­pected \$1000+ep­silon (Omega may have been sincere but de­luded, lucky in the pre­vi­ous 100 tri­als, and wrong in this one).

In the end, the guy who would win most (in expected value!) would not be the “least rational”, but simply the one who made the best estimates for the probabilities of each outcome, based on his own knowledge of the universe (if you have a direct phone line to the Angel Gabriel, you will clearly do better).

What is the part that would be conceptually (as opposed to technically/practically) difficult for an algorithm?

• I’ve come around to the majority viewpoint on the alien/Omega problem. It seems to be easier to think about when you pin it down a bit more mathematically.

Let’s suppose the alien determines that the probability of me one-boxing is p. For the sake of simplicity, let’s assume he then puts the 1M into one of the boxes with this probability p. (In theory he could do it whenever p exceeded some threshold, but this just complicates the math.)

There­fore, once I en­counter the situ­a­tion, there are two pos­si­ble states:

a) with prob­a­bil­ity p there is 1M in one box, and 1k in the other

b) with probability 1-p there is 0 in one box, and 1k in the other

So:

the ex­pected re­turn of two-box­ing is p(1M+1k)+(1-p)1k = 1Mp + 1kp + 1k − 1kp = 1Mp + 1k

the ex­pected re­turn of one-box­ing is 1Mp

If the act of choos­ing af­fects the prior de­ter­mi­na­tion p, then the ex­pected re­turn calcu­la­tion differs de­pend­ing on my choice:

If I choose to two-box, then p=~0, and I get about 1k on average

If I choose to one-box, then p=~1, and I get about 1M on average

In this case, the ex­pected re­turn is higher by one-box­ing.

If choos­ing the box does not af­fect p, then p is the same in both ex­pected re­turn calcu­la­tions. In this case, two box­ing clearly has bet­ter ex­pected re­turn than one-box­ing.
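The two cases above reduce to the observation that the two expected returns differ by a constant \$1000 (a sketch using the amounts above):

```python
# Expected returns from the calculation above, in dollars.
def e_two_box(p: float) -> float:
    return p * 1_001_000 + (1 - p) * 1_000   # simplifies to 1_000_000*p + 1_000

def e_one_box(p: float) -> float:
    return p * 1_000_000

# If the choice itself moves p (p ~ 1 for one-boxers, p ~ 0 for two-boxers):
assert e_one_box(1.0) > e_two_box(0.0)       # 1,000,000 > 1,000

# If p is fixed regardless of choice, two-boxing wins by exactly $1,000:
assert all(e_two_box(p) - e_one_box(p) == 1_000 for p in (0.0, 0.5, 1.0))
```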

Of course if the determination of p is affected by the choice actually made in the future, you have a situation with reverse-time causality.

If I know that I am going to encounter this kind of problem, and it is somehow possible to pre-commit to one-boxing before the alien determines the probability p of me doing so, that certainly makes sense. But it is difficult to see why I would maintain that commitment when the choice actually presents itself, unless I actually believe this choice affects p, which, again, implies reverse-time causality.

It seems the prob­lem has been setup in a de­liber­ately con­fus­ing man­ner. It is as if the alien has just de­cided to find peo­ple who are ir­ra­tional and pay them 1M for it. The prob­lem seems to en­courage ir­ra­tional think­ing, maybe be­cause we want to be­lieve that ra­tio­nal peo­ple always win, when of course one can set up a fairly ab­surd situ­a­tion so that they do not.

• Cross-post­ing from Less Wrong, I think there’s a gen­er­al­ized Rus­sell’s Para­dox prob­lem with this the­ory of ra­tio­nal­ity:

I don’t think I buy this for New­comb-like prob­lems. Con­sider Omega who says, “There will be \$1M in Box B IFF you are ir­ra­tional.”

Ra­tion­al­ity as win­ning is prob­a­bly sub­ject to a whole fam­ily of Rus­sell’s-Para­dox-type prob­lems like that. I sup­pose I’m not sure there’s a bet­ter no­tion of ra­tio­nal­ity.

• There are two ways of think­ing about the prob­lem.

1. You see the prob­lem as de­ci­sion the­o­rist, and see a con­flict be­tween the ex­pected util­ity recom­men­da­tion and the dom­i­nance prin­ci­ple. Peo­ple who have seen the prob­lem this way have been led into var­i­ous forms of causal de­ci­sion the­ory.

2. You see the prob­lem as game the­o­rist, and are try­ing to figure out the pre­dic­tor’s util­ity func­tion, what points are fo­cal and why. Peo­ple who have seen the prob­lem this way have been led into var­i­ous dis­cus­sions of tacit co­or­di­na­tion.

Newcomb’s scenario is a paradox, not meant to be solved, but rather explored in different directions. In its original form, much like the Monty Hall problem, Newcomb’s scenario is not stated precisely enough to give rise to a problem with a calculated solution.

This is not a crit­i­cism of the prob­lem, in­deed it is an in­ge­nious lit­tle puz­zle.

And there is much to learn from well defined New­comb like prob­lems.

• In my mo­ti­va­tions and in my de­ci­sion the­ory, dy­namic in­con­sis­tency is Always Wrong. Among other things, it always im­plies an agent un­sta­ble un­der re­flec­tion.

If you re­ally want to im­press an in­spec­tor who can see your in­ter­nal state, by al­ter­ing your util­ity func­tion to con­form to their wishes, then one strat­egy would be to cre­ate a trusted ex­ter­nal “brain sur­geon” agent with the keys to your util­ity func­tion to change it back again af­ter your util­ity func­tion has been in­spected—and then for­get all about the ex­is­tence of the sur­geon.

The in­spec­tor will be able to see the lock on your util­ity func­tion—but those are pretty stan­dard is­sue.

• Yes, but when I tried to write it up, I re­al­ized that I was start­ing to write a small book. And it wasn’t the most im­por­tant book I had to write, so I shelved it. My slow writ­ing speed re­ally is the bane of my ex­is­tence. The the­ory I worked out seems, to me, to have many nice prop­er­ties be­sides be­ing well-suited to New­comblike prob­lems. It would make a nice PhD the­sis, if I could get some­one to ac­cept it as my PhD the­sis. But that’s pretty much what it would take to make me un­shelve the pro­ject. Other­wise I can’t jus­tify the time ex­pen­di­ture, not at the speed I cur­rently write books.

If you have a solution to Newcomb’s Problem, but don’t have the time to work on it, is there any chance you will post a sketch of your solution for other people to investigate and/or develop?

• If the alien is able to pre­dict your de­ci­sion, it fol­lows that your de­ci­sion is a func­tion of your state at the time the alien an­a­lyzes you. Then, there is no mean­ingful ques­tion of “what should you do?” Either you are in a uni­verse in which you are dis­posed to choose the one box AND the alien has placed the mil­lion dol­lars, or you are in a uni­verse in which you are dis­posed to take both boxes AND the alien has placed noth­ing. If the former, you will have the sub­jec­tive ex­pe­rience of “de­cid­ing to take the one box”, which is it­self a de­ter­minis­tic pro­cess that feels like a free choice, and you will find the mil­lion. If the lat­ter, you will have the sub­jec­tive ex­pe­rience of “de­cid­ing to take both boxes”, and you will find noth­ing in the opaque box.

In short, the fram­ing of the prob­lem im­plies that your de­ci­sion-mak­ing pro­cess is de­ter­minis­tic (which does not pre­clude it be­ing a pro­cess that you are con­scious of par­ti­ci­pat­ing in), and the figu­ra­tive no­tion of “free will” does not in­clude literal de­grees of free­dom. If you must in­sist on view­ing it as a ques­tion of what the cor­rect ac­tion is, it’s to take the one box. Re­gard­less of your mo­ti­va­tion, even if your rea­son for do­ing so is this ar­gu­ment, you will find your­self in a uni­verse in which events (in­clud­ing thought events) have led you to take one box, and these are the same uni­verses in which the alien places a mil­lion dol­lars in the box.

• I’m a convinced two-boxer, but I’ll try to put my argument without any bias. It seems to me the way this problem has been put is an attempt to rig it for the one-boxers. When we talk about “precommitment” it is suggested the subject has advance knowledge of Omega and what is to happen. The way I thought the paradox worked was that Omega would scan/analyze a person and make its prediction, all before the person ever heard of the dilemma. Therefore, a person has no way to develop an intention of being a one-boxer or a two-boxer that in any way affects Omega’s prediction. For the Irene/Rachel situation, there is no way to ever “precommit”; the subject never gets to play Omega’s game again, and Omega scans their brain before they ever heard of him. (So imagine you had only one shot at playing Omega’s game, and Omega made its prediction before you ever came to this website or anywhere else and heard about Newcomb’s paradox. That prediction already decides what it puts in the boxes.)

Secondly, I think a requirement of the problem is that your choice, at the time of actually taking the box(es), cannot affect what’s in the box. What we have here are two completely different problems: if Omega, or information about your choice, can in any way travel back in time to change the contents of the box, the choice is trivial. So yes, Omega may have chosen to discriminate against rational people and reward irrational ones; the point is, there is absolutely nothing we can do about it (neither by precommitment nor at the actual time of choosing).

To clarify why I think two-boxing is the right choice, I would propose a real-life experiment. Say we develop a survey which, by asking people various questions about logic, the paranormal, etc., classifies them as one-boxers or two-boxers. The crux of the setup is that all the volunteers we take have never heard of Newcomb’s paradox; we make up any reason we want for them to take the survey. THEN, having already placed money or no money in box B, we give them the story about Omega and let them make the choice. Hypothetically, our survey could be 100% accurate; even if not, it may be accurate enough that many of our predicted one-boxers will be glad to find their choice rewarded. In essence, they cannot “precommit”, and their choice won’t magically change the contents of the box (based on a human survey). They also cannot go back and convince themselves to cheat on our survey—it’s impossible—and that is how Omega is supposed to operate. The point is, from the experimental point of view, every single person would make more from taking both boxes, because at the time of choice there’s always the extra \$1000 in box A.
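Both halves of this tension can be checked with a small simulation: predicted one-boxers come out far ahead on average, yet any individual, holding box B's contents fixed, would have received exactly \$1000 more by also taking box A. This is only a sketch; the 90% survey accuracy is an illustrative assumption, not from the comment:

```python
import random

random.seed(0)
ACCURACY = 0.9  # illustrative accuracy of the hypothetical survey

def run_trial():
    # The subject's disposition exists before they hear of the game...
    disposition = random.choice(["one-box", "two-box"])
    # ...and the survey-based prediction is made before it, too.
    if random.random() < ACCURACY:
        predicted = disposition
    else:
        predicted = "two-box" if disposition == "one-box" else "one-box"
    box_b = 1_000_000 if predicted == "one-box" else 0
    # Contents are already fixed when the choice is made.
    payoff = box_b + (1_000 if disposition == "two-box" else 0)
    return disposition, payoff

trials = [run_trial() for _ in range(100_000)]
avg_one = (sum(p for c, p in trials if c == "one-box")
           / sum(1 for c, _ in trials if c == "one-box"))
avg_two = (sum(p for c, p in trials if c == "two-box")
           / sum(1 for c, _ in trials if c == "two-box"))

# Predicted one-boxers average near $900,000; two-boxers near $101,000.
print(f"one-boxers average: ${avg_one:,.0f}")
print(f"two-boxers average: ${avg_two:,.0f}")
```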

• The key point you’ve missed in your anal­y­sis, how­ever, is that Omega is al­most always cor­rect in his pre­dic­tions.

It doesn’t mat­ter how Omega does it—that is a sep­a­rate prob­lem. You don’t have enough in­for­ma­tion about his pro­cess of pre­dic­tion to make any ra­tio­nal judg­ment about it ex­cept for the fact that it is a very, very good pro­cess. Brain scans, re­versed causal­ity, time travel, none of those ideas mat­ter. In the para­dox as origi­nally posed, all you have are guesses about how he may have done it, and you would be an ut­ter fool to give higher weight to those guesses than to the fact that Omega is always right.

If observations (that Omega is always right) disagree with theory (that Omega cannot possibly be right), it is the theory that is wrong, every time.

Thus the ra­tio­nal agent should, in this situ­a­tion, give ex­tremely low weight to his un­der­stand­ing of the way the uni­verse works, since it is ob­vi­ously flawed (the ex­is­tence of a perfect pre­dic­tor proves this). The ques­tion re­ally comes down to 100% chance of get­ting \$1000 plus a nearly 0% chance of get­ting \$1.01 mil­lion, vs nearly 100% chance of get­ting \$1 mil­lion.

What really blows my mind about making the 2-box choice is that you can significantly reduce Omega’s ability to predict the outcome and, unless you are absolutely desperate for that \$1000*, the 2-box choice doesn’t become superior until Omega is only roughly 50% accurate (the expected outcomes equalize at an accuracy of 50.05%). Only then do you expect to get more money, on average, by choosing both boxes.

In other words, if you think Omega is do­ing any­thing but flip­ping a coin to de­ter­mine the con­tents of box B, you are bet­ter off choos­ing box B.

*I could see the value of \$1000 ris­ing sig­nifi­cantly if, for ex­am­ple, a man is hold­ing a gun to your head and will kill you in two min­utes if you don’t give him \$1000. In this case, any un­cer­tainty of Omega’s abil­ities are over­shad­owed by the cer­tainty of the \$1000. This in­verts if the man with the gun is de­mand­ing more than \$1000 - mak­ing the 2-box choice a non-op­tion.
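The break-even arithmetic in the comment above can be checked directly; a minimal sketch using the payoffs from the post, with p standing for Omega's accuracy:

```python
def ev_one_box(p):
    # Box B is full iff Omega correctly predicted one-boxing.
    return p * 1_000_000

def ev_two_box(p):
    # Box B is full iff Omega *wrongly* predicted one-boxing;
    # box A's $1000 arrives either way.
    return (1 - p) * 1_001_000 + p * 1_000

# Break-even: p * 1_000_000 == (1 - p) * 1_001_000 + p * 1_000
#   =>  2_000_000 * p == 1_001_000  =>  p == 0.5005
breakeven = 1_001_000 / 2_000_000
print(breakeven)  # 0.5005: two-boxing wins only below ~50.05% accuracy
```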

• It is not pos­si­ble for an agent to make a ra­tio­nal choice be­tween 1 or 2 boxes if the agent and Omega can both be simu­lated by Tur­ing ma­chines. Proof: Omega pre­dicts the agent’s de­ci­sion by simu­lat­ing it. This re­quires Omega to have greater al­gorith­mic com­plex­ity than the agent (in­clud­ing the nonzero com­plex­ity of the com­piler or in­ter­preter). But a ra­tio­nal choice by the agent re­quires that it simu­late Omega, which re­quires that the agent have greater al­gorith­mic com­plex­ity in­stead.

In other words, the agent X, with com­plex­ity K(X), must model Omega which has com­plex­ity K(X + “put \$1 mil­lion in box B if X does not take box A”), which is slightly greater than K(X).

In the frame­work of the ideal ra­tio­nal agent in AIXI, the agent guesses that Omega is the short­est pro­gram con­sis­tent with the ob­served in­ter­ac­tion so far. But it can never guess Omega be­cause its com­plex­ity is greater than that of the agent. Since AIXI is op­ti­mal, no other agent can make a ra­tio­nal choice ei­ther.

As an aside, this is also a won­der­ful demon­stra­tion of the illu­sion of free will.

• But a ra­tio­nal choice by the agent re­quires that it simu­late Omega

Not so. I don’t need to simu­late a hun­gry tiger in or­der to stay safely (and ra­tio­nally) away from it, even though I don’t know the ex­act meth­ods by which its brain will iden­tify me as a tasty treat. If you think that one can’t “ra­tio­nally” stay away from hun­gry tigers, then we’re us­ing the word “ra­tio­nally” vastly differ­ently.

• Um, AIXI is not com­putable. Re­lat­edly, K(AIXI) is un­defined, as AIXI is not a finite ob­ject.

Also, A can simu­late B, even when K(B)>K(A). For ex­am­ple, one could eas­ily define a com­puter pro­gram which, given suffi­cient com­put­ing re­sources, simu­lates all Tur­ing ma­chines on all in­puts. This must ob­vi­ously in­clude those with much higher Kol­mogorov com­plex­ity.

Yes, you run into is­sues of two Tur­ing ma­chines/​agents/​what­ever simu­lat­ing each other. (You could also get this from the re­cur­sion the­o­rem.) What hap­pens then? Sim­ple: nei­ther simu­la­tion ever halts.
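The last point, two parties that each try to "predict" the other by simulating it, with neither simulation ever halting, can be illustrated with mutual recursion; a toy sketch:

```python
import sys

sys.setrecursionlimit(1_000)  # keep the demo quick

def omega():
    # Omega tries to settle its prediction by simulating the agent...
    return agent()

def agent():
    # ...while the agent tries to choose by simulating Omega.
    return omega()

try:
    agent()
except RecursionError:
    # Neither "simulation" ever returns a value: the regress stops
    # here only because Python caps the recursion depth.
    print("neither simulation halts")
```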

• The premise is that a rational agent would start out convinced that this story about the alien that knows in advance what they’ll decide is false.

The Kolmogorov complexity of the story about the alien is very large, because we have to hypothesize some mechanism by which it can extrapolate the contents of minds. Even if I saw the alien land a million times and watched the box-picking connect with the box contents as they’re supposed to, it would be simpler to assume that the boxes are some stage magic trick, or even that they are an exception to the usual laws of physics.

Once we’ve done enough ex­per­i­ments that we’re forced into the hy­poth­e­sis that the boxes are an ex­cep­tion to the usual laws of physics, it’s pretty clear what to do. The ob­vi­ous re­vised laws of physics based on the new ob­ser­va­tions make it clear that one should choose just one box.

So a ra­tio­nal agent would do the right thing, but only be­cause there’s no way to get it to be­lieve the back­story.

• To me, the de­ci­sion is very easy. Omega ob­vi­ously pos­sesses more pre­science about my box-tak­ing de­ci­sion than I do my­self. He’s been able to guess cor­rect in the past, so I’d see no rea­son to doubt him with my­self. With that in mind, the ob­vi­ous choice is to take box B.

If Omega is so nearly always cor­rect, then de­ter­minism is shown to ex­ist (at least to some ex­tent). That be­ing the case, causal­ity would be noth­ing but an illu­sion. So I’d see no prob­lem with it work­ing in “re­verse”.

• In ar­gu­ing for the sin­gle box, Yud­kowsky has made an as­sump­tion that I dis­agree with: at the very end, he changes the stakes and de­clares that your choice should still be the same.

My way of look­ing at it is similar to what Hen­drik Boom has said. You have a choice be­tween bet­ting on Omega be­ing right and bet­ting on Omega be­ing wrong.

A = Con­tents of box A

B = What may be in box B (if it isn’t empty)

A is yours, in the sense that you can take it and do what­ever you want with it. One thing you can do with A is pay it for a chance to win B if Omega is right. Your other op­tion is to pay noth­ing for a chance to win B if Omega is wrong.

Then just make your bet based on what you know about Omega. As stated, we only know his track record over 100 at­tempts, so use that. Don’t worry about the na­ture of causal­ity or whether he might be scan­ning your brain. We don’t know those things.

If you do it that way, you’ll prob­a­bly find that your an­swer de­pends on A and B as well as Omega’s track record.

I’d prob­a­bly put Omega at around 99%, as Hen­drik did. Keep­ing A at a thou­sand dol­lars, I’d one-box if B were a mil­lion dol­lars or if B were some­thing I needed to save my life. But I’d two-box if B were a thou­sand dol­lars and one cent.

So I think chang­ing A and B and declar­ing that your strat­egy must stay the same is in­valid.
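The betting rule described above is plain expected-value arithmetic; a minimal sketch using the commenter's numbers (the function name is mine):

```python
def one_box_beats_two_box(a, b, p):
    # One-boxing forfeits A for a chance at B when Omega is right
    # (probability p); two-boxing keeps A for sure and wins B only
    # when Omega is wrong.
    return p * b > a + (1 - p) * b

p = 0.99  # estimated from the observed 100-game track record
print(one_box_beats_two_box(1_000, 1_000_000, p))  # True: one-box
print(one_box_beats_two_box(1_000, 1_000.01, p))   # False: two-box
```

Rearranging, one-boxing wins exactly when (2p - 1) * B > A, which is why the answer depends on A and B as well as on Omega's track record.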

• There is a big differ­ence be­tween hav­ing time in­con­sis­tent prefer­ences, and time in­con­sis­tent strate­gies be­cause of the strate­gic in­cen­tives of the game you are play­ing.

I can see why a hu­man would have time-in­con­sis­tent strate­gies—be­cause of in­con­sis­tent prefer­ences be­tween their past and fu­ture self, hy­per­bolic dis­count­ing func­tions, that sort of thing. I am quite at a loss to un­der­stand why an agent with a con­stant, ex­ter­nal util­ity func­tion should ex­pe­rience in­con­sis­tent strate­gies un­der any cir­cum­stance, re­gard­less of strate­gic in­cen­tives. Ex­pected util­ity lets us add up con­flict­ing in­cen­tives and re­duce to a sin­gle prefer­ence: a mul­ti­plic­ity of strate­gic in­cen­tives is not an ex­cuse for in­con­sis­tency.

I am a Bayesian; I don’t be­lieve in prob­a­bil­ity calcu­la­tions that come out differ­ent ways when you do them us­ing differ­ent valid deriva­tions. Why should I be­lieve in de­ci­sional calcu­la­tions that come out in differ­ent ways at differ­ent times?

I’m not sure that even a causal de­ci­sion the­o­rist would agree with you about strate­gic in­con­sis­tency be­ing okay—they would just in­sist that there is an im­por­tant differ­ence be­tween de­cid­ing to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am, be­cause in the former case you cause Omega’s ac­tion while in the lat­ter case you do not. In other words, they would in­sist the two situ­a­tions are im­por­tantly differ­ent, not that time in­con­sis­tency is okay.

And I ob­serve again that a self-mod­ify­ing AI which finds it­self with time-in­con­sis­tent prefer­ences, strate­gies, what-have-you, will not stay in this situ­a­tion for long—it’s not a world I can live in, pro­fes­sion­ally speak­ing.

Try­ing to find a set of prefer­ences that avoids all strate­gic con­flicts be­tween your differ­ent ac­tions seems a fool’s er­rand.

I guess I com­pleted the fool’s er­rand, then...

Do you at least agree that self-mod­ify­ing AIs tend not to con­tain time-in­con­sis­tent strate­gies for very long?

• After you’ve spent some time work­ing in the frame­work of a de­ci­sion the­ory where dy­namic in­con­sis­ten­cies nat­u­rally Don’t Hap­pen—not be­cause there’s an ex­tra clause for­bid­ding them, but be­cause the sim­ple foun­da­tions just don’t give rise to them—then an in­tertem­po­ral prefer­ence re­ver­sal starts look­ing like just an­other prefer­ence re­ver­sal.

… Roughly, self-mod­ify­ing ca­pa­bil­ity in a clas­si­cal causal de­ci­sion the­o­rist doesn’t fix the prob­lem that gives rise to the in­tertem­po­ral prefer­ence re­ver­sals, it just makes one tem­po­ral self win out over all the oth­ers.

This is a gen­uine con­cern. Note that most in­stances of pre­com­mit­ment arise quite nat­u­rally due to rep­u­ta­tional con­cerns: any agent which is com­plex enough to come up with the con­cept of rep­u­ta­tion will make su­perfi­cially ir­ra­tional (“hawk­ish”) choices in or­der not to be pushed around in the fu­ture. More­over, pre­com­mit­ment is only worth­while if it can be ac­cu­rately as­sessed by the coun­ter­party: an agent will not want to “gen­er­ally mod­ify its fu­ture self … to do what its past self would have wished” un­less it can gain a rep­u­ta­tional ad­van­tage by do­ing so.

• There is a big differ­ence be­tween hav­ing time in­con­sis­tent prefer­ences, and time in­con­sis­tent strate­gies be­cause of the strate­gic in­cen­tives of the game you are play­ing. Try­ing to find a set of prefer­ences that avoids all strate­gic con­flicts be­tween your differ­ent ac­tions seems a fool’s er­rand.

• Be care­ful of this sort of ar­gu­ment, any time you find your­self defin­ing the “win­ner” as some­one other than the agent who is cur­rently smil­ing from on top of a gi­ant heap.

This made me laugh. Well said!

There’s only one ques­tion about this sce­nario for me—is it pos­si­ble for a suffi­ciently in­tel­li­gent be­ing to fully, fully model an in­di­vi­d­ual hu­man brain? If so, (and I think it’s tough to ar­gue ‘no’ un­less you think there’s a se­ri­ous glass ceiling for in­tel­li­gence) choose box B. If you try and sec­ond-guess (or, hell, googolth-guess) Omega, you’re tak­ing the risk that Omega is not smart enough to have mod­el­led your con­scious­ness suffi­ciently well. How big is this risk? 100 times out of 100 speaks for it­self. Omega is clev­erer than we can un­der­stand. Box B.

(Time travel? No thanks. I find the prob­a­bil­ity that Omega is simu­lat­ing peo­ple’s minds a hell of a lot more likely than that he’s time trav­el­ling, de­stroy­ing the uni­verse etc. And even if he were, Box B!)

If you can have your brain mod­el­led ex­actly—to the point where there is an iden­ti­cal simu­la­tion of your en­tire con­scious mind and what it per­ceives—then a lot of weird stuff can go on. How­ever, none of it will vi­o­late causal­ity. (Quan­tum effects mess­ing up the simu­la­tion or chang­ing the origi­nal? I guess if the model could be reg­u­larly up­dated based on the origi­nal...but I don’t know what I’m talk­ing about now ;) )

• Un­known: your last ques­tion high­lights the prob­lem with your rea­son­ing. It’s idle to ask whether I’d go and jump off a cliff if I found my fu­ture were de­ter­mined. What does that ques­tion even mean?

Put a differ­ent way, why should we ask an “ought” ques­tion about events that are de­ter­mined? If A will do X whether or not it is the case that a ra­tio­nal per­son will do X, why do we care whether or not it is the case that a ra­tio­nal per­son will do X? I sub­mit that we care about ra­tio­nal­ity be­cause we be­lieve it’ll give us trac­tion on our prob­lem of de­cid­ing what to do. So as­sum­ing fatal­ism (which is what we must do if the AI knows what we’re go­ing to do, perfectly, in ad­vance) de­mo­ti­vates ra­tio­nal­ity.

Here’s the ul­ti­mate prob­lem: our in­tu­itions about these sorts of ques­tions don’t work, be­cause they’re fun­da­men­tally rooted in our self-un­der­stand­ing as agents. It’s re­ally, re­ally hard for us to say sen­si­ble things about what it might mean to make a “de­ci­sion” in a de­ter­minis­tic uni­verse, or to un­der­stand what that im­plies. That’s why New­comb’s prob­lem is a prob­lem—be­cause we have nor­ma­tive prin­ci­ples of ra­tio­nal­ity that make sense only when we as­sume that it mat­ters whether or not we fol­low them, and we don’t re­ally know what it would mean to mat­ter with­out causal lev­er­age.

(There’s a reason free will is one of Kant’s antinomies of reason. I’ve been meaning to write a post about transcendental arguments and the limits of rationality for a while now… it’ll happen one of these days. But in a nutshell… I just don’t think our brains work when it comes down to comprehending what a deterministic universe looks like on some level other than just solving equations. And note that this might make evolutionary sense—a creature who gets the best results through a [determined] causal chain that includes rationality is going to be selected for the beliefs that make it easiest to use rationality, including the belief that it makes a difference.)

• In reality, either I am going to take one box or two. So when the two-boxer says, “If I take one box, I’ll get amount x,” and “If I take two boxes, I’ll get amount x+1000,” one of these statements is objectively counterfactual. Let’s suppose he is in fact going to take both boxes. Then his second statement is factual and his first statement counterfactual. His two statements are:

1) Although I am not in fact going to take only one box, were I to take only one box, I would get amount x, namely the amount that would be in the box.

2) I am in fact going to take both boxes, and so I will get amount x+1000, namely 1000 more than how much is in fact in the other box.

From this it is ob­vi­ous that x in the two state­ments has a differ­ent value, and so his con­clu­sion that he will get more if he takes both boxes is false. For x has the value 1,000,000 in the first case, and 0 in the sec­ond. He mis­tak­enly as­sumes it has the same value in the two cases.

Like­wise, when the two-boxer says to the one boxer, “If you had taken both boxes, you would have got­ten more,” his state­ment is coun­ter­fac­tual and false. For if the one-boxer had been a two boxer, there origi­nally would have been noth­ing in the other box, and so he would have got­ten only \$1000 in­stead of \$1,000,000.

• What if there was an as­ter­oid rush­ing to­ward Earth, and box A con­tained an as­ter­oid deflec­tor that worked 10% of the time, and box B might con­tain an as­ter­oid deflec­tor that worked 100% of the time?

I’d change that to 95%, be­cause if B con­tains a 100% deflec­tor, A adds noth­ing and there’s no dilemma.

• I two-box.

Three days later, “Omega” appears in the sky and makes an announcement. “Greetings, earthlings. I am sorry to say that I have lied to you. I am actually Alpha, a galactic superintelligence who hates that Omega asshole. I came to predict your species’ reaction to my arch-nemesis Omega, and I must say that I am disappointed. So many of you chose the obviously-irrational single-box strategy that I must decree your species unworthy of this universe. Goodbye.”

Gi­ant laser beam then obliter­ates earth. I die wish­ing I’d done more to warn the world of this highly-im­prob­a­ble threat.

TLDR: I don’t buy this post’s ar­gu­ment that I should be­come the type of agent that sees one-box­ing on New­comb-like prob­lems as ra­tio­nal. It is triv­ial to con­struct any num­ber of no-less plau­si­ble sce­nar­ios where a su­per­in­tel­li­gence de­scends from the heav­ens and puts a few thou­sand peo­ple through New­comb’s prob­lem be­fore sud­denly an­nihilat­ing those who one-box. The pre­sented ar­gu­ment for be­com­ing the type of agent that Omega pre­dicts will one-box can be equally used to ar­gue for be­com­ing the type of agent that Alpha pre­dicts will two-box. Why then should it sway me in ei­ther di­rec­tion?

• It seems like the ‘rational’ two-boxers are falling prey to the concept of belief in belief. They think that because they believe they are people who would choose both boxes, it doesn’t matter what they choose: box B is already empty, so they may as well take both. If you have all the information (except for what is in box B), then choosing both is the irrational option and the ‘rational’ people are rationalizing. You’ve just seen someone (or something) materialize two boxes from thin air, tell you they know which option you’ll choose (with evidence that they’ve never yet been wrong), and leave. That person (or thing) has two pieces of information you don’t: what’s in box B and which option will be chosen. If you ignore the evidence provided in favor of the belief that you know yourself better than reality does, and then call it being rational, I don’t know what to tell you.

Now let’s say you don’t know everything. A regular person comes up and tells you one box has 1k and one has 1000k, and you can either take A and B, or just B; there is a high chance that taking A and B will result in B being empty, while taking just B will result in B having the 1000k. The person offering you the boxes has, essentially, zero credibility; you may not even believe either box has money. It doesn’t matter to you whether that person already knows what you’ll pick. You don’t know that they know, and it doesn’t matter if they do. The question becomes: do you run away from this crazy and possibly dangerous person, do you believe them and take both, or do you believe them and take B? Rationally speaking, you don’t lose anything by taking any of those options except for the opportunity to learn what following the other options would entail. It becomes a question of whether you will regret taking both and getting only 1k, or taking only B and losing the possibility of 1k (or running away, and regretting not calling the cops re: dangerous lunatic later).

I had bet­ter phrase­ol­ogy and or­der in my head half an hour ago, but I’m typ­ing this up on my phone and I’m los­ing track of my points, so I’ll leave things as they are.

• The way I see it, causal de­ci­sion the­ory sim­ply ig­nores a part of the prob­lem: that the Pre­dic­tor is able to “pre­dict”.

Ev­i­dence should get in­side the equa­tion, but not the same way as ev­i­den­tial de­ci­sion the­ory: the ev­i­dence is what should fuel the hy­poth­e­sis “The Pre­dic­tor pre­dicts our choices”.

It does not mat­ter if we “think” that our “choice” shouldn’t change what’s in­side the boxes—as the main thing about a pre­dic­tion is that we aren’t ac­tu­ally mak­ing any “choice”, that “choice” is already pre­dicted. It’s the whole “free will” illu­sion all over again, that we think our choices are ours, when the pres­ence of such a Pre­dic­tor would sim­ply in­val­i­date that hy­poth­e­sis.

Causal de­ci­sion the­ory should still work, but not with a rea­son­ing that for­gets about the Pre­dic­tor. Since the Pre­dic­tor is gone, our choice shouldn’t (and won’t) af­fect what’s in the boxes—but as our choice was pre­dicted, ac­cu­rately, and as we have sup­pos­edly enough ev­i­dence to in­fer this pre­dic­tion, we should one box—and this won’t be a “choice”, it will sim­ply have been pre­dicted, and we’ll get the money.

I’m prob­a­bly not be­ing clear, and will try to say it an­other way. “Choos­ing” to one box will sim­ply mean that the Pre­dic­tor had pre­dicted that choice. “Choos­ing” to two box will also mean the same. It’s not a “choice” at all—our be­hav­ior will sim­ply be de­ter­minis­tic. There­fore we should one box, even though that is not a real “choice”.

The fea­tures of the Pre­dic­tor should ap­pear in causal de­ci­sion the­ory.

• Here is my an­swer to New­comb’s prob­lem:

Omega doesn’t ex­ist in re­al­ity. There­fore New­comb’s prob­lem is ir­rele­vant and I don’t waste time think­ing about it.

I wonder how many people come up with this answer. Most of them are probably smarter than me and also don’t waste time commenting with their opinion.

Am I miss­ing some­thing?

• I won­der how many peo­ple come up with this an­swer.

I’ve come up with a related answer in the past, but I don’t think that defense is the best angle to take anymore when it comes to Newcomb’s.

Am I miss­ing some­thing?

It helps to be very spe­cific with why you’re re­ject­ing a thought ex­per­i­ment. The state­ment “Omega doesn’t ex­ist in re­al­ity” needs to be traced to the ax­ioms that give you an im­pos­si­bil­ity proof. This both al­lows you to up­date your con­clu­sion as soon as those ax­ioms come into ques­tion and gen­er­al­ize from those ax­ioms to other situ­a­tions.

For ex­am­ple, the ‘frailty’ ap­proach to New­comb’s is to say “given that 1) my prior prob­a­bil­ity of in­san­ity is higher than my prior prob­a­bil­ity of Omega and 2) any ev­i­dence for Omega’s su­per­nat­u­ral abil­ity is at least as strong ev­i­dence for my in­san­ity, I can’t reach a state where I think that it’s more likely that Omega has su­per­nat­u­ral pow­ers than that I’m in­sane.” This gen­er­al­izes to, say, claims from con men; you might think that any ev­i­dence they pre­sent for their claims is also ev­i­dence for their un­trust­wor­thi­ness, and reach a point where you liter­ally can’t be­lieve them. (Is this a good state to be in?) But it’s not clear that 2 is true, and even if the con­clu­sion fol­lows through, it helps to have a de­ci­sion the­ory for what to do when you think you’re in­sane!
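Premise 2 of the 'frailty' approach can be restated in odds form: if every demonstration multiplies the odds of "Omega is real" and "I am insane" by the same likelihood ratio, no amount of evidence ever changes their relative ordering. A sketch with placeholder priors (the numbers are illustrative assumptions):

```python
prior_insane = 1e-6  # placeholder prior odds that I'm insane
prior_omega = 1e-9   # placeholder prior odds that Omega is real

odds_insane, odds_omega = prior_insane, prior_omega
for _ in range(10):             # ten impressive demonstrations...
    likelihood_ratio = 1_000.0  # ...each favoring both hypotheses equally
    odds_insane *= likelihood_ratio
    odds_omega *= likelihood_ratio

# The ratio between the hypotheses never moves, so whichever had the
# higher prior stays ahead no matter how much evidence accumulates.
print(odds_omega / odds_insane)  # == prior_omega / prior_insane
```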

Another ap­proach to New­comb’s prob­lem is to get very spe­cific about what we mean by ‘causal­ity,’ be­cause New­comb’s is a situ­a­tion where we have a strong ver­bal ar­gu­ment that causal­ity shouldn’t ex­ist and a strong ver­bal ar­gu­ment that causal­ity should ex­ist. In or­der to re­solve the ar­gu­ment, we need to figure out what causal­ity means math­e­mat­i­cally, and then we can gen­er­al­ize much more broadly, and the time spent for­mal­iz­ing causal­ity is not at all wasted.

• Thanks for your re­ply. I didn’t ex­pect to get so much feed­back.

I tend to as­sume that I am not in­sane. Maybe I am over­con­fi­dent in that re­gard :-)

I would call my ap­proach to New­comb’s prob­lem an ex­am­ple of ra­tio­nal ig­no­rance. I think the cost of think­ing about this prob­lem (my time) is higher than the pos­si­ble benefit I could get out of it.

• Am I miss­ing some­thing?

Depends. Do you gen­er­ally think that thought ex­per­i­ments in­volv­ing fic­tional/​nonex­is­tent en­tities are ir­rele­vant (to what?) and not worth think­ing about? Or is there some­thing spe­cial about New­comb’s prob­lem?

If the former, yes, I think you’re miss­ing some­thing. If the lat­ter, then you might not be miss­ing any­thing.

• Thanks for this an­swer.

I think it’s only New­comb’s prob­lem in par­tic­u­lar. I just can’t imag­ine how 1) know­ing the right an­swer to this prob­lem or 2) think­ing about it can im­prove my life or that of any other per­son in any way.

• I was read­ing quite re­cently, but I can’t re­mem­ber where (LessWrong it­self?) (ETA: yes, here and on So8res’ blog), some­one say­ing New­comb-like prob­lems are the rule in so­cial in­ter­ac­tions. Every time you deal with some­one who is try­ing to pre­dict what you are go­ing to do and might be bet­ter at it than you, you have a New­comb-like prob­lem. If you just make what seems to you like the ob­vi­ously bet­ter de­ci­sion, the other per­son may have an­ti­ci­pated that and made that choice ap­pear de­cep­tively bet­ter for you.

“Hey, check out this great offer I re­ceived! Of course, these things are scams, but I just can’t see how this one could be bad!”

“Dude, you’re won­der­ing whether you should do ex­actly what a con artist has asked you to do?”

Now and then some less tech­ni­cally-minded friend will ask my opinion about a piece of dodgy email they re­ceived. My an­swer always be­gins, “IT’S A SCAM. IT’S ALWAYS A SCAM.”

New­comb’s Prob­lem re­duces the situ­a­tion to its bare es­sen­tials. A de­ci­sion the­ory that two-boxes may not be much use for an AGI, or for a per­son.

• (nods)
And how would you char­ac­ter­ize New­comb’s prob­lem?

For ex­am­ple, I would char­ac­ter­ize it as rais­ing ques­tions about how to be­have in situ­a­tions where our own be­hav­iors can re­li­ably (though im­perfectly) be pre­dicted by an­other agent.

• Imag­ine a differ­ent set of play­ers. For ex­am­ple, some soft­ware which is ca­pa­ble of mod­ify­ing its own code (that’s noth­ing out of the or­di­nary, such things ex­ist) and a pro­gram­mer ca­pa­ble of ex­am­in­ing that code.

• Yes, you’re miss­ing some­thing. You’re fight­ing the hy­po­thet­i­cal.

• Some hy­po­thet­i­cals are worth fight­ing. What’s the right ac­count­ing policy if 1=2? If 1=2, you have big­ger prob­lems.

• Some hy­po­thet­i­cals are worth fight­ing.

Not the one in ques­tion, though, since Omega can be ap­prox­i­mated—and typ­i­cally is, even if only as a (50+x)% cor­rect pre­dic­tor. Hu­mans are an ap­prox­i­ma­tion of Omega, in some sense. Solv­ing a prob­lem as­sum­ing a hy­po­thet­i­cal Omega is not un­like as­sum­ing cows are spheres in a vac­uum, i.e. a solu­tion of the ideal­ized thought ex­per­i­ment can still be rele­vant.

• As I un­der­stand it, most types of de­ci­sion the­ory (in­clud­ing game the­ory) as­sume that all agents have about the same in­tel­li­gence and that this in­tel­li­gence is effec­tively in­finite (or at least large enough so ev­ery­one has a com­plete un­der­stand­ing of the math­e­mat­i­cal im­pli­ca­tions of the rele­vant util­ity func­tions).

In New­comb’s prob­lem, one of the play­ers is ex­plic­itly defined as vastly more in­tel­li­gent than the other.

In any situation where someone might be really good at predicting your thought processes, it’s best to add some randomness to your actions. Therefore, my strategy would be to use a quantum random number generator to choose just box B with 51% probability. I should be able to win an average of \$1,000,490.

If there isn’t a prob­lem with this ar­gu­ment and if it hasn’t been thought of be­fore, I’ll call it “vari­able in­tel­li­gence de­ci­sion the­ory” or maybe “prac­ti­cal de­ci­sion the­ory”.

Dustin Soodak

• In any situation where someone might be really good at predicting your thought processes, it’s best to add some randomness to your actions. Therefore, my strategy would be to use a quantum random number generator to choose just box B with 51% probability. I should be able to win an average of \$1,000,490.

Some var­i­ants of the New­comb prob­lem spec­ify that if Omega isn’t sure what you will do he will as­sume you’re go­ing to two-box.

(And if Omega is re­ally that smart he will leave box A in a quan­tum su­per­po­si­tion en­tan­gled with that of your RNG. :-))

• I think gen­er­ally there’s an ad­den­dum to the prob­lem where if Omega sees you us­ing a quan­tum ran­dom­ness gen­er­a­tor, Omega will put noth­ing in box B, speci­fi­cally to pre­vent this kind of solu­tion. :P

Also, how did you reach your \$1,000,490 figure? If Omega just simulates you once, your payoff is: 0.51 × (0.51 × 1,000,000 + 0.49 × 1,001,000) + 0.49 × (0.51 × 0 + 0.49 × 1,000) = \$510,490 < \$1,000,000, so you’re better off one-boxing unless Omega simulates you multiple times.

• I figured that if Omega is re­quired to try its best to pre­dict you and you are per­mit­ted to do some­thing that is phys­i­cally ran­dom in your de­ci­sion mak­ing pro­cess, then it will prob­a­bly be able to work out that I am go­ing to choose just one box with slightly more prob­a­bil­ity than choos­ing 2. There­fore, it will gain the most sta­tus on av­er­age (it MUST be af­ter sta­tus since it ob­vi­ously has no in­ter­est in money) by guess­ing that I will go with one box.

• 0.51 × 1000000 + 0.49 × 1001000 = 1000490
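The two figures in this exchange come from two different models of how Omega treats a randomizing player. A minimal sketch of both (the function names and model labels are mine; the payoff constants match the thread):

```python
# Two ways Omega might handle a player who one-boxes with probability p.
# The payoff constants match the thread; the model names are mine.

def ev_simulate_once(p, full=1_000_000, extra=1_000):
    # Omega simulates you once: its prediction is an independent draw
    # from the same 51/49 coin, as in the reply's calculation.
    return (p * (p * full + (1 - p) * (full + extra))
            + (1 - p) * (p * 0 + (1 - p) * extra))

def ev_predict_majority(p, full=1_000_000, extra=1_000):
    # Omega predicts the more likely action (one-boxing, since p > 0.5)
    # and fills box B, as the original comment assumed.
    return p * full + (1 - p) * (full + extra)

print(round(ev_simulate_once(0.51)))     # 510490
print(round(ev_predict_majority(0.51)))  # 1000490
```

Which figure is right depends entirely on which model of Omega's prediction procedure one accepts; the thread never settles that.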

• Didn’t re­al­ize any­one watched the older threads so wasn’t ex­pect­ing such a fast re­sponse...

I’ve already heard about the version where “intelligent alien” is replaced with “psychic” or “predictor”, but not the “human is required to be deterministic” or quantum version (which I’m pretty sure would require the ability to measure the complete wavefunction of something without affecting it). I didn’t think of the “halting problem” objection, though I’m pretty sure it’s already expected to do things even more difficult in order to get such a good success rate with something as complicated as a human CNS (does it just passively observe the player for a few days preceding the event, or is it allowed to do a complete brain scan?).

I still think my solu­tion will work in any re­al­is­tic case (where the alien isn’t mag­i­cal, and doesn’t re­quire your thought pro­cesses to be both de­ter­minis­tic and com­putable while not plac­ing any such limits on it­self).

What I find par­tic­u­larly in­ter­est­ing, how­ever, is that such a trou­ble­some ex­am­ple ex­plic­itly states that the agents have vastly un­equal in­tel­li­gence, while most ex­am­ples seem to as­sume “perfectly ra­tio­nal” agents (which seems to be in­ter­preted as be­ing in­tel­li­gent and ra­tio­nal enough so that fur­ther in­creases in in­tel­li­gence and ra­tio­nal­ity will make no differ­ence). Are there any other ex­am­ples where causal de­ci­sion the­ory fails which don’t in­volve non-equal agents? If not, I won­der if you could con­struct a proof that it DEPENDS on this as an ax­iom.

Has any­one tried adding “rel­a­tive abil­ity of one agent to pre­dict an­other agent” as a pa­ram­e­ter in de­ci­sion the­ory ex­am­ples? It seems like this might be ap­pli­ca­ble in the pris­oner’s dilemma as well. For ex­am­ple, a sim­ple tit-for-tat bot mod­ified so it doesn’t defect un­less it has re­ceived 2 nega­tive feed­backs in a row might do rea­son­ably well against other bots but would do badly against a hu­man player as soon as they figured out how it worked.
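The modified tit-for-tat idea in the last paragraph is easy to test directly. A rough sketch (the payoff matrix is the standard illustrative one, and both strategies are my own encodings of the comment's description): a bot that defects only after two consecutive defections never triggers against an opponent who alternates, so the opponent pockets the temptation payoff every other round:

```python
# Tit-for-two-tats vs. an opponent who has figured out its trigger rule.
# Payoffs: (C,C)=3,3  (C,D)=0,5  (D,C)=5,0  (D,D)=1,1

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_two_tats(opp_history):
    # defect only after two negative feedbacks in a row
    if opp_history[-2:] == ['D', 'D']:
        return 'D'
    return 'C'

def alternating_exploiter(opp_history):
    # defect on even rounds, cooperate on odd ones: the trigger never fires
    return 'D' if len(opp_history) % 2 == 0 else 'C'

def play(strat_a, strat_b, rounds=100):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_two_tats, alternating_exploiter))  # (150, 400)
```

Over 100 rounds the forgiving bot scores 150 against the exploiter's 400, which illustrates the comment's point: a strategy that does fine against other bots can be badly exploited by an agent that can predict how it works.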

• You are fight­ing the hy­po­thet­i­cal. It is a com­mon pit­fall when faced with a coun­ter­in­tu­itive is­sue like that. Don’t do it, un­less you can prove a con­tra­dic­tion in the prob­lem state­ment. Omega is defined as a perfect pre­dic­tor of your ac­tions no mat­ter what you do. That in­cludes any quan­tum tricks. Also see the re­cent in­tro­duc­tion to New­comblike prob­lems for a de­tailed anal­y­sis.

• How does my ob­jec­tion fit into this: that it may not be pos­si­ble for Omega to pre­dict you in prin­ci­ple, since such an Omega would have to be able to solve the halt­ing prob­lem?

• Oddly, this prob­lem seems (to my philoso­pher/​en­g­ineer mind) to have an ex­ceed­ingly non-com­plex solu­tion, and it de­pends not upon the chooser but upon Omega.

Here’s the payout schema assumed by the two-boxer, for reference:

1) Both boxes predicted, both boxes picked: +\$1,000
2) Both boxes predicted, only B picked: \$0
3) Only B predicted, both boxes picked: +\$1,001,000
4) Only B predicted, only B picked: +\$1,000,000

Omega, be­ing an un­know­able su­per­in­tel­li­gence, qual­ifies as a force of na­ture from our cur­rent level of hu­man un­der­stand­ing. Since Omega’s ways are in­scrutable, we can only eval­u­ate Omega based upon what we know of him so far: he’s 100 for 100 on pre­dict­ing the predilec­tions of peo­ple. While I’d pre­fer to have a much larger suc­cess base be­fore draw­ing in­fer­ence, it seems that we can es­tab­lish a defea­si­ble Law of Omega: what­ever de­ci­sion Omega has pre­dicted is vir­tu­ally cer­tain to be cor­rect.

So while the two-boxer would hold that choos­ing both boxes would give them ei­ther \$1,000 or \$1,001,000, this is clearly IRRATIONAL: the (defea­si­ble) Law of Omega out­right elimi­nates out­comes 2 and 3 above, which means that (un­til such time as new data forces a re­vi­sion of the Law of Omega) the two-boxer’s an­ti­ci­pated pay­off of \$1,001,000 DOES NOT EXIST. The only choice is be­tween out­come 1 (two-boxer gets \$1,000) and out­come 4 (one-boxer gets \$1,000,000). At that point, op­tion 4 is the dom­i­nant strat­egy… AND the ra­tio­nal thing to do.

Does that make sense? Or am I placing unfounded faith in Omega?

• If you look through the many sub­se­quent dis­cus­sions of this, you’ll see that in­deed \$1,001,000 is not in the out­come do­main, but the clas­si­cal CDT is un­able to enu­mer­ate this do­main cor­rectly.

• Think­ing about this in terms of AGI, would it be rea­son­able to sug­gest that a bias must be cre­ated in fa­vor of uti­liz­ing in­duc­tive rea­son­ing through Bayes’ The­o­rem rather than de­duc­tive rea­son­ing when and if the two con­flict?

• Seems like a simple and reasonable answer to this problem is that I would take the box with the million dollars, rather than the box with the thousand dollars and the empty box. It seems the main question is, “But why?”. So here is my reasoning: Omega has shown over 99% accuracy in providing results dependent on people’s choices. Box B offers rewards 1,000 times greater than Box A, such that if there is even a 0.1% chance that taking Box A will lose those rewards, it is irrational to also take Box A. As I have seen no evidence that Omega has left, it is not even certain that my choice of actions now will have no effect on the contents of the opaque box (only a fool would be certain that just because he “saw Omega fly away” that said superintelligence is not hiding, nor has left behind an observing agent). As each of these considerations leads me to choose only Box B, it is almost certain that Omega has foreseen likewise and put the \$1,000,000 in Box B.

I suspect the problem people seem to have with this is that they think they are outside of the game. But the game description itself says that (a very accurate model of) you is in the game, and that therefore your (modeled) choices, including second thoughts and your desire not to leave that last \$1,000, will (if modeled correctly) affect the contents of Box B. No, Omega is not rewarding irrationality. Omega is giving a large reward to those who trust in Omega’s judgement, and a smaller reward to those who arrogantly think they can cheat the game he set up.

• I choose Box B. This is because I take into account that Omega is a superintelligence with a success rate of 100% and no margin of error, and is the one offering the problem to me. The only logical explanation for this is an ability to predict variables that I have no current understanding of: either an ability to analyze my psyche and see my tendency to trust in things with 100% success rates, the ability to foresee my decision in time, or the ability for Omega to affect things backwards in time. Omega has not provided any reasoning for its 100% success rate, so these are the three logical explanations that I see.

If you would argue to take both boxes on the assumption that Omega has no extraordinary powers over time, and so the decision is already made, I think this is actually the irrational stance: reasoning from a standpoint that ignores the past facts is irrational. I would take Box B, because even if the assumption that he’s guessed wrong is correct, and I take both boxes and get both sets of money, then I’m really not that much better off than if I took Box B. To me, the irrational decision is to take both boxes, if the probability is as follows: if I take Box B, I presumably have a 100% probability of 1,000,000 dollars. If I take both boxes, I have a 50% chance of 1,000 dollars and a 50% chance of 1,001,000 dollars. Taking both is therefore not the logical choice, as 1,001,000 dollars versus 1,000,000 dollars is not worth the 50% chance of reducing my payout to 1,000 dollars.

To put this into perspective, as with the Prisoner’s Dilemma in game theory, suppose this decision is put in front of me 10 times; the outcome and my decisions become a lot clearer. Let’s say that Omega has the ability to guess wrong. If every one of the 10 times I take both boxes there is a 50% chance of the money being in Box B, then numerically I lose versus choosing Box B every time, even if Omega has the ability to be wrong and therefore it’s not in there one of the times. However, one time out of ten would be the most logical error rate to assume, if any, coming from the fact that if he’s been correct 100/100 times, and if he would be wrong with me, then he’s been correct 100/101 times, in which case the failure rate out of 10 chances really only has the possibility of being either 0/10 or 1/10. Therefore, by taking Box B 10 times, the minimum payout I receive is 9 million. If I take both all 10 times, then the most payout I can really hope to achieve is 1 million and ten thousand. If Omega had a failure rate of even 5%, that would definitely affect my decision, but as it stands, the only logical choice is choosing only Box B. Furthermore, if I only take Box B, and he’s wrong and it’s empty, then I believe Omega would be curious enough about its failure to reward me with the million dollars afterwards. And the 1,000-dollar outcome is simply not enough money for me to “need it” in a way that makes me have to play safe.
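The worst-case arithmetic in the comment above can be spelled out explicitly (a trivial sketch; the at-most-one-error-in-ten assumption is the commenter's, not a fact about Omega):

```python
# Worst case for always-one-boxing vs. best case for always-two-boxing
# over ten plays, assuming at most one prediction error in ten.
rounds, max_errors = 10, 1

# One-boxing every time: box B is full except on the single possible error.
one_box_min = (rounds - max_errors) * 1_000_000

# Two-boxing every time: $1,000 each round, plus the million on the one
# round where Omega might wrongly have predicted one-boxing.
two_box_max = rounds * 1_000 + max_errors * 1_000_000

print(one_box_min, two_box_max)  # 9000000 1010000
```

This reproduces the comment's figures: a guaranteed floor of 9 million for the one-boxer against a ceiling of 1,010,000 for the two-boxer.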

• It seems to me that the entire discussion is confused. Many people seem to be using the claim that Omega can’t predict your actions to make claims about what actions to take in the hypothetical world where it can. Accepting the assumption that Omega can predict your actions, the problem seems to be a trivial calculation of expected utility:

If the opaque box contains b1 utility and the transparent one b2 utility, and Omega has probability e1 of falsely predicting you’ll one-box and probability e2 of falsely predicting you’ll two-box, the expected utilities are

One-box: (1 - e2)*b1
Two-box: e1*b1 + b2

And you should one-box unless b2 is bigger than (1 - e2 - e1)*b1.
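That algebra can be sanity-checked numerically; a minimal sketch with the thread's \$1,000,000 / \$1,000 payoffs and illustrative (made-up) error rates:

```python
def one_box_ev(b1, e2):
    # Omega falsely predicts two-boxing with probability e2
    return (1 - e2) * b1

def two_box_ev(b1, b2, e1):
    # Omega falsely predicts one-boxing with probability e1
    return e1 * b1 + b2

b1, b2, e1, e2 = 1_000_000, 1_000, 0.01, 0.01
print(one_box_ev(b1, e2) > two_box_ev(b1, b2, e1))  # True
# One-boxing wins whenever b2 < (1 - e2 - e1) * b1, as stated above.
```

With even a 1% error rate in each direction, one-boxing's expected value dwarfs two-boxing's; b2 would need to approach b1 before the comparison flips.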

• It’s comforting sometimes to read from someone else that rationality is not the loser’s way, arguably even more so for the Prisoner’s Dilemma than Newcomb’s if you consider the current state of our planet and the tragedy of the commons.

I’m writing this because I believe I succeeded in writing a computer program (it is so simple I can’t call it an AI) able to actually simulate Omega in a Newcomb game. What I describe below may look like an iterated Newcomb’s problem, but I claim it is not, and will explain why.

When using my program, the human player will actually be facing a high-accuracy predictor, and that claim will be true.

Obviously there is a trick. Here is how it goes. The predictor must first be calibrated. This is done in the simplest possible fashion: it just asks the user whether they would one-box or two-box. The problem with that is like asking someone whether she would enter a burning building to save a child: nobody (except professional firefighters) would actually know before being confronted with the actual event.

The program can work around that: just don’t tell the player whether they are doing a calibration run of the predictor or the actual, unique play.

Now reaching the desired prediction accuracy level is simple enough: just count the total trial runs and the numbers of one-boxings and two-boxings; when one or the other goes over the chosen accuracy threshold (at least 90%), the program can go for the prediction.

Obviously it must not advertise which run is the real game, or it would defeat the strategy of not saying whether it’s the real game or not, and hence the prediction accuracy. But any reader can check from the program’s source code that the prediction will indeed be done before (in a temporal sense) asking the player whether he will one-box or two-box.

Here goes my program. It is written in Python and heavily commented; you should not need to be much of a CS literate to understand it. The only trick is the insertion of some randomness, so the player cannot predict the end of calibration and the start of the game.

```python
import random

print("I will run some trial games (at least 5) to calibrate the predictor.")
print("As soon as the predictor reaches the expected quality level\n"
      "I will run the actual Newcomb game. Be warned you won't be\n"
      "warned when the calibration phase ends and the actual game begins;\n"
      "this is intended to avoid any perturbation of predictor accuracy.\n")

# Run some prelude games (to avoid computing averages on too small a set),
# then keep playing until the intended prediction quality is reached.
# Injecting randomness into the prelude length and the accuracy threshold
# prevents anybody (including the program's writer) from being certain of
# when calibration ends. This avoids giving the user data that would
# change their behavior and defeat prediction accuracy.

# 5 to 25 calibration moves
prelude = 5 + random.random() * 20.0
# roughly 90% accuracy or better, and avoid an infinite loop;
# we do not tell how much better, to avoid guessers
accuracy = 1.0 - (random.random() * 0.1) - 0.01
# postlude is the number of test games over which the desired accuracy must
# be kept before running the actual game. It is a random number between 1
# and 5, so players cannot guess the exact play at which the percentage
# changes, which could hint at the exact final game time. The current
# postlude could perhaps still be exploited to improve a cheater's chances
# above the intended predictor values, but it's just here to get the idea...
# and besides, by outguessing Omega the cheater is only hoping to win 100
# bucks. How much energy does that deserve?
postlude = 0
one = two = total = 0
while total < prelude or int(postlude) != 1:
    a = input("1 - One-box, 2 - Two-boxes : ")
    if a not in ('1', '2'):
        continue
    if a == '1':
        one += 1
    else:
        two += 1
    total += 1
    print("current accuracy is %d%%" % int(100.0 * max(one, two) / total))
    if max(one, two) >= total * accuracy:
        # desired accuracy reached: count the postlude down
        if postlude != 0:
            postlude -= 1
        else:
            postlude = 1 + random.random() * 5.0
    else:
        postlude = 0

# Now prediction accuracy is good enough: run the actual Newcomb's game.
# The prediction is truly a prediction of the future;
# nothing prevents the user from choosing otherwise.
# print("This is the actual Newcomb game, but I won't say it")
prediction = 1 if one > two else 2
finished = False
while not finished:
    a = input("1 - One-box, 2 - Two-boxes : ")
    if a == '1':
        if prediction == 1:
            print("You win 1 000 000 dollars")
        else:
            print("You win zero dollars")
        finished = True
    elif a == '2':
        if prediction == 1:
            print("You win 1 000 100 dollars")
        else:
            print("You win 100 dollars")
        finished = True
```

Now, why did I say this is not an iterated Newcomb’s?

The point is that, the way it is written, the program is not finite. The human player is the only one able to stop the game, and to do that he has to commit to some option, one-boxing or two-boxing, thus leaving the program to reach the desired accuracy level. He also has no possibility of “uncommitting” when the real game comes, as that last game is no different from the others.

You could consider that the whole point of this setting is to convince the user that the claimed accuracy of Omega is true. What is fun is that in this setting it becomes true because the human player chooses it to be so.

I believe the above program proves that one-boxing is rational, I should even say obvious, provided with the right setting.

Now, I can’t stop here. I believe in maths as a neutral tool. It means that if the reasoning leading to one-boxing is right, the reasoning leading to two-boxing must be false. If both reasonings were true, maths would collapse; and that is not to be taken lightly.

Summarily, the two-boxing reasoning is an immediate consequence of the Dominance Argument.

So what? The Dominance Argument is rock solid. It is so simple, so obvious.

Below is a quote from Ledwig’s review of Newcomb’s problem about the Dominance Argument; you could call it a restrictive clause stating when you can or cannot apply it:

> The principles of dominance are restricted in their range, for they can only be applied, when the decision maker believes that the possible actions of the decision maker don’t causally influence the possible states of the world, or the possible actions of any other decision maker.

There is a subtle error in the above statement. You should replace the words “causally influence” with “are correlated with”, so that the condition reads that the actions are not correlated with the states. In probabilistic terms, it means the actions of both decision makers must be independent variables. But the lack of correlation isn’t guaranteed by the lack of causality.

Think of a Prisoner’s-Dilemma-like situation between traders. The stock exchange is falling for some corporation. If both traders sell, you get a stock market crash; if they buy, it’s back to business as usual. If one sells while the other buys, only one will make big money.

Do you seriously believe that, given access to the same corporate data (but not communicating with each other), both traders are not likely to make the same choice?

In the above setting, the two players’ actions are not independent variables, and you can’t directly apply Dominance.

Reasoning backward, you could say that your choice gives you some information on the probability of the other’s choice; and as taking that information into account can change your choice, it may also change the choice of the other, so you enter some infinite recursion (but that’s not a problem, you still have tools to solve that, like fixed-point theorems).

In Newcomb’s problem, we are in an extreme case. The hypothesis states the correlation between players: that is Omega’s prediction accuracy.

Hence, two-boxing is not a rational decision based on causality, but a simple disbelief of the correlation stated in the hypothesis, and a confusion between correlation and causality.

When you re­move that dis­be­lief (that’s what my pro­gram does) the prob­lem dis­ap­pears.

• Now per­haps I am mi­s­un­der­stand­ing the prob­lem. Are we to as­sume that all this is fore­knowl­edge?

Given the in­for­ma­tion pre­sent in this ar­ti­cle I would just choose to take only B. But that is as­sum­ing that Omega is never wrong. Logic in my own mind dic­tates that re­gard­less of why I chose B, or if I at some ear­lier point may have Two-Boxed, at this time I choose box B, and if Omega’s pre­dic­tion is never wrong- then if I choose B, B will con­tain a mil­lion dol­lars.

Now in an alternate iteration of this dilemma, regardless of the truth (whether Omega is indeed never wrong or not), if I only know of 100 observed occurrences, that might have substantial influence on my reasoning. Given a failure rate of (at most) 1 out of 101, I may very well be tempted by all the previously mentioned arguments for taking boxes A and B, while I might still have a tendency to just take box B anyway. After all, \$1,000 isn’t life-changing for me, but I could really make use of a million.

When all is said and done it comes down to a choice of \$1,000 or \$1,000,000. If Omega is never wrong, then there is never a possibility of taking \$1,001,000. In which case, taking A and B results in \$1,000 without fail, and by choosing only B, B would never be empty. The choice seems obvious.

• I hope I’m not being redundant, but… The common argument I’ve seen is that it must be backward causation if one-boxing predictably comes out with more money than two-boxing.

Why can’t it just be that Omega is re­ally, re­ally good at cog­ni­tive psy­chol­ogy, has a com­plete map of your brain, and is able to use that to pre­dict your de­ci­sion so well that the odds of Omega’s pre­dic­tion be­ing wrong are ep­silon? This just seemed… well, ob­vi­ous to me. But most peo­ple ar­gu­ing “back­ward cau­sa­tion!” seem to be smarter than me.

The pos­si­bil­ities I see are ei­ther that I’m se­ri­ously miss­ing some­thing here, or even re­ally smart peo­ple can’t let go of the idea that our brains are free from phys­i­cal law on some level.

The entire point of Omega seems to be “Yeah, no, free will isn’t as powerful as you seem to think.” Given 100 people and access to a few megabytes of their conversations, contact lists, Facebook, T-shirt collection and radio/television/web-surfing habits, you can probably make a prediction about how they’ll vote in the next election that will do better than chance. Omega is implied here to have far better models of people than targeted advertising does. What success rate would it take to convince people that Omega isn’t cheating, but is just really, really clever?

Of course, Omega’s abilities aren’t really specified. Maybe it is using time travel. But the laws of physics as we know them seem to favor “Omega understands the human brain” over “Omega can see into the future”, so if this happened in the real world, backward causation would not be my leading hypothesis.

Of course, the hypothesis “Omega cheats with some remote-controlled mechanism inside box B” is even easier than an alien superintelligence with an amazing understanding of individual brains. If we could examine box B after one-boxing and after two-boxing, we could probably adjust the probability of the “Omega cheats” hypothesis. I don’t know how to distinguish the backward-causation and perfect-brain-model hypotheses, though.

Of course, the point of the origi­nal post wasn’t “re­verse en­g­ineer Omega’s meth­ods”. The point was “Make de­ci­sions that pre­dictably suc­ceed, not de­ci­sions that pre­dictably fail but are oth­er­wise more rea­son­able”. Omega’s meth­ods are rele­vant only if they al­low us to make bet­ter de­ci­sions than we would with the given in­for­ma­tion.

• I’ve been fiddling around with this in my head. I arrived at this argument for one-boxing:

Let us suppose a Rule, which we shall call W: FAITHFULLY FOLLOW THE RULE THAT, IF FOLLOWED FAITHFULLY, WILL ALWAYS OFFER THE GREATEST CHANCE OF THE GREATEST UTILITY.

To prove that W one-boxes, let us list all logical possibilities, which we’ll call W1, W2, and W3: W1, always one-boxing; W2, always two-boxing; and W3, sometimes one-boxing and sometimes two-boxing. Otherwise, all of these rules are identical in every way, and identical to W in every way.

Imagining that we’re Omega, we’d obviously place nothing in the box of the agent which follows W2, since we know that agent would two-box. Since this limits the utility gained, W2 is not W.

W3 is a bit trickier, but a variant of W3 which two-boxes most of the time will probably not be favoured by Omega, since this would reduce his chance of being correct in his prediction. This reduces the chance of getting the greatest utility by however much, and thus disqualifies all close-to-W2 variants of W3.

A perfect W1 would guarantee that the box contains 1,000,000 dollars, since Omega would get its prediction wrong by not rewarding an agent who one-boxes. However, this rule GUARANTEES not getting the 1,001,000 dollars, and in that one respect is sub-optimal. Because of Omega’s optimization, there is no rule for which that outcome is the most likely, but if there is a rule for which it is second-most-likely, that would probably be W. In any case, W favours B over A.

I was going to argue that W is more rational than a hypothetical rule Z which I think is what makes most two-boxers two-box, but maybe I’ll do that later, when I’m more sure I have time.

• I’m con­fused about why this prob­lem is differ­ent from other de­ci­sion prob­lems.

Given the problem statement, this is not an acausal situation. No physics is being disobeyed: Kramers-Kronig still works, relativity still works. It’s completely reasonable that my choice could be predicted from my source code. Why isn’t this just another example of prior information being appropriately applied to a decision?

Am I dodg­ing the ques­tion? Does EY’s new de­ci­sion the­ory ac­count for truly acausal situ­a­tions? If I based my de­ci­sion on the re­sult of, say, a ra­dioac­tive de­cay ex­per­i­ment performed af­ter Omega left, could I still op­ti­mize?

• “You shouldn’t find your­self dis­t­in­guish­ing the win­ning choice from the rea­son­able choice.”

I disagree. Let’s say there’s box A with \$1000 in it, and box B with \$10,000 in it 1% of the time, and you can only pick one. If I pick A and my friend picks B, and they get the \$10,000, they might say to me that I should wish I was like them. But I’ll defend my choice as reasonable, even though it wasn’t the winning choice that time.

• I be­lieve it should be read as:

“You shouldn’t find your­self dis­t­in­guish­ing the [time­lessly] win­ning choice [(as calcu­lated from ex­pected util­ity over in­finite at­tempts)] from the rea­son­able choice.”

In your ex­am­ple, your friend picked the choice that won once. It was luck, and he’s happy, and all is well for him. How­ever, the ex­pected value of box B was \$100, which does not win over \$1000. Ar­guably, the gam­bling in it­self may have nonzero util­ity value, and the cer­tainty of ob­tain­ing \$1000 may also have nonzero util­ity value, but that seems ir­rele­vant in your ex­am­ple from the way it was for­mu­lated.

TL;DR: It seems like you’re dis­agree­ing more on the for­mu­la­tion or word­ing than the ac­tual prin­ci­ple.
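For the record, the expected values in that example work out as follows (a trivial check, using only the figures given in the grandparent comment):

```python
# Expected value of each box in the example above
ev_a = 1_000           # box A: a certain $1,000
ev_b = 0.01 * 10_000   # box B: $10,000 with 1% probability
print(ev_a, ev_b)      # box A's expected value is ten times box B's
```

So picking A is the expected-utility-maximizing choice even on the occasion when B happens to pay out.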

• It seems to me that no ra­tio­nal­ist should ac­cept the ‘givens’ in this sce­nario with­out a lot of ev­i­dence.

So what am I left with? Some being who hands out boxes, and 100 examples of people who open one box and get \$1M or open both boxes and get \$1k. I am unwilling to accept on faith a super-intelligent alien, so I will make the simplifying assumption that the being is in fact Penn & Teller. In which case, the question simplifies to “Am I willing to bet at 1000:1 odds that Penn & Teller aren’t able to make a box which vanishes the \$1M if I choose both boxes?” To which I respond: no.

No reverse causation required. No superintelligent prediction required. I simply know that I can’t beat Penn & Teller at their own game 999 times out of 1000.

• The solution to this problem is simple and, in my eyes, pretty obvious. Your decision isn’t changing the past; it’s simply that Omega’s choice and your decision have the same cause. Assuming Omega emulates your mind under the conditions in which you’re making the choice, the cause of the prediction and the cause of your choice are the same (the original state of your mind is the cause). So choosing B is the rational choice. And anyway, no matter what method of prediction Omega uses, the cause of his prediction will always be the same as the cause of your choice (if it isn’t, then he doesn’t have any basis for his prediction, and he will therefore have a lower success rate than 100% unless he is really, really lucky).

And even if you don’t think of this when con­fronted by the prob­lem, the prob­a­bil­ity should be more than enough to con­vince you that B is the ra­tio­nal choice. If the Uni­verse says one thing and your model says an­other, fol­low the Uni­verse, not your model.

• Sorry, I’m new here; I am having trouble with the idea that anyone would consider taking both boxes in a real-world situation. How would this puzzle be modeled differently, and how would it look different, if it were Penn and Teller flying Omega?

If Penn and Teller were flying Omega, then they would have been able to produce exactly the same results as seen, without violating causality or time travelling or perfectly predicting people, by just cheating and emptying the box after you choose to take both.

Given that “it’s cheating” is a significantly more rational idea than “it’s smart enough to predict 100 people” in terms of simplicity and the results seen, why not go with that as a rational reason to pick just box B? The only reason one would take both is if it proved it was not cheating; how it could do that without also convincing me of its predictive powers I don’t know, and once convinced of its predictive powers I would have to take Box B.

So taking both boxes only makes sense if you know it is not cheating, and know it can be wrong. I notice I am confused: how can you both know it is not cheating, and not know that it is correct in its prediction?

I think that the reason this puzzle begets irrationality is that one of the fundamental things you must do to parse the puzzle is itself irrational, namely to “believe that the machine is not cheating”, given the alternatives and no further info.

• Yeah, this comes up a lot.

My usual way of ap­proach­ing it is to ac­knowl­edge that the thought ex­per­i­ment is ask­ing me to imag­ine be­ing in a par­tic­u­lar epistemic state, and then ask­ing me for my in­tu­itions about what I would do, and what it would be right for me to do, given that state. The fact that the speci­fied epistemic state is not one I can imag­ine reach­ing is beside the point.

This is com­mon for thought ex­per­i­ments. If I say “sup­pose you’re on a space­ship trav­el­ing at .999999999c, and you get in a trol­ley in­side the ship that runs in the di­rec­tion the ship is trav­el­ling at 10 m/​s, how fast are you go­ing?” it isn’t helpful to re­ply “No such space­ship ex­ists, so that con­di­tion can’t arise.” That’s ab­solutely true, but it is beside the point.
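For what it's worth, the spaceship question in that thought experiment has a definite answer under special relativity; a quick sketch of the velocity-composition formula (variable names are mine):

```python
C = 299_792_458.0  # speed of light, m/s

def compose(u, v):
    # special-relativistic composition of collinear velocities:
    # w = (u + v) / (1 + u*v / c^2)
    return (u + v) / (1 + u * v / C**2)

ship = 0.999999999 * C  # the spaceship's speed from the thought experiment
trolley = 10.0          # the trolley's speed inside the ship, m/s
print(compose(ship, trolley))  # a hair above the ship's speed, still < c
```

The composed speed always stays below c, which is exactly the kind of counterintuitive-but-well-defined answer the thought experiment is meant to elicit.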

• The difficulty I am having here is not so much that the stated nature of the problem is not real, but that it asks one to assume they are irrational. With a .999999999c spaceship, it is not irrational to assume one is in a trolley on a spaceship if one is in fact in a trolley on a spaceship. There is not enough information in the Omega puzzle: it assumes you, the person it drops the boxes in front of, know that Omega is predicting, but does not tell you how you know that. As the mental state of ‘knowing it is predicting’ is fundamental to the puzzle, not knowing how one came to that conclusion asks you to be a magical thinker for the purpose of the puzzle. I believe that this may at least partially explain why there seems to be a lack of consensus.

I also am suspicious of the ambiguous nature of the word predict, but am having trouble phrasing the issue. Omega may be using astrology and happen to have been right each of 100 times, or be literally looking forward in time. Without knowing which, how can one make the best choice?

All that said, taking just B is my plan, as with \$1,000,000 I can afford to lose \$1,000.

• I agree that I can’t imag­ine any jus­tified way of com­ing to be­lieve Omega has the prop­er­ties that I am pre­sumed to be­lieve Omega to have. So, yes, the thought ex­per­i­ment ei­ther as­sumes that I’ve ar­rived at that state in some un­jus­tified way (as you say, as­sume I’m ir­ra­tional, at least some­times) or that I’ve ar­rived at it in some jus­tified way I cur­rently have no inkling of (and there­fore can­not cur­rently imag­ine).

As­sum­ing that I’m ir­ra­tional some­times, and some­times there­fore ar­rive at be­liefs that aren’t jus­tified, isn’t too difficult for me; I have a lot of ex­pe­rience with do­ing that. (Far more ex­pe­rience than I have with rid­ing a trol­ley on a space­ship, come to that.)

But, sure, I can see where peo­ple whose ex­pe­rience doesn’t in­clude that, or whose self-image re­jects it re­gard­less of their ex­pe­rience, or who oth­er­wise have trou­ble imag­in­ing them­selves ar­riv­ing at be­liefs that aren’t ra­tio­nally jus­tified, might balk at that step.

Without knowing how, can one make the best choice?

If by “best choice” we mean the choice that has the best pos­si­ble re­sults, then in this case we ei­ther can­not make the best choice ex­cept by ac­ci­dent, or we always make the best choice, de­pend­ing on whether the things that didn’t in fact hap­pen were pos­si­ble be­fore they didn’t hap­pen, which there’s no par­tic­u­lar rea­son to be­lieve.

If by “best choice” we mean the choice that has the high­est ex­pected value given what we know when we make it, then we make the best choice by eval­u­at­ing what we know.

• Thanks, that does help a little, though I should say that I am pretty sure I hold a number of irrational beliefs that I have yet to excise. Assuming that Omega literally implanted the idea into my head is a different thought experiment from one where Omega turned out to be predicting, which is different again from one where Omega merely says that it predicted the result, etc. Until I know how and why I know it is predicting the result, I am not sure how I would act in the real case. How Omega told me that I was only allowed to pick boxes A and B or just B may or may not be helpful, but either way it is not as important as how I know it is predicting.

Edit: There seem to be a number of thought experiments wherein I have an irrational belief that I can more accurately mentally model, like how I might behave if I thought that I was the King of England. Now I am wondering what about this specific problem is giving me trouble.

• Until I know how and why I know it is predicting the result, I am not sure how I would act in the real case.

Fair enough.

For my own part, I find that I of­ten act on my be­liefs in a situ­a­tion with­out stop­ping to con­sider what my ba­sis for those be­liefs is, so it’s not too difficult for me to imag­ine act­ing on my posited be­liefs about Omega’s pre­dic­tive abil­ity while ig­nor­ing the ques­tion of where those be­liefs came from. I sim­ply ac­cept, for the sake of the ex­er­cise, that I do be­lieve it and act ac­cord­ingly.

Another way of look­ing at it you might find helpful is to leave aside al­to­gether the ques­tion of what I would or wouldn’t do, and what I can and can’t be­lieve, and in­stead ask what the right thing to do would be were this the ac­tual situ­a­tion.

E.g., if you give me a device that is indistinguishable from a revolver, but is designed in such a way that placing it to my temple and pulling the trigger doesn't put a bullet in my skull but instead causes Vast Quantities of Really Good Stuff to happen, the right thing to do is put the device to my temple and pull the trigger. I won't actually do that, because I have no way of knowing what the device actually does, but whether I do it or not, it's the right thing to do.

• Thank you. Depersonalising the question makes it easier for me to think about. If "do you take one box or two" becomes "should one take one box or two"… I am still confused. I'm confident that just box B should be taken, but I think that I need information that is implied to exist but is not presented in the problem to be able to give a correct answer. Namely, the nature of the predictions Omega has made.

With the problem as stated I do not see how one could tell whether Omega got lucky 100 times with a flawed system, or whether it has a deterministic or causality-breaking process that it follows.

One thing I would say: picking only B, the most you could lose is \$1,000 (if B turns out to be empty). Picking A and B, the most you could gain over just B is \$1,000. Is it worth risking a reasonable chance at \$1,000,000 for a \$1,000 gain, on a bet that you can beat a computer at a game 100 people failed to beat it at, especially a game where you more or less axiomatically do not understand how it is playing?

• Mm. I’m not re­ally un­der­stand­ing your think­ing here.

• Sorry, I am having difficulty explaining, as I am not sure what it is I am trying to get across; I lack the words. I am having trouble with the use of the word "predict," as it could imply any number of methods of prediction, and some of those methods change the answer you should give.

For example, if it was predicting by the colour of the player's shoes, it may have had only a hair over a 50% chance of being right, and just happened to have been correct the 100 times you heard of. In that case one should take A and B. If, on the other hand, it was a visitor from a higher matrix, and got its answer by simulating you perfectly and in fast-forward, then whatever you want to take is the best option, and in my case that is B. If it is breaking causality by looking through a window into the future, then take box B. My answers are conditional on information I do not have. I am having trouble mentally modelling this situation without assuming one of these cases to be true.

• This seems a bizarre way of think­ing about it, to me. It’s as though you’d said “sup­pose there’s some­one walk­ing past Sam in the street, and Sam can shoot and kill them, ought Sam do it?” and I’d replied “well, I need to know how re­li­able a shot Sam is. If Sam’s odds of hit­ting the per­son are low enough, then it’s OK. And that de­pends on the make of gun, and how much train­ing Sam has had, and...”

I mean, sure, in the real world, those are per­haps rele­vant fac­tors (and per­haps not). But you’ve already told me to sup­pose that Sam can shoot and kill the passerby. If I as­sume that (which in the real world I would not be jus­tified in sim­ply as­sum­ing with­out ev­i­dence), the make of the gun no longer mat­ters.

Similarly, I agree that if all I know is that Omega was right in 100 trials that I've heard of, I should lend greater credence to the hypothesis that there were >>100 trials, the successful 100 were cherry-picked, and Omega is not a particularly reliable predictor. This falls into the same category as assuming Omega is simply lying… sure, it's the highest-expected-value thing to do in an analogous situation that I might actually find myself in, but that's different from what the problem assumes.

The prob­lem as­sumes that I know Omega has an N% pre­dic­tion rate. If I’m go­ing to en­gage with the prob­lem, I have to make that as­sump­tion. If I am un­able to make that as­sump­tion, and in­stead make var­i­ous other as­sump­tions that are differ­ent, then I am un­able to en­gage with the prob­lem.

Which is OK… engaging with Newcomb's problem is not a particularly important thing to be able to do. If I'm unable to do it, I can still lead a fulfilling life.

• It’s sim­ply one of the rules of the thought ex­per­i­ment. If you bring in the hy­poth­e­sis that Omega is cheat­ing, you are talk­ing about a differ­ent thought ex­per­i­ment. That may be an in­ter­est­ing thought ex­per­i­ment in its own right, but it isn’t the thought ex­per­i­ment un­der dis­cus­sion, and the solu­tion you are propos­ing to your thought ex­per­i­ment is not a solu­tion to New­comb’s prob­lem.

• I think you went wrong when you said:

Next, let's turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational. Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

be­cause Omega doesn’t re­ward peo­ple for their choice to pick box B, he re­wards them for be­ing im­ple­men­ta­tions of any of the many al­gorithms that would pick box B.

I think that the causal decision theory algorithm is the winning way for problems where your mind is not read (when you take into account that causal decision theory can be swayed to make choices so as to deceive others about your real algorithm). Problems where your mind is read do not usually show up in real life. I think there is no winning way for conceivable universes in general, so I want to be an implementation of the winning algorithm for this universe, which seems to be causal decision theory.

• So, I’m sure this isn’t an origi­nal thought but there are a lot of com­ments and my util­ity func­tion is rol­ling its eyes at the thought of go­ing through them all to see whether this com­ment is re­dun­dant, as com­pared to writ­ing the com­ment given I want to sort my thoughts out ver­bally any­way.

I think the stan­dard form of the ques­tion should be changed to the one with the as­ter­oid. To­tal de­struc­tion is to­tal de­struc­tion, but money is only worth a) what you can buy with it and b) the effort it takes to earn it.

I can earn \$1000 in a month. Some peo­ple could earn it in a week. What is the differ­ence be­tween \$1m and \$1m + \$1000? Yes, it’s tech­ni­cally a higher num­ber, but in terms of my life that is not a statis­ti­cally sig­nifi­cant differ­ence. Of course I’d rather definitely have \$1m than risk hav­ing noth­ing for the pos­si­bil­ity of hav­ing \$1m + \$1000.

The causal de­ci­sion the­ory ver­sions of this prob­lem don’t look ridicu­lous be­cause they take the safe op­tion, they look ridicu­lous be­cause the util­ity of two-box­ing is not sig­nifi­cant in com­par­i­son with the po­ten­tial util­ity of one-box­ing. That is, a one-boxer doesn’t lose much if they’re wrong: IF box 2 already con­tained noth­ing when they chose it, they only missed their chance at \$1000, whereas IF box 2 already con­tained \$1m a two-boxer misses their chance at \$1m.

Ob­vi­ously an ad­vanced de­ci­sion the­ory needs a way to rank the po­ten­tial risks—if you pos­tu­late it as the as­ter­oid, the risk is much more con­crete.

• Shortly after posting this I realised that the value to me of \$1000 is only relevant if you assume the odds of Omega predicting your actions correctly are 50/50ish. Need to think about this some more.

• This re­minds me eerily of the Calv­inist doc­trine of pre­des­ti­na­tion. The money is already there, and mak­ing fun of me for two-box­ing ain’t gonna change any­thing.

A ques­tion—how could Omega be a perfect pre­dic­tor, if I in fact have a third op­tion—namely leav­ing with­out tak­ing ei­ther box? This pos­si­bil­ity would, in any real-life situ­a­tion, lead me to two-box. I know this and ac­cept it.

Then there’s always the eco­nomic ar­gu­ment: If \$1000 is a sum of money that mat­ters a great deal to me, I’m two-box­ing. Other­wise, I’d pre­fer to one-box.

• Then there’s always the eco­nomic ar­gu­ment: If \$1000 is a sum of money that mat­ters a great deal to me, I’m two-box­ing.

Do you mean that \$1,000 matters a great deal, but \$1,000,000 doesn't matter a great deal? If you buy that Omega is a perfect predictor, then it's impossible to walk away empty-handed. (Whether or not you should buy that in real life is its own issue.)

• Box B is already empty or already full [and will re­main the same af­ter I’ve picked it]

Do I have to be­lieve that state­ment is com­pletely and ut­terly true for this to be a mean­ingful ex­er­cise? It seems to me that I should treat that as du­bi­ous.

It seems to me that Omega is achieving a high rate of success by some unknown good method. If I believe Omega's method is a hard-to-detect, remote-controlled money-vaporisation process, then clearly I should one-box.

A superintelligence has many ways to get the results it wants.

I am in­clined to think that I don’t know the mechanism with suffi­cient cer­tainty that I should rea­son my­self into two-box­ing against the ev­i­dence to date.

Does it mat­ter which un­de­tectable un­be­liev­able pro­cess Omega is us­ing for me to pick my strat­egy? I don’t think it does—I have to ac­knowl­edge that I’m out of my depth with this alien and ar­gu­ments against causal­ity defi­ance or the im­pos­si­bil­ity of un­de­tectable money va­por­isers are not go­ing to help me take the mil­lion.

• Another tack: Omega isn't a superintelligence—he's got a ship, a plan, and a lot of time on his hands. He turns up on millions of worlds to play this game. His guesses are pretty lousy; he guesses right only x percent of the time. We are the only planet on which he's consistently guessed right. We don't know what x is in the full sample. Looking at his results here, it looks good. Does it really seem rational to second-guess the sample we see?

It seems to me that we have to ac­cept some pretty wild state­ments and then start rea­son­ing based on them for us to come to a los­ing strat­egy. If we doubt the premises to some de­gree then does it be­come clear that the most rea­son­able strat­egy is one-box­ing?

• This is an old thread, but I can't imagine the problem going away anytime soon, so let me throw some chum into the waters:

Omega says; “I pre­dict you’re a one boxer. I can un­der­stand that. You’ve got re­ally good rea­sons for pick­ing that, and I know you would never change your mind. So I’m go­ing to give you a slightly differ­ent ver­sion of the prob­lem; I’ve de­cided to make both boxes trans­par­ent. Oh and by the way, my pre­dic­tions aren’t 100% cor­rect.”

Ques­tion: Do you make any differ­ent de­ci­sions in the trans­par­ent box case?
If so, what was there about your origi­nal ar­gu­ment that is differ­ent in the trans­par­ent box case?

If you’re re­ally a one boxer, that means you can look at an empty box and still pick it.

I was sur­prised that the rec.puz­zles FAQ an­swer to this doesn’t ap­pear in the replies. (Maybe it’s here and I just missed it.)

While you are given that P(do X | pre­dict X) is high, it is not given that P(pre­dict X | do X) is high. In­deed, spec­i­fy­ing that P(pre­dict X | do X) is high would be equiv­a­lent to spec­i­fy­ing that the be­ing could use magic (or re­verse causal­ity) to fill the boxes. There­fore, the ex­pected gain from ei­ther ac­tion can­not be de­ter­mined from the in­for­ma­tion given.

In other words, we can’t tell if (how much) our ac­tions de­ter­mine the out­come, so we can’t make a ra­tio­nal de­ci­sion.
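
The gap between the two conditionals can be made concrete with a toy joint distribution. The counts below are invented purely for illustration, not part of the problem statement:

```python
# Invented counts over Omega's trials: P(do X | predict X) comes out high
# even though P(predict X | do X) is low, because predictions of X are rare
# while doing X is common. The two conditionals are independent quantities.
n_predictX_doX = 90     # Omega predicted X and the subject did X
n_predictX_doY = 10     # predicted X, but the subject did Y
n_predictY_doX = 900    # predicted Y, but the subject did X anyway
n_predictY_doY = 9000   # predicted Y and the subject did Y

p_do_given_predict = n_predictX_doX / (n_predictX_doX + n_predictX_doY)
p_predict_given_do = n_predictX_doX / (n_predictX_doX + n_predictY_doX)

print(p_do_given_predict)   # high, as the problem specifies
print(p_predict_given_do)   # low, which the problem leaves unconstrained
```

Only the first ratio is fixed by the problem as given; the second, which is what matters for the expected gain, can be anything.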

• This thread has gone a bit cold (are there other ones more ac­tive on the same topic?)

My initial thoughts: if you've never heard of Newcomb's problem, and come across it for the first time in real time, then as soon as you start thinking about it, the only thing to do is 2-box. Yes, Omega will have worked out that you'll do that, and you'll only get \$1000, but the contents of the boxes are already set. It's too late to convince Omega that you're going to 1-box.

On the other hand, if you have already heard and thought about the prob­lem, the ra­tio­nal thing to do is to con­di­tion your­self in ad­vance so that you will take 1 box in New­comb-type situ­a­tions, and ideally do so quite re­flex­ively, with­out even think­ing about it. That way, Omega will pre­dict (cor­rectly) that you will 1-box, and you’ll get the \$1 mil­lion.

This is fairly close to the stan­dard anal­y­sis, though what I’d dis­pute about the stan­dard ver­sion is that there is any­thing “ir­ra­tional” in so-con­di­tion­ing one­self. It seems to me that we train our­selves all the time to do things with­out think­ing about them (such as walk­ing, driv­ing to work, typ­ing out let­ters to spell words etc) and it’s perfectly rea­son­able for us to do that where it will have higher ex­pected util­ity for us.

There might even be a significant practical issue here: quite possibly a lot of moral discipline involves conditioning oneself in advance to do things which don't (at that time) maximise utility. This is so we actually get to be put in positions of responsibility, where being in such positions has higher utility than not being in them—real-life Newcomb problems. In practice, we seem to be quite good at approximating Omega with each other on a social level; when hiring a security guard, for instance, we seem to be quite good at predicting who will defend our property rather than run off with it. Not perfect, of course.

• It's too late to convince Omega that you're going to 1-box.

You seem to be think­ing about Omega as if he’s a mind-reader that can only be af­fected by your thoughts at the time he set the boxes, in­stead of a pre­dic­tor/​simu­la­tor/​very good guesser of your fu­ture thoughts.

So it’s not “too late”.

and ideally do so quite re­flex­ively, with­out even think­ing about it.

What does it mat­ter if you’ll do it re­flex­ively or af­ter a great deal of thought? The prob­lem doesn’t say that re­flex­ive de­ci­sions are eas­ier for Omega to guess than ones fol­low­ing long de­liber­a­tion.

• I’m mod­el­ling Omega as a pre­dic­tor whose pre­dic­tion func­tion is based on the box-chooser’s cur­rent men­tal state (and pre­sum­ably the cur­rent state of the chooser’s en­vi­ron­ment). Omega can simu­late that state for­ward into the fu­ture and see what hap­pens, but this is still a func­tion of cur­rent state.

This is differ­ent from Omega be­ing a pre-cog who can (some­how) see di­rectly into the fu­ture, with­out any for­ward simu­la­tion etc.

• Omega can simu­late that state for­ward into the fu­ture and see what hap­pens, but this is still a func­tion of cur­rent state.

Yes. And what Omega dis­cov­ers as a re­sult of perform­ing the simu­la­tion de­pends on what de­ci­sion you’ll make, even if you en­counter the prob­lem for the first time, since a phys­i­cal simu­la­tion doesn’t care about cog­ni­tive nov­elty. As­sum­ing you’re digi­tally en­coded, it’s a log­i­cally valid state­ment that if you one-box, then Omega’s simu­la­tion says that you one-boxed, and if you two-box, then Omega’s simu­la­tion says that you two-boxed. In this sense you con­trol what’s in the box.

• I think this is the dis­con­nect… The chooser’s men­tal state when sam­pled by Omega causes what goes into the box. The chooser’s sub­se­quent de­ci­sions don’t cause what went into the box, so they don’t “con­trol” what goes into the box ei­ther. Con­trol is a causal term...

• The goal is to get more money, not nec­es­sar­ily to “causally con­trol” money. I agree that a pop­u­lar sense of “con­trol” prob­a­bly doesn’t in­clude what I de­scribed, but the ques­tion of whether that word should in­clude a new sense is a de­bate about defi­ni­tions, not about the thought ex­per­i­ment (the dis­am­biguat­ing term around here is “acausal con­trol”, though in the nor­mal situ­a­tions it in­cludes causal con­trol as a spe­cial case).

So long as we understand that I refer to the fact that it's logically valid that if you one-box, then you get \$1,000,000, and if you two-box, then you get only \$1,000, there is no need to be concerned with that term. Since it's true that if you two-box, then you only get \$1,000, by two-boxing you guarantee that it's true that you two-box, ergo that you get \$1,000. Correspondingly, if you one-box, that guarantees that it's true that you get \$1,000,000.

(The sub­tlety is hid­den in the fact that it might be false that you one-box, in which case it’s also true that your one-box­ing im­plies that 18 is a prime. But if you ac­tu­ally one-box, that’s not the case! See this post for some dis­cus­sion of this sub­tlety and a model that makes the situ­a­tion some­what clearer.)

• It seems to me that if I've never before been exposed to Newcomb's problem, and Omega presents me with it, there are two possibilities: either I will one-box, or I will two-box. If I one-box (even without having precommitted to doing so, simply by virtue of my thoughts at the moment about the boxes), Omega will have previously worked out that I'm the sort of person who would one-box.

Why do you say that the only thing to do in the ab­sence of pre­com­mit­ment is two-box?

• In the case of facing the problem for the first time, in real time, a person can only 1-box by ignoring the concept of a "dominant" strategy. Or by not really understanding the problem (the boxes really are there, with either \$1 million in or not, and you can't actually change that: Omega has no time-travel or reverse-causation powers). Or by having a utility function over something other than money, which is not in itself irrational, but goes against the statement of the problem.

For instance, I think an astute rational thinker could (perhaps) argue in real time: "this looks like a sort of disguised moral problem; Omega seems to be implicitly testing my ethics, i.e. testing my self-restraint versus my greed. So perhaps I should take 1." However, at that stage the 1-boxer probably values acting ethically more than being \$1000 richer. Or there might be other rational preferences for not 2-boxing, such as getting a really strong urge to 1-box at the time, and preferring to satisfy the urge rather than be \$1000 richer. Or knowing that if you 2-box you'll worry for the rest of your life whether that was the right thing, and this is just not worth \$1000. I think these are well-known "solutions" which all shift the utility function and hence sidestep the problem.

• I un­der­stand the ar­gu­ment, I just don’t un­der­stand what the nov­elty of the prob­lem has to do with it. That is, it seems the same prob­lem arises whether it’s a new prob­lem or not.

You’re of course right that there’s no time­travel in­volved. If I’m the sort of per­son who two-boxes, Omega will put \$1000 in. If I’m the sort of per­son who one-boxes, Omega will put \$1000000 in. (If I’m the sort of per­son whose be­hav­ior can’t be pre­dicted ahead of time, then Omega is ly­ing to me.)

So, what sort of per­son am I? Well, geez, how should I know? Un­like Omega, I’m not a re­li­able pre­dic­tor of my be­hav­ior. The way I find out what sort of per­son I am is by see­ing what I do in the situ­a­tion.

You seem to be in­sist­ing on there be­ing a rea­son for my one-box­ing be­yond that (like “I think Omega is test­ing my ethics” or “I pre­com­mit­ted to one-box­ing” or some such thing). I guess that’s what I don’t un­der­stand, here. Either I one-box, or I two-box. My rea­sons don’t mat­ter.

• You seem to be in­sist­ing on there be­ing a rea­son for my one-box­ing be­yond that (like “I think Omega is test­ing my ethics” or “I pre­com­mit­ted to one-box­ing” or some such thing). I guess that’s what I don’t un­der­stand, here. Either I one-box, or I two-box.

In­deed. “I like money” seems like a good enough rea­son to one box with­out any­thing more com­pli­cated!

• That’s just ev­i­den­tial de­ci­sion the­ory, right?

• That’s just ev­i­den­tial de­ci­sion the­ory, right?

I call it “I take free monies the­ory!” I don’t need a the­o­ret­i­cal frame­work to do that. At this point in time there isn’t a for­mal de­ci­sion the­ory that re­sults in all the same de­ci­sions that I en­dorse—ba­si­cally be­cause the guys are still work­ing out the kinks in UDT and for­mal­iza­tion is a real bitch some­times. They haven’t figured out a way to gen­er­al­ize the han­dling of coun­ter­fac­tu­als the way I would see them han­dled.

(ArisKat­saris nails it in the sibling).

• Well, New­comb’s prob­lem is sim­ple enough that ev­i­den­tial de­ci­sion the­ory suffices.

• I'm going to track what's happened on the other threads discussing Newcomb's paradox, since I suspect there's quite a lot of repetition or overlap. Before signing off though, does anyone here have a view on whether it matters whether Omega is a perfect predictor, or just a very good predictor?

Per­son­ally, I think it does mat­ter, and mat­ters rather a lot. The New­comb prob­lem can be stated ei­ther way.

Let’s start with the “very good” pre­dic­tor case, which I think is the most plau­si­ble one, since it just re­quires Omega to be a good judge of char­ac­ter.

Consider Alf, who is the "sort of person who 2-boxes". Let's say he has a >99% chance of 2-boxing and a <1% chance of 1-boxing (he's not totally deterministic, and has occasional whims, lapses or whatever). If Omega is a good predictor based on a general judgment of character, then Omega won't have put the \$1 million in Alf's boxes. So in the unlikely event that Alf actually does take just the one box, he'll win nothing at all. This means that if Alf knows he's basically a 2-boxer (he assigns something like 99% credence to the event that he 2-boxes) and knows that Omega is a good but imperfect predictor, Alf has a rationale for remaining a 2-boxer. This holds under both causal decision theory and evidential decision theory. The solution of being a 2-boxer is reflectively stable; Alf can know he's like that and stay like that.

But now consider Beth, who is the sort of person who 1-boxes. In the unlikely event that she takes both boxes, Omega will still have put the \$1 million in, and so Beth will win \$1,001,000. But now if Beth knows she's a 1-boxer (say she assigns 99% credence to taking 1 box), and again knows that Omega is good but imperfect, this puts her in an odd self-assessment position, since it seems she has a clear rationale to take both boxes (again under both evidential and causal decision theory). If she remains a 1-boxer, then she is essentially projecting of herself that she has only a 1% chance of making a \$-optimal choice, i.e. she believes of herself either that she is not a rational utility maximiser, or that her utility function is different from \$. If Beth truly is a \$ utility maximiser, then Beth's position doesn't look reflectively stable; though she could maybe have "trained" herself to act this way in Newcomb situations and be aware of the pre-conditioning.

Finally, consider Charles, who has never heard of the Newcomb problem, and doesn't know whether he will 1-box or 2-box. However, Charles is sure he is a \$-utility maximizer. If he is a causal decision theorist, he will quickly decide to 2-box, and so will model himself like Alf. If he's an evidential decision theorist, then he will initially assign some probability to either 1- or 2-boxing, calculate that his expected utility is higher by 1-boxing, and then start to model himself like Beth. But then he will realize that this self-model is reflectively unstable, since it requires him to model himself as something other than a \$ utility maximiser, and he's sure that's what he is. After flapping about a bit, he will realize that the only reflectively stable solution is to model himself like Alf, and this makes it better for him to 2-box. Thinking about the problem too much forces him to 2-box.

In the event that Omega is a perfect pre­dic­tor, and the box-chooser knows this, then things get messier, be­cause now the only re­flec­tively-sta­ble solu­tion for the ev­i­den­tial de­ci­sion the­o­rist is to 1-box. (Beth thinks “I have 99% chance of 1-box­ing, and in the rare event that I de­cide to 2-box, Omega will have pre­dicted this, and my ex­pected util­ity will be lower; so I still have a ra­tio­nale to 1 box!). What about the causal de­ci­sion the­o­rist though? One difficulty is how the causal the­o­rist can re­ally be­lieve in Omega as a perfect pre­dic­tor with­out also be­liev­ing in some form of ret­ro­grade cau­sa­tion or time travel. This seems a strange set of be­liefs to hold in com­bi­na­tion. If the causal de­ci­sion the­o­rist squares the cir­cle by as­sign­ing some sort of pre-cog­ni­tive fac­ulty to Omega, or at least as­sign­ing some non-triv­ial cre­dence to such a pre-cog fac­ulty, then he can rea­son that there is af­ter all (with some cre­dence) a gen­uine (if bizarre) causal re­la­tion be­tween what he chooses, and what goes in the box, so he should 1-box. If he re­mains sure that there is no such causal re­la­tion, then he should 2 box. But we should note that the 2 box po­si­tion is dis­tinctly weaker in this case than in the “good but im­perfect” case.

• It is not clear to me that Alf’s po­si­tion as de­scribed here is sta­ble.

You say Alf knows Omega is a good (but im­perfect) pre­dic­tor. Just for speci­fic­ity, let’s say Alf has (and be­lieves he has) .95 con­fi­dence that Omega can pre­dict Alf’s box-se­lec­tion be­hav­ior with .95 ac­cu­racy. (Never mind how he ar­rived at such a high con­fi­dence; per­haps he’s seen sev­eral hun­dred tri­als.) And let’s say Alf val­ues money.

Given just that belief, Alf ought to be able to reason as follows: "Suppose I take only box B. In that case, I expect with ~.9 confidence that Omega put the \$1m in it. OTOH, suppose I take both boxes. In that case, I expect with ~.9 confidence that Omega left box B empty, so that I get only the \$1k."

For simplicity, let's assume Alf believes Omega always puts either \$1k or \$1m+\$1k in the boxes (as opposed to, say, putting in an angry bobcat). So if Alf two-boxes with .9 confidence that the boxes hold just the \$1k, he has .1 confidence that they hold \$1m+\$1k.

So, Alf ought to be able to conclude that one-boxing has an expected value of (.9 × \$1m + .1 × \$0) = \$900k, and two-boxing has an expected value of (.9 × \$1k + .1 × (\$1m+\$1k)) = \$101k. The expected value of one-boxing is greater than that of two-boxing, so Alf ought to one-box.

So far, so good. But you also say that Alf has .99 con­fi­dence that Alf two-boxes… that is, he has .99 con­fi­dence that he will take the lower-value choice. (Again, never mind how he ar­rived at such high con­fi­dence… al­though iron­i­cally, we are now posit­ing that Alf is a bet­ter pre­dic­tor than Omega is.)

Well, this is a pickle! There do seem to be some con­tra­dic­tions in Alf’s po­si­tion.

Per­haps I’m miss­ing some key im­pli­ca­tions of be­ing a causal vs. an ev­i­den­tial de­ci­sion the­o­rist, here. But I don’t re­ally see why it should mat­ter. That just af­fects how Alf ar­rived at those var­i­ous con­fi­dence es­ti­mates, doesn’t it? Once we know the es­ti­mates them­selves, we should no longer care.

Incidentally, if Alf believes Omega is a perfect predictor (that is, Alf has .95 confidence that Omega can predict Alf's box-selection with 1−epsilon accuracy), the situation doesn't really change much; the EV calculation becomes (.95 × \$1m + .05 × \$0) vs (.95 × \$1k + .05 × (\$1m+\$1k)), which gets you to the same place.
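
Under the payoff convention used in this thread (box A holds \$1k; box B holds \$1m iff Omega predicted one-boxing), the expected-value comparison can be sketched as follows, where `p` stands for Omega's assumed accuracy — a simplifying assumption, since the comments above distinguish finer-grained cases:

```python
# Expected dollar value of each choice, assuming Omega predicts your actual
# choice with probability p, symmetrically for one-boxers and two-boxers.
def ev_one_box(p: float) -> float:
    # Correct prediction (prob p): box B holds $1m. Wrong prediction: B empty.
    return p * 1_000_000 + (1 - p) * 0

def ev_two_box(p: float) -> float:
    # Correct prediction (prob p): only box A's $1k. Wrong: $1m + $1k.
    return p * 1_000 + (1 - p) * 1_001_000

for p in (0.9, 0.95):
    print(p, ev_one_box(p), ev_two_box(p))
```

Setting the two expressions equal gives the break-even accuracy p = 1,001,000 / 2,000,000 ≈ 0.5005; above that, one-boxing has the higher expected value under this simple model.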

• OK, maybe it wasn’t to­tally clear. Alf is very con­fi­dent that he 2-boxes, since he thinks that’s the “right” an­swer to New­comb’s prob­lem. Alf is very con­fi­dent that Omega is a good pre­dic­tor, be­cause he’s a good judge of char­ac­ter, and will spot that Alf is a 2-boxer.

Alf be­lieves that in the rare, fluky event that he ac­tu­ally 1-boxes, then Omega won’t have pre­dicted that, since it is so out of char­ac­ter for Alf. Alf thinks Omega is a great pre­dic­tor, but not a perfect pre­dic­tor, and can’t fore­see such rare, fluky, out-of-char­ac­ter events. So there still won’t be the \$1 mil­lion in Alf’s boxes in the flukey event that he 1-boxes, and he will win noth­ing at all, not \$1 mil­lion. Given this be­lief set, Alf should 2-box, even if he’s an ev­i­den­tial de­ci­sion the­o­rist rather than a causal de­ci­sion the­o­rist. The po­si­tion is con­sis­tent and sta­ble.

Is that clearer?

• Alf be­lieves that in the rare, fluky event that he ac­tu­ally 1-boxes, then Omega won’t have pre­dicted that.

Ah! Yes, this clar­ifies mat­ters.

Sure, if Alf be­lieves that Omega has a .95 chance of pre­dict­ing Alf will two-box re­gard­less of whether or not he does, then Alf should two-box. Similarly, if Beth be­lieves Omega has a .95 chance of pre­dict­ing Beth will one-box re­gard­less of whether or not she does, then she also should two-box. (Though if she does, she should im­me­di­ately lower her ear­lier con­fi­dence that she’s the sort of per­son who one-boxes.)

This is im­por­tantly differ­ent from the stan­dard New­comb’s prob­lem, though.

You seem to be op­er­at­ing un­der the prin­ci­ple that if a con­di­tion is un­likely (e.g., Alf 1-box­ing) then it is also un­pre­dictable. I’m not sure where you’re get­ting that from.

By way of anal­ogy… my fire alarm is, gen­er­ally speak­ing, the sort of thing that re­mains silent… if I ob­serve it in six-minute in­ter­vals for a thou­sand ob­ser­va­tions, I’m pretty likely to find it silent in each case. How­ever, if I’m a good pre­dic­tor of fire alarm be­hav­ior, I don’t there­fore as­sume that if there’s a fire, it will still re­main silent.

Rather, as a good pre­dic­tor of fire alarms, what my model of fire alarms tells me is that “when there’s no fire, I’m .99+ con­fi­dent it will re­main silent; when there is a fire, I’m .99+ con­fi­dent it will make noise.” I can there­fore test to see if there’s a fire and, if there is, pre­dict it will make noise. Its noise is rare, but pre­dictable (for a good enough pre­dic­tor of fire alarm be­hav­ior).
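That model of the fire alarm can be sketched directly. The probabilities below are illustrative assumptions, not numbers from the discussion:

```python
# A predictor that conditions on the cause: the alarm's noise is rare
# unconditionally, yet highly predictable given the presence of fire.
# All numbers here are illustrative assumptions.
p_fire = 0.001                  # fires are rare
p_noise_given_fire = 0.99       # alarm sounds when there is a fire
p_noise_given_no_fire = 0.001   # false-alarm rate

def predict_noise(fire_present):
    # A good predictor of fire alarms uses the conditional, not the base rate.
    return p_noise_given_fire if fire_present else p_noise_given_no_fire

# Unconditional (base-rate) probability of noise: still tiny.
p_noise = (p_fire * p_noise_given_fire
           + (1 - p_fire) * p_noise_given_no_fire)
print(p_noise)              # small: noise is rare overall
print(predict_noise(True))  # 0.99: but predictable given a fire
```

Rarity of the event and predictability of the event come apart exactly as the comment argues.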

• Re­mem­ber I have two mod­els of how Omega could work.

1) Omega is in essence an excellent judge of character. It can reliably decide which of its candidates is “the sort of person who 1-boxes” and which is “the sort of person who 2-boxes”. However, if the chooser actually does something extremely unlikely and out of character, Omega will get its prediction wrong. This is a model for Omega that I could actually see working, so it is the most natural way for me to interpret Newcomb’s thought experiment.

If Omega be­haves like this, then I think causal and ev­i­den­tial de­ci­sion the­ory al­ign. Both tell the chooser to 2-box, un­less the chooser has already pre-com­mit­ted to 1-box­ing. Both im­ply the chooser should pre-com­mit to 1-box­ing (if they can).

2) Omega is a perfect predictor, and always gets its predictions right. I can’t actually see how this model would work without reverse causation. If reverse causation is implied by the problem statement, or choosers can reasonably think it is implied, then both causal and evidential decision theory align and tell the chooser to 1-box.

From the sound of things, you are describing a third model in which Omega can not only judge character, but can also reliably decide whether someone will act out of character or not. When faced with “the sort of person who 1-boxes” who then, out of character, 2-boxes after all, Omega will still with high probability guess correctly that the 2-boxing is going to happen, and so withhold the \$1 million.

I tend to agree that in this third model causal and ev­i­den­tial de­ci­sion the­ory may be­come de­cou­pled, but again I’m not re­ally sure how this model works, or whether it re­quires back­ward cau­sa­tion again. I think it could work if the causal fac­tors lead­ing the chooser to act “out of char­ac­ter” in the par­tic­u­lar case are already em­bed­ded in the chooser’s brain state when scanned by Omega, so at that stage it is already highly prob­a­ble that the chooser will act out of char­ac­ter this time. But the model won’t work if the fac­tors caus­ing out of char­ac­ter be­havi­our arise be­cause of very rare, ran­dom, brain events hap­pen­ing af­ter the scan­ning (say a few stray neu­rons fire which in 99% of cases wouldn’t fire af­ter the scanned brain state, and these cause a cas­cade even­tu­ally lead­ing to a differ­ent choice). Omega can’t pre­dict that type of event with­out be­ing a pre-cog.

Thanks any­way though; you’ve cer­tainly made me think about the prob­lem a bit fur­ther...

• So, what does it mean for a brain to do one thing 99% of the time and some­thing else 1% of the time?

If the 1% case is a gen­uinely ran­dom event, or the re­sult of some mys­te­ri­ous sort of un­pre­dictable free will, or oth­er­wise some­thing that isn’t the effect of the causes that pre­cede it, and there­fore can’t be pre­dicted short of some mys­te­ri­ous acausal pre­cog­ni­tion, then I agree that it fol­lows that if Omega is a good-but-not-perfect pre­dic­tor, then Omega can­not pre­dict the 1% case, and New­comb’s prob­lem in its stan­dard form can’t be im­ple­mented even in prin­ci­ple, with all the con­se­quences pre­vi­ously dis­cussed.

Con­versely, if brain events—even rare ones—are in­stead the effects of causes that pre­cede them, then a good-but-not-perfect pre­dic­tor can make good-but-not-perfect pre­dic­tions of the 1% case just as read­ily as the 99% case, and these prob­lems don’t arise.

Per­son­ally, I con­sider brain events the effects of causes that pre­cede them. So if I’m the sort of per­son who one-boxes 99% of the time and two-boxes 1% of the time, and Omega has a suffi­cient un­der­stand­ing of the causes of hu­man be­hav­ior to make 95% ac­cu­rate pre­dic­tions of what I do, then Omega will pre­dict 95% of my (com­mon) one-box­ing as well as 95% of my (rare) two-box­ing. Fur­ther, if I some­how come to be­lieve that Omega has such an un­der­stand­ing, then I will pre­dict that Omega will pre­dict my (rare) two-box­ing, and there­fore I will pre­dict that two-box­ing loses me money, and there­fore I will one-box sta­bly.

• So, what does it mean for a brain to do one thing 99% of the time and some­thing else 1% of the time?

For the sake of the least con­ve­nient world as­sume that the brain is par­tic­u­larly sen­si­tive to quan­tum noise. This ap­plies in the ac­tual world too albeit at a far, far lower rate than 1% (but hey… perfect). That leaves a perfect pre­dic­tor perfectly pre­dict­ing that in the branches with most of the quan­tum goo (pick a word) the brain will make one choice while in the oth­ers it will make the other.

In this case it be­comes a mat­ter of how the coun­ter­fac­tual is speci­fied. The most ap­pro­pri­ate one seems to be with Omega filling the large box with an amount of money pro­por­tional to how much of the brain will be one box­ing. A brain that ac­tively flips a quan­tum coin would then be granted a large box with half the mil­lion.

The only other obvious alternative specifications of Omega that wouldn’t break the counterfactual in this context are a hard cutoff and some specific degree of ‘probability’.

As you say, one-boxing remains stable under this uncertainty, and even under imperfect predictors.

• I’m not sure what the quan­tum-goo ex­pla­na­tion is adding here.

If Omega can’t pre­dict the 1% case (whether be­cause it’s due to un­pre­dictable quan­tum goo, or for what­ever other rea­son… pick­ing a spe­cific ex­pla­na­tion only sub­jects me to a con­junc­tion fal­lacy) then Omega’s be­hav­ior will not re­flect the 1% case, and that com­pletely changes the math. Some­one for whom the 1% case is two-box­ing is then en­tirely jus­tified in two-box­ing in the 1% case, since they ought to pre­dict that Omega can­not pre­dict their two-box­ing. (As­sum­ing that they can rec­og­nize that they are in such a case. If not, they are best off one-box­ing in all cases. Though it fol­lows from our premises that they will two-box 1% of the time any­way, though they might not have any idea why they did that. That said, com­pat­i­bil­ist de­ci­sion the­ory makes my teeth ache.)

Any­way, yeah, this is as­sum­ing some kind of hard cut­off strat­egy, where Omega puts a mil­lion dol­lars in a box for some­one it has > N% con­fi­dence will one-box.

If in­stead Omega puts N% of \$1m in the box if Omega has N% con­fi­dence the sub­ject will one-box, the re­sult isn’t ter­ribly differ­ent if Omega is a good pre­dic­tor.
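A minimal sketch of these two Omega strategies, the hard cutoff and the proportional fill. The 0.5 cutoff value is an assumption for illustration:

```python
# Two ways Omega might fill box B, given its confidence that the
# subject one-boxes: an all-or-nothing cutoff, or a proportional fill.
M = 1_000_000

def box_b_hard_cutoff(confidence_one_box, cutoff=0.5):
    # All-or-nothing: the full million iff confidence clears the cutoff.
    return M if confidence_one_box > cutoff else 0

def box_b_proportional(confidence_one_box):
    # N% of $1m when Omega is N% confident the subject one-boxes.
    return confidence_one_box * M

# A brain that actively flips a quantum coin (50/50) is granted
# half the million under the proportional rule, as discussed above:
print(box_b_proportional(0.5))  # 500000.0
```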

I’m com­pletely lost by the “pro­por­tional to how much of the brain will be one box­ing” strat­egy. Can you say more about what you mean by this? It seems likely to me that most of the brain nei­ther one-boxes nor two-boxes (that is, is not in­volved in this choice at all) and most of the re­main­der does both (that is, performs the same op­er­a­tions in the two-box­ing case as in the one-box­ing case).

• I’m not sure what the quan­tum-goo ex­pla­na­tion is adding here.

A perfect predictor will predict, correctly and perfectly, that the brain both one-boxes and two-boxes in different Everett branches (with vastly different weights). This is different in nature from an imperfect predictor that isn’t able to model the behavior of the brain with complete certainty; yet, given preferences that add up to normal, it requires that you use the same math. It means you do not have to abandon the premise of a “perfect predictor” for the probabilistic reasoning to be necessary.

I’m com­pletely lost by the “pro­por­tional to how much of the brain will be one box­ing” strat­egy.

How much weight the Everett branches in which it one-boxes have relative to the Everett branches in which it two-boxes.

Allow me to em­pha­sise:

As you say, one-boxing remains stable under this uncertainty, and even under imperfect predictors.

(I think we agree?)

• Ah, I see what you mean.

Yes, I think we agree. (I had pre­vi­ously been un­sure.)

• Omega can’t pre­dict that type of event with­out be­ing a pre-cog.

As­sume that the per­son choos­ing the boxes is a whole brain em­u­la­tion, and that Omega runs a perfect simu­la­tion, which guaran­tees for­mal iden­tity of Omega’s pre­dic­tion and per­son’s ac­tual de­ci­sion, even though the com­pu­ta­tions are performed sep­a­rately.

• So the chooser in this case is a fully de­ter­minis­tic sys­tem, not a real-live hu­man brain with some chance of ran­dom firings screw­ing up Omega’s pre­dic­tion?

Wow, that’s an in­ter­est­ing case, and I hadn’t re­ally thought about it! One in­ter­est­ing point though—sup­pose I am the chooser in that case; how can I tell which simu­la­tion I am? Am I the one which runs af­ter Omega made its pre­dic­tion? Or am I the one which Omega runs in or­der to make its pre­dic­tion, and which does have a gen­uine causal effect on what goes in the boxes? It seems I have no way of tel­ling, and I might (in some strange sense) be both of them. So causal de­ci­sion the­ory might ad­vise me to 1-box af­ter all.

• So the chooser in this case is a fully de­ter­minis­tic sys­tem, not a real-live hu­man brain with some chance of ran­dom firings screw­ing up Omega’s pre­dic­tion?

This is more of a way of point­ing out a spe­cial case that shares rele­vant con­sid­er­a­tions with TDT-like ap­proach to de­ci­sion the­ory (in this ex­treme iden­ti­cal-simu­la­tion case it’s just Hofs­tadter’s “su­per­ra­tional­ity”).

If we start from this case and gradually make the prediction model and the player less and less similar to each other (perhaps by making the model less detailed), at which point do the considerations that make you one-box in this edge case break? Clearly, if you change the prediction model just a little bit, the correct answer shouldn’t immediately flip, but CDT is no longer applicable out-of-the-box (arguably, even if you “control” two identical copies, it’s also not directly applicable). Thus the need for a generalization that admits imperfect acausal “control” over sufficiently similar decision-makers (and sufficiently accurate predictions), in the same sense in which you “control” your identical copies.

• That might give you the right an­swer when Omega is simu­lat­ing you perfectly, but pre­sum­ably you’d want to one-box when Omega was simu­lat­ing a slightly lossy, non-sen­tient ver­sion of you and only pre­dicted cor­rectly 90% of the time. For that (i.e. for all real world New­comblike prob­lems), you need a more so­phis­ti­cated de­ci­sion the­ory.

• Well no, not nec­es­sar­ily. Again, let’s take Alf’s view. (Note I ed­ited this post re­cently to cor­rect the dis­play of the ma­tri­ces)

Remember that Alf has a high probability of 2-boxing, and he knows this about himself. Whether he would actually do better by 1-boxing will depend on how well Omega’s “mistaken” simulations are correlated with the (rare, freaky) event that Alf 1-boxes. Basically, Alf knows that Omega is right at least 90% of the time, but this doesn’t require a very sophisticated simulation at all, certainly not in Alf’s own case. Omega can run a very crude simulation, say “a clear 2-boxer”, and not fill box B (so Alf won’t get the \$1 million). Basically, the game outcome would have a probability matrix like this:

``````
                Box B filled    Box B empty
Alf 2-boxes         0.00           0.99
Alf 1-boxes         0.00           0.01
``````

No­tice that Omega has less than 1% chance of a mis­taken pre­dic­tion.

But, I’m sure you’re think­ing, won’t Omega run a ful­ler simu­la­tion with 90% ac­cu­racy and pro­duce a prob­a­bil­ity ma­trix like this?

``````
                Box B filled    Box B empty
Alf 2-boxes         0.099          0.891
Alf 1-boxes         0.009          0.001
``````

Well Omega could do that, but now its prob­a­bil­ity of er­ror has gone up from 1% to 10%, so why would Omega bother?
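The trade-off can be checked directly, using the numbers from the two matrices in this comment:

```python
# Omega's two options against Alf, who one-boxes only 1% of the time:
# a crude "always predict two-boxing" rule vs. a 90%-accurate simulation.
p_one_box = 0.01
sim_accuracy = 0.90

# Crude rule: always predict two-boxing; wrong only when Alf 1-boxes.
crude_error = p_one_box

# Simulation: wrong whenever its 10% noise flips the prediction,
# whichever choice Alf actually makes.
sim_error = (p_one_box * (1 - sim_accuracy)
             + (1 - p_one_box) * (1 - sim_accuracy))

print(crude_error)  # 0.01
print(sim_error)    # ~0.10
```

So for a sufficiently committed 2-boxer, the crude rule really is the better predictor, which is the point of the desert analogy that follows.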

Let’s com­pare to a more ba­sic case: weather fore­cast­ing. Say I have a simu­la­tion model which takes in the cur­rent at­mo­spheric state above a land sur­face, runs it for­ward a day, and tries to pre­dict rain. It’s pretty good: if there is go­ing to be rain, then the simu­la­tion pre­dicts rain 90% of the time; if there is not go­ing to be rain, then it pre­dicts rain only 10% of the time. But now some­one shows me a desert, and asks me to pre­dict rain: I’m not go­ing to use a simu­la­tion with a 10% er­ror rate, I’m just go­ing to say “no rain”.

So it seems in the case of Alf. Pro­vided Alf’s chance of 1-box­ing is low enough (i.e. lower than the un­der­ly­ing er­ror rate of Omega’s simu­la­tions) then Omega can always do best by just say­ing “a clear 2-boxer” and not filling the B box. Omega may also say to him­self “what an ut­ter schmuck” but he can’t fault Alf’s ap­pli­ca­tion of de­ci­sion the­ory. And this ap­plies whether or not Alf is a causal de­ci­sion the­o­rist or an ev­i­den­tial de­ci­sion the­o­rist.

• In­ci­den­tally, your fire alarm may be prac­ti­cally use­less in the cir­cum­stances you de­scribe. Depend­ing on the rel­a­tive prob­a­bil­ities (small prob­a­bil­ity that the alarm goes off when there is not a fire ver­sus even smaller prob­a­bil­ity that there gen­uinely is a fire) then you may find that es­sen­tially all the alarms are false alarms. You may get fed up re­spond­ing to false alarms and ig­nore them. When pre­dict­ing very rare events, the pre­dic­tion sys­tem has to be ex­tremely ac­cu­rate.

This is re­lated to the anal­y­sis be­low about Omega’s simu­la­tion be­ing only 90% ac­cu­rate ver­sus a re­ally con­vinced 2-boxer (who has only a 1% chance of 1-box­ing). Or of simu­lat­ing rain in a desert.

• This thread has gone a bit cold (are there other ones more ac­tive on the same topic?)

• Thanks for this… I’m look­ing at them.

If I’m cor­rect, the gen­eral thrust seems to be “there is a prob­lem with both causal de­ci­sion the­ory and ev­i­den­tial de­ci­sion the­ory, since they some­times recom­mend differ­ent things, and some­times EDT seems right, whereas at other times CDT seems right. So we need a broader the­ory”.

I’m not to­tally con­vinced of this need, since I think that in many ways of in­ter­pret­ing the New­comb prob­lem, EDT and CDT lead to es­sen­tially the same con­clu­sion. They both say pre-com­mit to 1-box­ing. If you haven’t pre­com­mit­ted, they both say 2-box (in some in­ter­pre­ta­tions) or they both say 1-box (in other in­ter­pre­ta­tions). And the cases where they come apart are meta­phys­i­cally rather prob­le­matic (e.g. Omega’s pre­dic­tions must be perfect or nearly-so with­out pre-cog­ni­tion or re­verse cau­sa­tion; Omega’s simu­la­tion of the 2-boxer must be ac­cu­rate enough to catch the rare oc­ca­sions when he 1-boxes, but with­out that simu­la­tion it­self be­com­ing sen­tient.)

How­ever, again, thanks for the refer­ences and for a few new things to think about.

• If in 35 AD you were told that there were only 100 peo­ple who had seen Je­sus dead and en­tombed and then had seen him al­ive af­ter­wards, and that there were no peo­ple who had seen him dead and en­tombed who had seen his dead body af­ter­wards, would you be­lieve he had been re­s­ur­rected?

In New­comb’s prob­lem as stated, we are told 100 peo­ple have got­ten the pre­dicted an­swer. Then no mat­ter how un­likely our pri­ors put on a su­per­in­tel­li­gent alien be­ing able to pre­dict what we would do, we should ac­cept this as proof.

This seems like a pretty symmetric question to me. A one-boxer, if consistent, should say: sure, 100 people saw it, so it is true, no matter what priors we put on the resurrection of Jesus being true.

To me, it is incredibly more likely that people are either lying to me or simply mistaken. I have seen magicians make things appear and disappear in boxes that were already sealed, after they left. It is WAY more likely that this is some kind of test and/​or scam.

Which is not to say I wouldn’t one-box; I would! Whatever scam Omega is running, I’d rather have the million dollars, or prove Omega a fraud by finding an empty box, than have only \$1000, or prove Omega wrong by finding a full box and having \$1,001,000.

And this is precisely what I would announce to the people before publicly opening the one box; and this, if it is not a fraud, is what Omega would have known I would do.

As to 100 times to prove something that unlikely? Siegfried and Roy have made thousands of tigers appear and disappear in cages they could not have had sufficient access to. As odd as they are, it is unlikely (IMHO) that they are superintelligent aliens.

• If Omega has already left, I open box B first, take what­ever is in it, and then open box A.

I guess my cog­ni­tion just breaks down over the idea of Omega. To me, New­comb’s prob­lem seems akin to a the­olog­i­cal ar­gu­ment. Either we are talk­ing about a purely the­o­ret­i­cal idea that is meant to illus­trate ab­stract de­ci­sion the­ory, in which case I don’t care how many boxes I take, be­cause it has no bear­ing on any­thing tied to re­al­ity, or we are ac­tu­ally talk­ing about the real uni­verse, in which case I take both boxes be­cause I don’t be­lieve in alien su­per­in­tel­li­gences ca­pa­ble of fore­see­ing my choices any more than I be­lieve in an an­thro­po­mor­phic de­ity.

• If Omega has already left, I open box B first, take what­ever is in it, and then open box A.

La­bel­ing “I de­cide to lose” as a snark just seems odd.

I guess my cog­ni­tion just breaks down over the idea of Omega. To me, New­comb’s prob­lem seems akin to a the­olog­i­cal ar­gu­ment. Either we are talk­ing about a purely the­o­ret­i­cal idea that is meant to illus­trate ab­stract de­ci­sion the­ory, in which case I don’t care how many boxes I take, be­cause it has no bear­ing on any­thing tied to re­al­ity, or we are ac­tu­ally talk­ing about the real uni­verse, in which case I take both boxes be­cause I don’t be­lieve in alien su­per­in­tel­li­gences ca­pa­ble of fore­see­ing my choices any more than I be­lieve in an an­thro­po­mor­phic de­ity.

You are con­fused. Us­ing Omega is merely a sim­plifi­ca­tion of real pos­si­ble situ­a­tions. That is, any situ­a­tion in which you and the other player have some de­gree of mu­tual knowl­edge. Since those situ­a­tions are com­pli­cated they will some­times call for co­op­er­a­tion (one box­ing, here) but of­ten other con­sid­er­a­tions or in­suffi­cient mu­tual knowl­edge will over­ride and call for defec­tion (two box­ing).

If you wish to con­sider the effect of just, say, the mass of a cow then as­sum­ing a spher­i­cal cow in a vac­uum is use­ful. If the con­clu­sion you reach about the mass of said cow doesn’t suit you and you say “but there are no spher­i­cal cows in vac­u­ums!” then you are us­ing an ex­cuse to avoid bit­ing the bul­let, not show­ing your su­pe­rior aware­ness of re­al­ity.

• You are con­fused.

Yeah, that’s gen­er­ally what “I guess my cog­ni­tion breaks down” means.

If you wish to con­sider the effect of just, say, the mass of a cow then as­sum­ing a spher­i­cal cow in a vac­uum is use­ful. If the con­clu­sion you reach about the mass of said cow doesn’t suit you and you say “but there are no spher­i­cal cows in vac­u­ums!”

I think you can rea­son­ably ex­pect peo­ple to be­have in real life as if they ex­pect the laws of physics to ap­prox­i­mate rea­son­ably closely what new­to­nian me­chan­ics pre­dicts about spher­i­cal point masses. What I was say­ing, how­ever, is that you would be wrong to pre­dict that I defect in pris­on­ers’ dilem­mas based on my 2-box­ing, be­cause for me New­comb’s prob­lem isn’t con­nected to those prob­lems for rea­sons already stated. I hy­poth­e­size that I am not alone in that.

• What I was say­ing, how­ever, is that you would be wrong to pre­dict that I defect in pris­on­ers’ dilem­mas based on my 2-box­ing, be­cause for me New­comb’s prob­lem isn’t con­nected to those prob­lems for rea­sons already stated. I hy­poth­e­size that I am not alone in that.

And I said you are con­fused re­gard­ing this be­lief and the stated rea­sons. I don’t doubt that oth­ers are con­fused as well—it’s a rather com­mon re­sponse.

• I think it is important to make a distinction between what our choice is now, while we are here, sitting at a computer screen, unconfronted by Omega, and our choice when actually confronted by Omega. When actually confronted by Omega, your choice has been determined. Take both boxes, take all the money. Right now, sitting in your comfy chair? Take the million-dollar box. In the comfy chair, the counterfactual nature of the experiment basically gives you an Outcome Pump. So take the million-dollar box, because if you take the million-dollar box, it’s full of a million dollars. But when it actually happens, the situation is different. You aren’t in your comfy chair anymore.

• I’m not in my comfy chair any more, and I still take the mil­lion. Why wouldn’t I?

• Be­cause the mil­lion is already there, along with the thou­sand. Why not get all of it?

• The mil­lion isn’t there, be­cause Omega’s simu­la­tion was of you con­fronting Omega, not of you sit­ting in a comfy chair.

• You aren’t dou­ble­think­ing hard enough, then.

• I don’t know if this is a joke—I have a poor sense of hu­mour—but you do know Omega pre­dicts your ac­tual be­havi­our, right? As in, all things taken into ac­count, what you will ac­tu­ally do.

• I am be­ing some­what … ab­surd, and on pur­pose, at that. But I have enough ar­ro­gance ly­ing around in my brain to be­lieve that I can trick the su­per-in­tel­li­gence.

• Sorry—I’m always in­clined to take peo­ple on the in­ter­net liter­ally. I used to mess with my friends us­ing the same kind of ow-my-brain Pri­soner’s-dilemma som­er­saults, and still I couldn’t recog­nise a joke.

• That’s alright. My hu­mor, in real life, is based en­tirely on the fact that only I know I’m jok­ing at the time, and the other per­son won’t re­al­ize it un­til three days later, when they spon­ta­neously start laugh­ing for no rea­son they can safely ex­plain. Is that as­i­nine? Yes. Is it hilar­i­ous? Hell, yes. So I apol­o­gize. I’ll try not to do that.

• Yes. Is it hilar­i­ous? Hell, yes.

Not es­pe­cially, un­for­tu­nately. There is some­thing to be said for ap­pear­ing that you don’t give a @#%! whether other peo­ple get your hu­mor in real time but it works best if you care a whole lot about mak­ing your hu­mor funny to your au­di­ence at the time and then just act like you don’t care about the re­sponse you get. Even if peo­ple get your joke three days later you still typ­i­cally end up slightly worse off for the failed trans­ac­tion.

• Ah. Wrong refer­ent. It’s hilar­i­ous for me, and it may, at some point, be hilar­i­ous for them. But it’s mostly funny for me. That would be why I took time to men­tion that it was also, in fact, as­i­nine.

• Be­cause I’d end up with only a thou­sand, as op­posed to a mil­lion. And I want the mil­lion.

• It’s strange. I perfectly agree with the argument here about rationality: the rationality I want is the rationality that wins, not the rationality that is more reasonable. This agrees with my privileging truth as a guide which is useful, not which necessarily makes the best predictions. But at other points on the site, it always seems that correspondence is privileged over value.

As for Newcomb’s paradox, I suggest writing out all the relevant propositions a la Jaynes, with non-zero probabilities for all propositions. Make it a real problem, not an idealized and contradictory one; basically, weigh the contradiction between the reports of 100 accurate trials by Omega, the assumption that there was no cheating involved, the assumption of no reverse-time causality, etc. If you do so, your priors will tell you the right answer.

Ha—al­though I ex­pect your be­lief in for­ward time causal­ity is higher than your con­fi­dence in your use of Jaynes for­mal­ism.

• Well, for me there are two possible hypotheses for that:

1. The boxes are not what they seem. For ex­am­ple, box B con­tains nano-ma­chin­ery that de­tects if you one-box or not, cre­ate money if you one-box, and then self-de­struct the nano-ma­chin­ery.

2. Omega is smart enough to be able to predict whether I’ll one-box or two-box (he scanned my brain, ran it in a simulation, and saw what I do… I hope he didn’t turn off the simulation afterwards, or he would have killed “me” then!).

In both cases, I should one-box. So I’ll one-box. I don’t really get the rationale for two-boxing. Be it a type-1 or type-2 reason, in both cases Omega is able to reward me for one-boxing if that’s what he wants, and with 100 prior cases, he really seems to want that.

• I see your gen­eral point, but it seems like the solu­tion to the Omega ex­am­ple is triv­ial if Omega is as­sumed to be able to pre­dict ac­cu­rately most of the time:
(let­ting C = Omega pre­dicted cor­rectly; let’s as­sume for sim­plic­ity that Omega’s fal­li­bil­ity is the same for false pos­i­tives and false nega­tives)

• if you choose just one box, your expected utility is \$1M × P(C)

• if you choose both boxes, your expected utility is \$1K + \$1M × (1 − P(C))

Setting these equal to find the equilibrium point:

1000000 × P(C) = 1000 + 1000000 × (1 − P(C))
1000 × P(C) = 1001 − 1000 × P(C)
2000 × P(C) = 1001
P(C) = 1001/2000 = 0.5005 = 50.05%

So as long as you are at least 50.05% sure that Omega’s model of the universe describes you accurately, you should pick the one box. It’s a little confusing because it seems like the effect precedes its cause in this situation, but that’s not the case; your behaviour affects the behaviour of a simulation of you. Assuming Omega is always right: if you take one box, then you are the type of person who would take the one box, and Omega will see that you are, and you will win. So it’s the clear choice.
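The equilibrium can be checked numerically; this is a direct transcription of the calculation above:

```python
# Find the accuracy P(C) at which one-boxing and two-boxing have
# equal expected utility, per the derivation in the comment.
M, K = 1_000_000, 1_000

def ev_one_box(p):
    # Box B is full iff Omega predicted correctly.
    return M * p

def ev_two_box(p):
    # Box A is guaranteed; box B is full iff Omega predicted wrongly.
    return K + M * (1 - p)

# Solving M*p = K + M*(1 - p) gives p = (K + M) / (2*M) = 1001/2000.
threshold = (K + M) / (2 * M)
print(threshold)  # 0.5005
```

Above that threshold, `ev_one_box` dominates, matching the 50.05% figure.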

• I don’t grasp why this prob­lem seems so hard and con­voluted. Of course you have to one-box, if you two-box you’ll lose for sure. From my per­spec­tive two-box­ing is ir­ra­tional...

If Omega can flawlessly pre­dict the fu­ture, this con­firms a de­ter­minis­tic world at the atomic scale. To be a perfect pre­dic­tor Omega would also need to have a perfect model of my brain at ev­ery stage of mak­ing my “de­ci­sion”—thus Omega can see the fu­ture and perfectly pre­dict whether or not I’m gonna two-box or not.

If my brain is wired up in such a way as to choose two-boxing, then Omega will have predicted that. It doesn’t matter whether or not Omega left already and box 1 already either contains 1M\$ or 0\$. No matter how long I ruminate back and forth, if I two-box I’ve lost, because Omega is a perfect predictor and would thus have predicted it.

If Omega in­deed has all the prop­er­ties that are claimed, then there are only two pos­si­ble out­comes: If you take one box, you’ll get 1M\$, if you take two, then you get 1000\$. It is true, that box 1 ei­ther con­tains 1M\$ or noth­ing by the time Omega left—but what the box con­tains is still 100% cor­re­lated with my up­com­ing fi­nal de­ci­sion and noth­ing is go­ing to change that. End of story. Ergo, CDT is wrong and a model that’s at odds with re­al­ity.

PS: In­ter­est­ingly, if open­ing the lid on these boxes is the trig­ger mo­ment that counts as a “de­ci­sion”, you could just put the opaque box into an X-ray and this act alone would in­stantly trans­form Omega into a liar, re­gard­less of whether it con­tained 1M\$ or noth­ing. It couldn’t pos­si­bly show an empty box with­out mak­ing Omega a liar, be­cause con­trary to what it said I could no longer ac­tu­ally de­cide to open only box 1 and get the 1M\$. Con­versely, if the box does con­tain 1M\$, then I could just two-box, mak­ing Omega a liar with re­spect to its pre­dic­tion.

So Omega would HAVE TO speci­fi­cally for­bid peep­ing into the opaque box. If it didn’t do that, Omega would risk be­ing a liar one way or an­other, once I looked into the 1st box with­out open­ing it and ei­ther found 1M\$ or noth­ing.

• To perfectly model your thought pro­cesses, it would be enough that your brain ac­tivity be de­ter­minis­tic; it doesn’t fol­low that the uni­verse is de­ter­minis­tic. The fact that my com­puter can model a Nin­tendo well enough for me to play video games does not im­ply that a Nin­tendo is built out of de­ter­minis­tic el­e­men­tary par­ti­cles, and a Nin­tendo em­u­la­tor that simu­lated ev­ery el­e­men­tary par­ti­cle in­ter­ac­tion in the Nin­tendo it was em­u­lat­ing would be ridicu­lously in­effi­cient.

• The “no back­wards causal­ity” ar­gu­ment seems like a case of con­fla­tion of cor­re­la­tion and cau­sa­tion. Your de­ci­sion doesn’t retroac­tively cause Omega to fill the boxes in a cer­tain way; some prior state of the world causes your thought pro­cesses and Omega’s pre­dic­tion, and the cor­re­la­tion is ex­actly or al­most ex­actly 1.

• The “no back­wards causal­ity” ar­gu­ment seems like a case of con­fla­tion of cor­re­la­tion and cau­sa­tion. Your de­ci­sion doesn’t retroac­tively cause Omega to fill the boxes in a cer­tain way; some prior state of the world causes your thought pro­cesses and Omega’s pre­dic­tion, and the cor­re­la­tion is ex­actly or al­most ex­actly 1.

EDIT: Cor­re­la­tion co­effi­cients don’t work like that, but what­ever. You get what I mean.

• Ac­tu­ally I take it back. I think that what I would do de­pends on what I know of how Omega func­tions (ex­actly what ev­i­dence lead me to be­lieve that he was good at pre­dict­ing this).

Omega #1: (and I think this one is the most plau­si­ble) You are given a mul­ti­ple choice per­son­al­ity test (not know­ing what’s about to hap­pen). You are then told that you are in a New­comb situ­a­tion and that Omega’s pre­dic­tion is based on your test an­swers (maybe they’ll even show you Omega’s code af­ter the test is over). Here I’ll two-box. If I am pun­ished I am not be­ing pun­ished for my de­ci­sion to two-box, I am be­ing pun­ished for my test an­swers, and in re­al­ity am prob­a­bly be­ing pun­ished for hav­ing per­son­al­ity traits that cor­re­late well with be­ing a two-boxer. I can ra­tio­nally re­gret hav­ing the wrong per­son­al­ity traits.

Omega #2: You are sent through the New­comb dilemma, given an am­ne­sia pill and then sent through for real. Omega’s pre­dic­tion is what­ever you did the first time (this is similar to the simu­la­tion case). If I know this is go­ing on, I clearly one-box be­cause I don’t know whether this is the first time through or the sec­ond time through.

Omega #3: Omega makes his pre­dic­tion by ob­serv­ing me and us­ing a time ma­chine. Clearly I one-box.

Omega #4: It is in­scribed in the laws of physics some­where the Omega can­not make a pre­dic­tion that comes out wrong. Clearly I one-box.

But I think that the problem as stated is ill-posed, since I don’t know what my probability distribution over Omegas should be (given that it depends a lot on exactly what evidence convinces me that Omega is actually a good predictor).

• The first case directly contradicts the specifications of the problem, since the idea then becomes to imagine you were the sort of person who would one-box and answer like that, then two-box. This might not work for everyone, but a sufficiently clever agent should manage it.

If you are imagining a personality test undertaken in secret, or before you knew you were facing Newcomb’s problem, and stating you would two-box, then it seems like you one-box when it is absolutely certain that Omega is right, but two-box if you can think of some way (however unlikely) that he might be wrong.

If you don’t see the prob­lem with this then I sug­gest you read some of the se­quence posts about ab­solute cer­tainty.

• In the first case, I imagine the test undertaken in secret. Or more realistically, Omega measures these personality traits from listening to my conversations, or reading things I post online.

I don’t de­cide based on whether there is a pos­si­bil­ity that Omega is wrong. #2 can cer­tainly be wrong (for ex­am­ple if I de­cide based on coin flip) and even #3 can prob­a­bly mess up. My point is that in case #1 the ar­gu­ment from the post no longer works. If I two-boxed and didn’t get \$1M, I might envy an­other per­son for their per­son­al­ity traits (which cor­re­late with one-box­ing), but not their de­ci­sion to one-box.

I think what I am try­ing to do is split Omega’s de­ci­sion pro­ce­dure into cases where ei­ther:

• His pre­dic­tion is clearly caused by my de­ci­sion (so I should one-box)

• His pre­dic­tion is not caused by my de­ci­sion (and so I can two-box with­out re­gret­ting my choice)

(#2 is a spe­cial case where I try to be clever.)

• Okay, I mi­s­un­der­stood you.

Even now, I think I would still one-box in case#1. For one thing, it is clearly in my in­ter­ests, think­ing about the prob­lem in ad­vance, to re­solve to do so, since the per­son­al­ity test will re­veal this fact and I will get the mil­lion.

Would you agree with me that far? If so, how do you handle the problem that you seem to be making different decisions at different times, without receiving any new information in between?

• Do you re­ally think that merely de­cid­ing to one-box in such a situ­a­tion would change your per­son­al­ity in a way that gets picked up by the test? If it does, do you want to mod­ify your per­son­al­ity in a mea­surable way just so that you can win if you hap­pen to run into a New­comb prob­lem?

Suppose for example it had been determined empirically that whether or not one was religious correlated well with the number of boxes one took. This could then be one of the things that the personality test measures. Are you saying that a precommitment would change your religious beliefs, or that you would change them in addition to deciding to one-box (in which case, why are you changing the latter at all)?

The point in case 1 is that they are not mak­ing a di­rect mea­sure­ment of your de­ci­sion. They are merely mea­sur­ing ex­ter­nal fac­tors so that for 99% of peo­ple these fac­tors agree with their de­ci­sion (I think that this is im­plau­si­ble, but not sig­nifi­cantly more im­plau­si­ble than the ex­is­tence of Omega in the first place). It seems to me very un­likely that just chang­ing your mind on whether you should one-box would also au­to­mat­i­cally change these other fac­tors. And if it does, do you nec­es­sar­ily want to be mess­ing around with your per­son­al­ity just to win this game that will al­most cer­tainly never come up?

• If merely deciding to one-box is not picked up by the test, and does not offer even a slight increase in the probability that the money is there (even 51% as opposed to 50% would be enough), then the test is not very good, in which case I would two-box. However, this seems to contradict the stated fact that Omega is in fact a very good predictor of decisions.

As a general principle, I am most definitely interested in modifying my personality to increase the number of situations in which I win. If I wasn’t, I probably wouldn’t be on LW. The religion example is a strawman, as it seems clear that applying the modification “believe in God” will cause me to do worse in many other much more common situations, whereas “one-box in Newcomb-type dilemmas” doesn’t seem likely to have many side effects.

If Omega really is just measuring external factors, then how do you know he won’t pick up on my decision to always one-box? The decision was not made in a vacuum; it was caused by my personality, my style of thinking and my level of intelligence, all of which are things that any reasonably competent predictor should pick up on.

As long as the test is reasonably good, I will still get my million with a higher probability, and that’s all that really matters to me.

• How about this ver­sion of Omega (and this is one that I think could ac­tu­ally be im­ple­mented to be 90% ac­cu­rate). First off, box A is painted with pic­tures of snakes and box B with pic­tures of ba­nanas. Omega’s pre­dic­tion pro­ce­dure is (and you are told this by the peo­ple run­ning the ex­per­i­ment) that if you are a hu­man he pre­dicts that you two-box and if you are a chim­panzee, he pre­dicts that you one-box.

I don’t think that 10% of peo­ple would give up \$1000 to prove Omega wrong, and if you think so, why not make it \$10^6 and \$10^9 in­stead of \$10^3 and \$10^6.

I feel like this ver­sion satis­fies the as­sump­tions of the prob­lem and makes it clear that you should two-box in this situ­a­tion. There­fore any claims that one-box­ing is the cor­rect solu­tion need to at least be qual­ified by ex­tra as­sump­tions about how Omega op­er­ates.

• In this version Omega may be predicting decisions in general with some accuracy, but it does not seem like he is predicting mine.

So it ap­pears there are cases where I two-box. I think in gen­eral my speci­fi­ca­tion of a New­comb-type prob­lem, has two re­quire­ments:

1) An outside observer who observed me to two-box would predict with high probability that the money is not there.

2) An outside observer who observed me to one-box would predict with high probability that the money is there.

The above ver­sion of the prob­lem clearly does not meet the sec­ond re­quire­ment.

If this is what you meant by your state­ment that the prob­lem is am­bigu­ous, then I agree. This is one of the rea­sons I favour a for­mu­la­tion in­volv­ing a brain-scan­ner rather than a neb­u­lous godlike en­tity, since it seems more use­ful to fo­cus on the par­tic­u­larly para­dox­i­cal cases rather than the easy ones.

• I don’t think that your change of just that one decision would be picked up by a personality test. Changing that decision is unlikely to change how you answer questions not directly relating to Newcomb’s problem. The test would pick up the style of thinking that led you to this decision, but making the decision differently would not change your style of thinking. Perhaps an example that illustrates my point even better:

Omega #1.1: Bases his pre­dic­tion on a ge­netic test.

Now I agree that it is un­likely that this will get 99% ac­cu­racy, but I think it could plau­si­bly ob­tain, say, 60% ac­cu­racy, which shouldn’t re­ally change the is­sue at hand. Re­mem­ber that Omega does not need to mea­sure things that cause you to de­cide one way or an­other, he just needs to mea­sure things that have a pos­i­tive cor­re­la­tion with it.

As for modifying your personality… Should I really believe that you believe the arguments that you are making here, or are you just worried that you are going to be in this situation and that Omega will base his prediction on your posts?

• Good point with the genetic test argument; in that situation I probably would two-box. The same might apply to any sufficiently poor personality test, or to a version of Omega that bases his decision on the posts I make on Less Wrong (although I think if my sole reason for being here was signalling my willingness to make certain choices in certain dilemmas I could probably find better ways to do it).

I usu­ally imag­ine Omega does bet­ter than that, and that his meth­ods are at least as so­phis­ti­cated as figur­ing out how I make de­ci­sions, then ap­ply­ing that al­gorithm to the prob­lem at hand (the source of this as­sump­tion is that the first time I saw the prob­lem Omega was a su­per­com­puter that scanned peo­ple’s brains).

As for the per­son­al­ity mod­ifi­ca­tion thing, I re­ally don’t see what you find so im­plau­si­ble about the idea that I’m not at­tached to my flaws, and would elimi­nate them if I had the chance.

• I agree that the standard interpretation of Omega generally involves brain scans. But there is still a difference between running a simulation (Omega #2) and checking for relevant correlating personality traits. The latter, I would claim, is at least somewhat analogous to genetic testing, though admittedly the case is somewhat murkier. I guess perhaps the Omega that is most in the spirit of the question is one that does a brain scan and searches for your cached answer of “this is what I do in Newcomb problems”.

As for per­son­al­ity mod­ifi­ca­tion, I don’t see why chang­ing my stored val­ues for how to be­have in New­comb situ­a­tions would change how I be­have in non-New­comb situ­a­tions. I also don’t see why these changes would nec­es­sar­ily be an im­prove­ment.

• “I don’t see why chang­ing my stored val­ues for how to be­have in New­comb situ­a­tions would change how I be­have in non-New­comb situ­a­tions.”

It wouldn’t; that’s the point. But it would improve your performance in Newcomb situations, so there’s no downside. (For an example of a Newcomb-type paradox which could happen in the real world, see Parfit’s hitch-hiker; given that I am not a perfect liar, I would not consider it too unlikely that I will face a situation of that general type (if not that exact situation) at some point in my life.)

• My point was that if it didn’t change your behavior in non-Newcomb situations, no reasonable version of Omega #1 (or really any Omega that does not use either brain scans or lie detection) could tell the difference.

As for changing my actions in the case of Parfit’s hitch-hiker, say that the chance of actually running into this situation (with someone who can actually lie-detect, in a situation with no third alternatives, and where my internal sense of fairness wouldn’t just cause me to give him the \$100 anyway) is, say, 10^-9. This means that changing my behavior would save me an expected, say, 3 seconds of life. So if you have a way that I can actually precommit myself that takes less than 3 seconds to do, I’m all ears.
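The back-of-the-envelope number in the comment above checks out; a minimal sketch, assuming a remaining lifespan of roughly 3×10^9 seconds (about 95 years, an illustrative figure not taken from the comment):

```python
p_scenario = 1e-9        # commenter's estimated chance of a genuine Parfit's hitch-hiker situation
remaining_life_s = 3e9   # illustrative remaining lifespan: roughly 95 years in seconds

# expected seconds of life saved by precommitting
expected_saved = p_scenario * remaining_life_s
print(expected_saved)  # about 3.0 seconds
```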

• It wouldn’t have to be that ex­act situ­a­tion.

In fact, it is ap­pli­ca­ble in any situ­a­tion where you need to make a promise to some­one who has a rea­son­able chance of spot­ting if you lie (I don’t know about you but I of­ten get caught out when I lie), and while you pre­fer fol­low­ing through on the promise to not mak­ing it, you also pre­fer go­ing back on the promise to fol­low­ing through on it, (tech­ni­cally they need to have a good enough chance of spot­ting you, with “good enough” de­ter­mined by your rel­a­tive prefer­ences).

That’s quite a generic situ­a­tion, and I would es­ti­mate at least 10% prob­a­bil­ity that you en­counter it at some point, al­though the stakes will hope­fully be lower than your life.

• Per­haps. Though I be­lieve that in the vast ma­jor­ity of these cases my in­ter­nal (and per­haps ir­ra­tional) sense of fair­ness would cause me to keep my word any­way.

• 1) I would one-box. Here’s where I think the stan­dard two-boxer ar­gu­ment breaks down. It’s the idea of mak­ing a de­ci­sion. The two-boxer idea is that once the boxes have been fixed the course of ac­tion that makes the most money is tak­ing both boxes. Un­less there is re­verse causal­ity go­ing on here, I don’t think that any­one dis­putes this. If at that mo­ment you could make a choice to­tally in­de­pen­dently of ev­ery­thing lead­ing up to that point you would two-box. Un­for­tu­nately, the very ex­is­tence of Omega im­plies that such a feat is im­pos­si­ble.

2) A mildly silly ar­gu­ment for one-box­ing: Omega plau­si­bly makes his de­ci­sion by run­ning a simu­la­tion of you. If you are the real copy, it might be best to two-box, but if you are the simu­la­tion then one-box­ing earns real-you \$1000000. Since you can’t dis­t­in­guish whether this is real-you or simu­la­tion-you, you should one-box.

3) Would it change things for peo­ple if in­stead of \$1000000 vs \$1000 it were \$1001 vs \$1000? Where is the line drawn?

4) Eliezer: just cu­ri­ous about how you deal with para­doxes about in­finity in your util­ity func­tion. If for each n, on day n you are offered to sac­ri­fice one unit of util­ity that day to gain one unit of util­ity on day 2n and one unit on day 2n+1 what do you do? Each time you do it you seem to gain a unit of util­ity, but if you do it ev­ery day you end up worse than you started.
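The day-n bet in point 4 can be tabulated directly; a sketch (the one-unit stakes come from the comment, the finite horizon is arbitrary):

```python
def net_utility(accept_days, horizon):
    # Accepting the bet on day n costs 1 unit that day and pays 1 unit
    # on each of days 2n and 2n+1; sum the net change over days 1..horizon.
    total = 0
    for n in accept_days:
        if n <= horizon:
            total -= 1
        if 2 * n <= horizon:
            total += 1
        if 2 * n + 1 <= horizon:
            total += 1
    return total

H = 1_000_000
print(net_utility(range(1, 11), H))      # accept only days 1..10: net +10
print(net_utility(range(1, H + 1), H))   # accept every day: net -1
```

Each individual bet is worth +1 once both of its payoff days arrive, yet the always-accept strategy never gets ahead: at any horizon the realized total is −1, because half the promised gains are always still in the future.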

• 4) Eliezer: just cu­ri­ous about how you deal with para­doxes about in­finity in your util­ity func­tion. If for each n, on day n you are offered to sac­ri­fice one unit of util­ity that day to gain one unit of util­ity on day 2n and one unit on day 2n+1 what do you do? Each time you do it you seem to gain a unit of util­ity, but if you do it ev­ery day you end up worse than you started.

dankane, Eliezer an­swered your ques­tion in this com­ment, and maybe some­where else, too, that I don’t yet know of.

• If he wasn’t re­ally talk­ing about in­fini­ties, how would you parse this com­ment (the liv­ing for­ever part):

“There is no finite amount of life lived N where I would pre­fer a 80.0001% prob­a­bil­ity of liv­ing N years to an 0.0001% chance of liv­ing a googol­plex years and an 80% chance of liv­ing for­ever.”

At very least this should im­ply that for ev­ery N there is an f(N) so that he would rather have a 50% chance of liv­ing f(N) years and a 50% chance of dy­ing in­stantly than hav­ing a 100% chance of liv­ing for N years. We could then con­sider the game where if he is go­ing to live for N years he is re­peat­edly offered the chance to in­stead live f(N) years with 50% prob­a­bil­ity and 0 years with 50% prob­a­bil­ity. Tak­ing the bet n+1 times clearly does bet­ter than tak­ing it n times, but the strat­egy “take the bet un­til you lose” guaran­tees him a very short life ex­pec­tancy.

If your util­ity func­tion is un­bounded you can run into para­doxes like this.
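The structure of that repeated bet can be made concrete; a sketch assuming, purely for illustration, f(N) = 3N and measuring outcomes in years rather than utility (the comment's f is abstract):

```python
N0 = 50  # guaranteed years initially on the table (illustrative)

def after_bets(n_bets, f=lambda years: 3 * years):
    # Each accepted bet replaces the current lifespan with f(lifespan)
    # at 50% probability, and instant death otherwise.
    years = N0
    for _ in range(n_bets):
        years = f(years)
    survival = 0.5 ** n_bets
    return survival, survival * years  # (P(still alive), expected years)

for n in range(5):
    print(n, after_bets(n))
```

Expected years rise with every additional bet (each acceptance multiplies them by 1.5 here), while the survival probability falls toward zero; the limiting "take the bet until you lose" strategy dies with probability 1.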

• A way of thinking of this “paradox” that I’ve found helpful is to see the two-boxer as imagining more outcomes than there actually are. For a payoff matrix of this scenario, the two-boxer would draw four possible outcomes: \$0, \$1000, \$1000000, and \$1001000, and would try for \$1000 or \$1001000. But if Omega is a perfect predictor, then the two that involve it making a mistake (\$0 and \$1001000) are very unlikely. The one-boxer sees only the two plausible options and goes for \$1000000.
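The point about which outcomes are live generalizes to an imperfect predictor; a sketch of the expected values as a function of Omega's accuracy (the "even 51% would be enough" claim elsewhere in the thread falls out of the same arithmetic):

```python
BIG, SMALL = 1_000_000, 1_000

def ev_one_box(acc):
    # With probability `acc` Omega correctly foresaw one-boxing and filled box B.
    return acc * BIG

def ev_two_box(acc):
    # With probability 1 - acc Omega wrongly predicted one-boxing, so B is full anyway.
    return (1 - acc) * BIG + SMALL

for acc in (0.5, 0.51, 0.99, 1.0):
    print(acc, ev_one_box(acc), ev_two_box(acc))
# break-even where acc*BIG = (1-acc)*BIG + SMALL, i.e. acc = 0.5005
```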

• The link to that the­sis doesn’t seem to work for me.

A quick google turned up one that does.

• What this is re­ally say­ing is “if some­thing im­pos­si­ble (ac­cord­ing to your cur­rent the­ory of the world) ac­tu­ally hap­pens, then rather than in­sist­ing it’s im­pos­si­ble and ig­nor­ing it, you should re­vise your the­ory to say that’s pos­si­ble”. In this case, the im­pos­si­ble thing is re­verse causal­ity; since we are told of ev­i­dence that re­verse causal­ity has hap­pened in the form of 100 suc­cess­ful pre­vi­ous ex­per­i­ments, we must re­vise our the­ory to ac­cept that re­verse causal­ity ac­tu­ally can hap­pen. This would lead us to the con­clu­sion that we should take one box. Alter­na­tively, we could de­cide that our sup­posed ev­i­dence is un­trust­wor­thy and that we are be­ing lied to when we are told that Omega made 100 suc­cess­ful pre­dic­tions – we might think that this prob­lem de­scribes a non­sen­si­cal, im­pos­si­ble situ­a­tion, similarly to if we were told that there was a bar­ber who shaves ev­ery­one who does not shave them­self.

• Re: “Do you take both boxes, or only box B?”

It would sure be nice to get hold of some more data about the “100 ob­served oc­ca­sions so far”. If Omega only vis­its two-box­ers—or tries to min­imise his out­go­ings—it would be good to know that. Such in­for­ma­tion might well be ac­cessible—if we have enough in­for­ma­tion about Omega to be con­vinced of his ex­is­tence in the first place.

• My solu­tion to the prob­lem of the two boxes:

Flip a coin. If heads, both A & B. If tails, only B. (If the superintelligence can predict a coin flip, make it a radioactive decay or something. Eat quantum, Hal.)

In all se­ri­ous­ness, this is a very odd prob­lem (I love it!). Of course two boxes is the ra­tio­nal solu­tion—it’s not as if post-facto cog­i­ta­tion is go­ing to change any­thing. But the prob­lem state­ment seems to im­ply that it is ac­tu­ally im­pos­si­ble for me to choose the choice I don’t choose, i.e., choice is ac­tu­ally im­pos­si­ble.

Some­thing is ab­surd here. I sus­pect it’s the idea that my choice is to­tally pre­dictable. There can be a ran­dom el­e­ment to my choice if I so choose, which kills Omega’s plan.

• What wedrifid said. See also Rationality is Systematized Winning and the section of What Do We Mean By “Rationality”? about “Instrumental Rationality”, which is generally what we mean here when we talk about actions being rational or irrational. If you want to get more money, then the instrumentally rational action is the epistemically rational answer to the question “What course of action will cause me to get the most money?”.

If you ac­cept the premises of Omega thought ex­per­i­ments, then the right an­swer is one-box­ing, pe­riod. If you don’t ac­cept the premises, it doesn’t make sense for you to be an­swer­ing it one way or the other.

• I thought about this last night and also came to the con­clu­sion that ran­dom­iz­ing my choice would not “as­sume the worst” as I ought to.

And I fully ac­cept that this is just a thought ex­per­i­ment & physics is a cheap way out. I will now take the premises or leave them. :)

• Of course two boxes is the ra­tio­nal solu­tion—it’s not as if post-facto cog­i­ta­tion is go­ing to change any­thing.

No it isn’t. If you like money it is ra­tio­nal to get more money. Take one box.

• I sus­pect it’s the idea that my choice is to­tally predictable

At face, that does sound absurd. The problem is that you are underestimating a superintelligence. Imagine that the universe is a computer simulation, so that a set of physical laws plus a very, very long string of random numbers is a complete causal model of reality. The superintelligence knows the laws and all of the random numbers. You still make a choice, even though that choice ultimately depends on everything that preceded it. See http://wiki.lesswrong.com/wiki/Free_will

I think much of the de­bate about New­comb’s Prob­lem is about the defi­ni­tion of su­per­in­tel­li­gence.

• It is a com­mon as­sump­tion in these sorts of prob­lems that if Omega pre­dicts that you will con­di­tion your choice on a quan­tum event, it will not put the money in Box B.

• I’m a bit ner­vous, this is my first com­ment here, and I feel quite out of my league.

Re­gard­ing the “free will” as­pect, can one game the sys­tem? My ra­tio­nal choice would be to sit right there, arms crossed, and choose no box. In­stead, hav­ing thus dis­proved Omega’s in­fal­li­bil­ity, I’d wait for Omega to come back around, and try to weasel some knowl­edge out of her.

Ra­tion­ally, the in­tel­li­gence that could model mine and pre­dict my likely ac­tion (yet fail to pre­dict my in­ac­tion enough to not bother with me in the first place), is an in­tel­li­gence I’d like to have a chat with. That chat would be likely to have tremen­dously more util­ity for me than \$1,000,000.

Is that a valid choice? Does it dis­prove Omega’s in­fal­li­bil­ity? Is it a ra­tio­nal choice?

If mess­ing with the ques­tion is not a con­struc­tive ad­di­tion to the de­bate, ac­cept my apolo­gies, and flame me lightly, please.

• Hi. This is a rather old post, so you might not get too many replies.

New­comb’s prob­lem of­ten comes with the caveat that, if Omega thinks you’re go­ing to game the sys­tem, it will leave you with only the \$1,000. But yes, we like clever an­swers here, al­though we also like to con­sider, for the pur­poses of thought ex­per­i­ments, the least con­ve­nient pos­si­ble world in which the loop­holes we find have been closed.

Also, may I sug­gest vis­it­ing the wel­come thread?

• I would use a true quan­tum ran­dom gen­er­a­tor. 51% of the time I would take only one box. Other­wise I would take two boxes. Thus Omega has to guess that I will only take one box, but I have a 49% chance of tak­ing home an­other \$1000. My ex­pected win­nings will be \$1000490 and I am per Eliezer’s defi­ni­tion more ra­tio­nal than he.
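The \$1,000,490 figure is straightforward to verify under the comment's assumption that Omega, forced to a single deterministic prediction, predicts the more likely action (one-boxing) and fills box B:

```python
BIG, SMALL = 1_000_000, 1_000

# 51 times in 100, take only box B (which Omega has filled); the other
# 49 times, grab both boxes. Integer arithmetic avoids float rounding.
ev = (51 * BIG + 49 * (BIG + SMALL)) / 100
print(ev)  # 1000490.0
```

Whether this beats plain one-boxing depends entirely on that assumption; the usual statement of the problem has Omega leave box B empty for anyone it predicts will randomize.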

• This is why I restate the prob­lem to ex­clude the mil­lion when peo­ple choose ran­domly.

• Eliezer, why didn’t you an­swer the ques­tion I asked at the be­gin­ning of the com­ment sec­tion of this post?

• I think I’ve solved it.

I’m a lit­tle late to this, and given the amount of time peo­ple smarter than my­self have spent think­ing about this it seems naive even to my­self to think that I have found a solu­tion to this prob­lem. That be­ing said, try as I might, I can’t find a good counter ar­gu­ment to this line of rea­son­ing. Here goes...

The human brain’s function is still mostly a black box to us, but the demonstrated predictive power of this alien is strong evidence that this is not the case for him. If he really can predict human decisions, then the mere fact that you are choosing one box is the best way for you to ensure that this will be what is predicted.

The standard attack on this line of reasoning seems to be that since his prediction happened in the past, your decision can’t influence it. But it already has influenced it. He was aware of the decision before you made it (evidenced by his predictive power). In fact, it is not really a decision in the sense of “freely” choosing one of two options (in the way that most people use “freely”, at least). Think of this decision as just extremely complicated and seemingly unpredictable data analysis, where the unpredictability comes from never being able to know intimately every part of the decision process and the inputs. But if one could perfectly crack the “black box” of your decision, as this alien appears to have done (at least this seems by far the most plausible explanation to me), then one could predict decisions with the accuracy the alien possesses. In other words, the gears were already in motion for your decision to be made, and the alien was already witness to it whether you realized it or not. In that sense you aren’t making your decision afterwards when you think you are; you are actually realizing the decision that you were already set up to make at an earlier time.

If you agree with what I have writ­ten above, your ob­vi­ous best de­ci­sion is to just go ahead and pick one box, and hope that the alien would have pre­dicted this. Based on the ev­i­dence, that will prob­a­bly be enough to make the one mil­lion show up. De­cid­ing in­stead to go for two boxes for any rea­son what­so­ever will prob­a­bly mean that the mil­lion won’t be there. The time is­sue is just an illu­sion caused by your im­perfect knowl­edge and data pro­cess­ing that takes time.

• Re: First, fore­most, fun­da­men­tally, above all else: Ra­tional agents should WIN.

When Deep Blue beat Garry Kasparov, did that prove that Garry Kasparov was “irrational”?

It seems as though it would be un­rea­son­able to ex­pect even highly ra­tio­nal agents to win—if pit­ted against su­pe­rior com­pe­ti­tion. Ra­tional agents can lose in other ways as well—e.g. by not hav­ing ac­cess to use­ful in­for­ma­tion.

Since there are plenty of ways in which ra­tio­nal agents can lose, “win­ning” seems un­likely to be part of a rea­son­able defi­ni­tion of ra­tio­nal­ity.

• thinks—Okay, so if I un­der­stand you cor­rectly now, the es­sen­tial thing I was miss­ing that you meant to im­ply was that the util­ity of liv­ing for­ever must nec­es­sar­ily be equal to (can­not be larger than) the limit of the util­ities of liv­ing a finite num­ber of years. Then, if u(live for­ever) is finite, p times the differ­ence be­tween u(live for­ever) and u(live n years) must be­come ar­bi­trar­ily small, and thus, even­tu­ally smaller than q times the differ­ence be­tween u(live n years) and u(live googol­plex years). You then ar­rive at a con­tra­dic­tion, from which you con­clude that u(live for­ever) = the limit of u(live n years) can­not be finite. Okay. Without the qual­ifi­ca­tion I was miss­ing, the con­di­tion wouldn’t be in­con­sis­tent with a bounded util­ity func­tion, since the differ­ence wouldn’t have to get ar­bi­trar­ily small, but the qual­ifi­ca­tion cer­tainly seems rea­son­able.

(I would still pre­fer for all pos­si­bil­ities con­sid­ered to have defined util­ities, which would mean ex­tend­ing the range of the util­ity func­tion be­yond the real num­bers, which would mean that u(live for­ever) would, tech­ni­cally, be an up­per bound for {u(live n years) | n in N} -- that’s what I had in mind in my last para­graph above. But you’re not re­quired to share my prefer­ences on fram­ing the is­sue, of course :-))

• Benja, the no­tion is that “live for­ever” does not have any finite util­ity, since it is bounded be­low by a se­ries of finite life­times whose util­ity in­creases with­out bound.

• Given how many times Eliezer has linked to it, it’s a lit­tle sur­pris­ing that no­body seems to have picked up on this yet, but the para­graph about the util­ity func­tion not be­ing up for grabs seems to have a pretty se­ri­ous tech­ni­cal flaw:

There is no finite amount of life lived N where I would pre­fer a 80.0001% prob­a­bil­ity of liv­ing N years to an 0.0001% chance of liv­ing a googol­plex years and an 80% chance of liv­ing for­ever. This is a suffi­cient con­di­tion to im­ply that my util­ity func­tion is un­bounded.

Let p = 80% and let q be one in a mil­lion. I’m pretty sure that what Eliezer has in mind is,

(A) For all n, there is an even larger n’ such that (p+q)·u(live n years) < p·u(live n’ years) + q·u(live a googolplex years).

This indeed means that {u(live n’ years) | n’ in N} is not upwards bounded (I did check the math :-)), which means that u is not upwards bounded, which means that u is not bounded. But what he actually said was,

(B) For all n, (p+q)·u(live n years) ≤ p·u(live forever) + q·u(live googolplex years)

That’s not only different from A, it contradicts A! It doesn’t imply that u needs to be bounded, of course, but it flat out states that {u(live n years) | n in N} is upwards bounded by (p·u(live forever) + q·u(live googolplex years))/(p+q).

(We may per­haps see this as rea­son enough to ex­tend the do­main of our util­ity func­tion to some su­per­set of the real num­bers. In that case it’s no longer nec­es­sary for the util­ity func­tion to be un­bounded to satisfy (A), though—al­though we might in­vent a new con­di­tion like “not bounded by a real num­ber.”)
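Condition (A)'s implication that u is unbounded can also be checked numerically. Rearranging (A) gives u(live n’ years) > ((p+q)·u(live n years) − q·u(live googolplex years))/p; iterating that lower bound from any starting point above u(live googolplex years) diverges geometrically. A sketch with illustrative utility values (the two u's below are assumptions, not from the comment):

```python
p, q = 0.8, 1e-6   # 80% and one-in-a-million, as in the comment
u_g = 100.0        # assumed utility of "live a googolplex years" (illustrative)
u_0 = 101.0        # assumed utility of some finite lifespan preferred to the googolplex one

# Each application of (A) forces u above u_g + (u_0 - u_g) * ratio**k,
# and ratio > 1, so the lower bound grows without limit.
ratio = (p + q) / p

def lower_bound(k):
    return u_g + (u_0 - u_g) * ratio ** k

for k in (0, 10**6, 10**7):
    print(k, lower_bound(k))
```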

• If ran­dom num­ber gen­er­a­tors not de­ter­minable by Omega ex­ist, gen­er­ate one bit of en­tropy. If not, take the mil­lion bucks. Quan­tum ran­dom­ness any­one?

• As a ra­tio­nal­ist, it might be worth­while to take the one box just so those Omega know-it-alls will be wrong for once.

• Some­how I’d never thought of this as a ra­tio­nal­ist’s dilemma, but rather a de­ter­minism vs free will illus­tra­tion. I still see it that way. You can­not both be­lieve you have a choice AND that Omega has perfect pre­dic­tion.

The only “rational” (in all senses of the word) response I support is: shut up and multiply. Estimate the chance that he has predicted wrong, and if that gives you positive expected value, take both boxes. I phrase this as advice, but in fact I mean it as a prediction of rational behavior.

• One be­lated point, some peo­ple seem to think that Omega’s suc­cess­ful pre­dic­tion is vir­tu­ally im­pos­si­ble and that the ex­per­i­ment is a purely fan­ciful spec­u­la­tion. How­ever it seems to me en­tirely plau­si­ble that hav­ing you fill out a ques­tion­naire while be­ing brain scanned might well bring this situ­a­tion into prac­ti­cal­ity in the near fu­ture. The ques­tions, if filled out cor­rectly, could char­ac­ter­ize your per­son­al­ity type with enough ac­cu­racy to give a very strong pre­dic­tion about what you will do. And if you lie, in the fu­ture that might be de­tected with a brain scan. I don’t see any­thing about this sce­nario which is ab­surd, im­pos­si­ble, or even par­tic­u­larly low prob­a­bil­ity. The one prob­lem is that there might well be a cer­tain frac­tion of peo­ple for whom you re­ally can’t pre­dict what they’ll do, be­cause they’re right on the edge and will de­cide more or less at ran­dom. But you could ex­clude them from the ex­per­i­ment and just give those with solid pre­dic­tions a shot at the boxes.

• Okay, maybe I am stupid, maybe I am un­fa­mil­iar with all the liter­a­ture on the prob­lem, maybe my English sucks, but I fail to un­der­stand the fol­low­ing:
-
Is the agent aware of the fact that one box­ers get 1 000 000 at the mo­ment Omega “scans” him and pre­sents the boxes?

OR

Is the agent unaware of the fact that Omega rewards one-boxers at all?
-
P.S.: Also, as with most “decision paradoxes”, this one will have different solutions depending on the context (is the agent a starving child in Africa, or a “megacorp” CEO?).

• “If it ever turns out that Bayes fails—re­ceives sys­tem­at­i­cally lower re­wards on some prob­lem, rel­a­tive to a su­pe­rior al­ter­na­tive, in virtue of its mere de­ci­sions—then Bayes has to go out the win­dow.”

What exactly do you mean by mere decisions? I can construct problems where agents that use few computational resources win. Bayesian agents by your own admission have to use energy to get into mutual information with the environment (a state I am still suspicious of), so they have to use energy, meaning they lose.

• the dominant consensus in modern decision theory is that one should two-box...there’s a common attitude that “Verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”

Those are contrary positions, right?

Robin Hanson:
Punishment is ordinary, but Newcomb’s problem is simple! You can’t have both.

The advantage of an ordinary situation like punishment is that game theorists can’t deny the fact on the ground that governments exist, but they can claim it’s because we’re all irrational, which doesn’t leave many directions to go in.

• Paul G, almost certainly, right? Still, as you say, it has little bearing on one’s answer to the question.

In fact, that’s not true: it does. Is there anything to stop me making a mental pact with all my simulation buddies (and ‘myself’, whoever he may be) to go for Box B?

• When the stakes are high enough I one-box, while gritting my teeth. Otherwise, I’m more interested in demonstrating my “rationality” (Eliezer has convinced me to use those quotes).

Perhaps we could just specify an agent that uses reverse causation in only particular situations, as it seems that humans are capable of doing.

• they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am

But that’s exactly what strategic inconsistency is about. Even if you had decided to take only box B at 7:00am, by 7:06am a rational agent will just change his mind and choose to take both boxes. Omega knows this, hence it will put nothing into box B. The only way out is if the AI self-commits to take only box B in a way that’s verifiable by Omega.

• How about simply multiplying? Treat Omega as a fair coin toss. 50% of a million is half a million, and that’s vastly bigger than a thousand. You can ignore the question of whether Omega has filled the box in deciding that the uncertain box is more important. So much more important that the chance of gaining an extra \$1000 isn’t worth the bother of trying to beat the puzzle. You just grab the important box.

• Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

• Paul, if we were determined, what would you mean when you say that “we ought not to care”? Do you mean to say that the outcome would be better if we didn’t care? The fact that the caring is part of the causal chain does have something to do with this: the outcome may be determined by whether or not we care. So if you consider one outcome better than another (only one really possible, but both possible as far as you know), then either “caring” or “not caring” might be preferable, depending on which one would lead to each outcome.

• I do understand. My point is that we ought not to care whether we’re going to consider all the possibilities and benefits.

Oh, but you say, our caring about our consideration process is a determined part of the causal chain leading to our consideration process, and thus to the outcome.

Oh, but I say, we ought not to care* about that caring. Again, recurse as needed. Nothing you can say about the fact that a cognition is in the causal chain leading to a state of affairs counts as a point against the claim that we ought not to care about whether or not we have that cognition, if it’s unavoidable.

• Eliezer, I don’t read the main thrust of your post as being about Newcomb’s problem per se. Having distinguished between ‘rationality as means’ to whatever end you choose, and ‘rationality as a way of discriminating between ends’, can we agree that the whole specks/torture debate was something of a red herring? Red herring, because it was a discussion on using rationality to discriminate between ends, without having first defined one’s meta-objectives, or, if one’s meta-objectives involved hedonism, establishing the rules for performing math over subjective experiences.

To illustrate the distinction using your other example, I could state that I prefer to save 400 lives certainly, simply because the purple fairy in my closet tells me to (my arbitrary preferred objective), and that would be perfectly legitimate. It would only be incoherent if I also declared it to be a strategy which would maximise the number of lives saved if a majority of people adopted it in similar circumstances (a different arbitrary preferred objective).

I could in fact have as my preferred meta-objective for the universe that all the squilth in flobjuckstooge be globberised, and that would be perfectly legitimate. An FAI (or a BFG, for that matter (Roald Dahl, not Tom Hall)) could scan me and work towards creating the universe in which my proposition is meaningful, and make sure it happens. If now someone else’s preferred meta-objective for the universe is ensuring that the princess on page 3 gets a fairy cake, how is the FAI to prioritise?

• Eliezer: whether or not a fixed future poses a problem for morality is a hotly disputed question which even I don’t want to touch. Fortunately, this problem is one that is pretty much wholly orthogonal to morality. :-)

But I feel like in the present problem the fixed-future issue is a key to dissolving the problem. So, assume the box decision is fixed. It need not be the case that the stress is fixed too. If the stress isn’t fixed, then it can’t be relevant to the box decision (the box is fixed regardless of your decision between stress and no-stress). If the stress IS fixed, then there’s no decision left to take. (Except possibly whether or not to stress about the stress; call that stress*, and recurse the argument accordingly.)

In general, for any pair of actions X and Y, where X is determined, either X is conditional on Y, in which case Y must also be determined, or not conditional on Y, in which case Y can be either determined or non-determined. So appealing to Y as part of the process that leads to X doesn’t mean that something we could do to Y makes a difference if X is determined.

• The interesting thing about this game is that Omega has magical superpowers that allow him to know whether or not you will back out on your commitment ahead of time, and so you can make your commitment credible by not being going to back out on your commitment. If that makes any sense.

• We don’t even need a superintelligence. We can probably predict, on the basis of personality type, a person’s decision in this problem with 80% accuracy, which is already sufficient that a rational person would choose only box B.
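That claim can be checked with a quick expected-value sketch (assumptions: the standard \$1,000,000 / \$1,000 payoffs, and a predictor that calls your choice correctly with probability `accuracy`):

```python
# Expected value of one-boxing vs two-boxing against a predictor that
# calls your choice correctly with probability `accuracy`.
# If you one-box, box B is full with probability `accuracy`;
# if you two-box, box B is full only when the predictor was wrong.

def expected_values(accuracy):
    ev_one_box = accuracy * 1_000_000
    ev_two_box = (1 - accuracy) * 1_000_000 + 1_000
    return ev_one_box, ev_two_box

one, two = expected_values(0.8)
print(round(one), round(two))  # 800000 201000
```

At 80% accuracy one-boxing’s expected value (\$800,000) is roughly four times two-boxing’s (\$201,000); in fact, under this model one-boxing wins whenever accuracy exceeds about 50.05%.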

• If we assume that Omega almost never makes a mistake and we allow the chooser to use true randomization (perhaps by using quantum physics) in making his choice, then Omega must make his decision in part through seeing into the future. In this case the chooser should obviously pick just B.

• Laura,

Once we can model the probabilities of the various outcomes in a noncontroversial fashion, the specific choice to make depends on the utility of the various outcomes. \$1,001,000 might be only marginally better than \$1,000,000 -- or that extra \$1,000 could have some significant extra utility.

• I’d love to say I’d find some way of picking randomly just to piss Omega off, but I’d probably just one-box it. A million bucks is a lot of money.

• It’s often stipulated that if Omega predicts you’ll use some randomizer it can’t predict, it’ll punish you by acting as if it predicted two-boxing.

• Newcomb’s problem doesn’t specify how Omega chooses the ‘customers’. It’s a quite realistic possibility that it simply has not offered the choice to anyone who would use a randomizer, and has cherry-picked only the people who have at least 99.9% ‘prediction strength’.

• (And the most favourable plausible outcome for randomizing would be scaling the payoff appropriately to the probability assigned.)

• Would that make you a supersuperintelligence? Since I presume by “picking randomly” you mean randomly to Omega; in other words, Omega cannot find and process enough information to predict you well.

Otherwise, what does “picking randomly” mean?

• The definition of Omega as something that can predict your actions leads it to have some weird powers. You could pick a box based on the outcome of a quantum event with a 50% chance; then Omega would have to vanish in a puff of physical implausibility.

• I suspect Omega would know you were going to do that, and would be able to put the box in a superposition dependent on the same quantum event, so that in the branches where you 1-box, box B contains \$1 million, and where you 2-box it’s empty.

• What’s wrong with Omega predicting a “quantum event”? “50% chance” is not an objective statement, and it may well be that Omega can predict quantum events. (If not, can you explain why not, or refer me to an explanation?)

• From Wikipedia:

“In the formalism of quantum mechanics, the state of a system at a given time is described by a complex wave function (sometimes referred to as orbitals in the case of atomic electrons), and more generally, elements of a complex vector space.[9] This abstract mathematical object allows for the calculation of probabilities of outcomes of concrete experiments.”

This is the best formalism we have for predicting things at this scale, and it only spits out probabilities. I would be surprised if something did a lot better!

• As I understand it, probabilities are observed because there are observers in two different amplitude blobs of configuration space (to use the language of the quantum physics sequence), but “the one we are in” appears to be random to us. And mathematically I think quantum mechanics is the same under this view, in which there is no “inherent, physical” randomness (so it would still be the best formalism we have for predicting things).

Could you say what “physical randomness” could be if we don’t allow reference to quantum mechanics? (i.e. is that the only example? and, more to the point, does the notion make any sense?)

• You seem to have transitioned to another argument here… please clarify what this has to do with Omega and its ability to predict your actions.

• The new argument is about whether there might be inherently unpredictable things. If not, then your picking a box based on the outcome of a “quantum event” shouldn’t make Omega any less physically plausible.

• What I didn’t understand is why you removed quantum experiments from the discussion. I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.

Consider a world made of reversible binary gates with the same number of inputs as outputs (that is, every input has a unique output, and vice versa).

We want to predict one complex gate. Not a problem: just clone all the inputs and copy the gate. However, you have to do that using only reversible binary gates. Let’s start with cloning the bits.

`In` is the bit you are trying to copy without modifying, so that you can predict what effect it will have on the rest of the system. You need a minimum of two outputs, so you need another input, `B`.

You get to create the gate in order to copy the bit and predict the system. The ideal truth table looks something like

`In | B | Out | Copy`

`0 | 0 | 0 | 0`

`0 | 1 | 0 | 0`

`1 | 0 | 1 | 1`

`1 | 1 | 1 | 1`

This violates our reversibility assumption (two distinct inputs map to the same output). The best copier we could make is

`In | B | Out | Copy`

`0 | 0 | 0 | 0`

`0 | 1 | 1 | 0`

`1 | 0 | 0 | 1`

`1 | 1 | 1 | 1`

This copies precisely, but mucks up the output, making our copy useless for prediction. If we could control B, or knew the value of B, then we could correct the output. But as I have shown here, finding out the value of a bit is non-trivial. The best we could do would be to find sources of bits with statistically predictable properties, then use them for duplicating other bits.

The world is expected to be reversible, and the no-cloning theorem applies to reality, which I think is stricter than my example. However, I hope I have shown how a simple lawful universe can be hard to predict by something inside it.

In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.
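The two truth tables above can be checked mechanically; a minimal sketch, treating each gate as a map from (In, B) to (Out, Copy) and testing whether that map is a bijection:

```python
# A gate on two bits is reversible iff its (In, B) -> (Out, Copy) map
# is a bijection, i.e. no two input pairs collide on one output pair.

ideal_copier = {(0, 0): (0, 0), (0, 1): (0, 0),   # Out = In, Copy = In
                (1, 0): (1, 1), (1, 1): (1, 1)}

best_copier = {(0, 0): (0, 0), (0, 1): (1, 0),    # Out = B, Copy = In (a swap)
               (1, 0): (0, 1), (1, 1): (1, 1)}

def is_reversible(gate):
    # Reversible = the set of outputs is as large as the set of inputs.
    return len(set(gate.values())) == len(gate)

print(is_reversible(ideal_copier))  # False: (0,0) and (0,1) both yield (0,0)
print(is_reversible(best_copier))   # True
```

The second table does copy `In` into `Copy`, but only by swapping `B` into `Out`, which is exactly why the copy is useless for prediction unless the value of `B` is known.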

• I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.

[attempted proof omitted]

I hope I have shown how a simple lawful universe can be hard to predict by something inside it.

In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.

Even though I do not have time to reflect on the attempted proof, and even though the attempted proof is best described as a stab at a sketch of a proof, and even though this “reversible logic gates” approach to a proof probably cannot be turned into an actual proof, and even though Nick Tarleton just explained why the “one box or two box depending on an inherently unpredictable event” strategy is not particularly relevant to Newcomb’s, I voted this up and I congratulate the author (whpearson) because it is an attempt at an original proof of something very cool (namely, limits to an agent’s ability to learn about its environment) and IMHO probably relevant to the Friendliness project. More proofs and informed stabs at proofs, please!

• Hmm, changed my mind; should have thought more before writing… the EDT virus has early symptoms of causing people to use EDT before progressing to terrible illness and death. It seems EDT would then recommend not using EDT.

• If you won’t explicitly state your analysis, maybe we can try 20 questions?

I have suspected that supposed “paradoxes” of evidential decision theory occur because not all the evidence was considered. For example, the fact that you are using evidential decision theory to make the decision.

Agree/disagree?

• Really? A PhD? Seriously?

If Omega said “You shall take only Box B or I will smite thee” and then proceeded to smite 100 infidels who dared to two-box, the rational choice would be obvious (especially if the smiting happened after Omega left).

Is this really difficult to show mathematically?

• I would take box B, because it would be empty.

• Sorry if this has already been addressed. I didn’t take the time to read all 300 comments.

It seems to me that if there were an omniscient Omega, the world would be deterministic, and you wouldn’t have free will. You have the illusion of choice, but your choice is already known by Omega. Hence, try (it’s futile) to make your illusory choice the one-boxing one.

Personally, I don’t believe in determinism or the concept of Omega. This is a nice thought experiment, though.

• How does adding indeterminism help make the problem go away? If Omega only predicts correctly 99% of the time, what gets clarified?

• You are betting a positive extra payout of \$1,000 against a net loss of -\$999,000 that there are no Black Swans[1] at all in this situation.

Given that you already have 100 points of evidence that taking Box A makes Box B empty (added to the evidence that Omega is more intelligent than you), I’d say that’s a Bad Bet to make.

Given the amount of uncertainty in the world, choosing Box B instead of trying to “beat the system” seems like the rational step to me.

Edit: I’ve given the math in a comment below to show how to calculate when to make either decision.

[1] i.e. something you didn’t think of that makes Box B empty even after Omega’s gone away, or an invisible portkey in Box B that is activated the moment you pick up Box A, or Omega’s time machine that let him go forward to see your decision before putting the money into the boxes… or a device using some hand-wavey quantum state that lets either Box A be taken or Box B’s contents exist…

• So, working the math on that:

Let P(BS) = probability of a Black Swan being involved.

This makes the average payout work out to:

1-Box = \$1,000,000

2-Box = \$1,001,000 (1 − P(BS)) + \$1,000 P(BS)

Now it seems to me that the average 2-boxer is assuming that P(BS) = 0, which would make the 2-Box solution always equal \$1,001,000, which would, of course, always beat the 1-Box solution.

And maybe in this toy problem they’re right to assume P(BS) = 0, but IRL that’s almost never the case—after all, 0 is not a probability, yes?

So assume that P(BS) is non-zero. At what point would it be worth it to choose the 1-Box solution, and at what point the 2-Box solution? Let’s run the math:

1,000,000 = 1,001,000(1 − x) + 1,000x = 1,001,000 − 1,001,000x + 1,000x = 1,001,000 − 1,000,000x

=> 1,000,000 − 1,001,000 = −1,000,000x

=> x = −1,000/−1,000,000

=> x = 0.001

So the estimated probability of a Black Swan existing only has to be greater than 0.1% for the 1-Box solution to have the greater expected payout, and therefore the 1-Box option is the more rational::Bayesian choice.

OTOH, if you can guarantee that P(BS) is less than 0.1%, then the rational choice is to 2-Box.
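The breakeven can also be verified symbolically; a sketch using the same payoffs, where `p` is P(BS) and a black swan drops the two-box payout to \$1,000 (working in exact fractions, the threshold comes out to exactly p = 0.001, i.e. 0.1%):

```python
from fractions import Fraction

# Expected payouts as a function of p = P(BS), the black-swan probability.
# One-boxing is assumed unaffected by the black swan.
def ev_one_box(p):
    return Fraction(1_000_000)

def ev_two_box(p):
    return Fraction(1_001_000) * (1 - p) + Fraction(1_000) * p

# Breakeven: 1,000,000 = 1,001,000 - 1,000,000 p  =>  p = 1,000 / 1,000,000
breakeven = Fraction(1_000, 1_000_000)
assert ev_one_box(breakeven) == ev_two_box(breakeven)
print(float(breakeven))  # 0.001
```

Above that threshold one-boxing has the higher expected payout; below it, two-boxing does, matching the analysis in the comment.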