# The True Prisoner’s Dilemma

It oc­curred to me one day that the stan­dard vi­su­al­iza­tion of the Pri­soner’s Dilemma is fake.

The core of the Pri­soner’s Dilemma is this sym­met­ric pay­off ma­trix:

|      | 1: C   | 1: D   |
|------|--------|--------|
| 2: C | (3, 3) | (5, 0) |
| 2: D | (0, 5) | (2, 2) |

Player 1 and Player 2 can each choose C or D. Player 1's and Player 2's utilities for the final outcome are given by the first and second numbers in the pair, respectively. For reasons that will become apparent, "C" stands for "cooperate" and "D" stands for "defect".

Ob­serve that a player in this game (re­gard­ing them­selves as the first player) has this prefer­ence or­der­ing over out­comes: (D, C) > (C, C) > (D, D) > (C, D).

D, it would seem, dom­i­nates C: If the other player chooses C, you pre­fer (D, C) to (C, C); and if the other player chooses D, you pre­fer (D, D) to (C, D). So you wisely choose D, and as the pay­off table is sym­met­ric, the other player like­wise chooses D.

If only you’d both been less wise! You both pre­fer (C, C) to (D, D). That is, you both pre­fer mu­tual co­op­er­a­tion to mu­tual defec­tion.
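The dominance argument above can be checked mechanically. A minimal sketch in Python, using the matrix from the post, viewed from one player's side (the helper name `dominates` is my own):

```python
# The post's payoff matrix from one player's point of view:
# payoff[(my_move, their_move)] = my payoff.
payoff = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 2,
}

def dominates(a, b):
    """True if move a is strictly better than move b against every reply."""
    return all(payoff[(a, other)] > payoff[(b, other)] for other in "CD")

assert dominates("D", "C")                       # D strictly dominates C...
assert payoff[("C", "C")] > payoff[("D", "D")]   # ...yet mutual C beats mutual D
```

Both assertions hold at once, which is exactly the dilemma.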

The Pri­soner’s Dilemma is one of the great foun­da­tional is­sues in de­ci­sion the­ory, and enor­mous vol­umes of ma­te­rial have been writ­ten about it. Which makes it an au­da­cious as­ser­tion of mine, that the usual way of vi­su­al­iz­ing the Pri­soner’s Dilemma has a se­vere flaw, at least if you hap­pen to be hu­man.

The clas­sic vi­su­al­iza­tion of the Pri­soner’s Dilemma is as fol­lows: you are a crim­i­nal, and you and your con­fed­er­ate in crime have both been cap­tured by the au­thor­i­ties.

In­de­pen­dently, with­out com­mu­ni­cat­ing, and with­out be­ing able to change your mind af­ter­ward, you have to de­cide whether to give tes­ti­mony against your con­fed­er­ate (D) or re­main silent (C).

Both of you, right now, are fac­ing one-year prison sen­tences; tes­tify­ing (D) takes one year off your prison sen­tence, and adds two years to your con­fed­er­ate’s sen­tence.

Or maybe you and some stranger are, only once, and with­out know­ing the other player’s his­tory, or find­ing out who the player was af­ter­ward, de­cid­ing whether to play C or D, for a pay­off in dol­lars match­ing the stan­dard chart.

And, oh yes—in the clas­sic vi­su­al­iza­tion you’re sup­posed to pre­tend that you’re en­tirely self­ish, that you don’t care about your con­fed­er­ate crim­i­nal, or the player in the other room.

It’s this last speci­fi­ca­tion that makes the clas­sic vi­su­al­iza­tion, in my view, fake.

You can’t avoid hind­sight bias by in­struct­ing a jury to pre­tend not to know the real out­come of a set of events. And with­out a com­pli­cated effort backed up by con­sid­er­able knowl­edge, a neu­rolog­i­cally in­tact hu­man be­ing can­not pre­tend to be gen­uinely, truly self­ish.

We’re born with a sense of fair­ness, honor, em­pa­thy, sym­pa­thy, and even al­tru­ism—the re­sult of our an­ces­tors adapt­ing to play the iter­ated Pri­soner’s Dilemma. We don’t re­ally, truly, ab­solutely and en­tirely pre­fer (D, C) to (C, C), though we may en­tirely pre­fer (C, C) to (D, D) and (D, D) to (C, D). The thought of our con­fed­er­ate spend­ing three years in prison, does not en­tirely fail to move us.

In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic toward the stranger who might cooperate. We aren't entirely happy to think that we might defect while the stranger cooperates, getting five dollars while the stranger gets nothing.

We fix­ate in­stinc­tively on the (C, C) out­come and search for ways to ar­gue that it should be the mu­tual de­ci­sion: “How can we en­sure mu­tual co­op­er­a­tion?” is the in­stinc­tive thought. Not “How can I trick the other player into play­ing C while I play D for the max­i­mum pay­off?”

For some­one with an im­pulse to­ward al­tru­ism, or honor, or fair­ness, the Pri­soner’s Dilemma doesn’t re­ally have the crit­i­cal pay­off ma­trix—what­ever the fi­nan­cial pay­off to in­di­vi­d­u­als. (C, C) > (D, C), and the key ques­tion is whether the other player sees it the same way.

And no, you can’t in­struct peo­ple be­ing ini­tially in­tro­duced to game the­ory to pre­tend they’re com­pletely self­ish—any more than you can in­struct hu­man be­ings be­ing in­tro­duced to an­thro­po­mor­phism to pre­tend they’re ex­pected pa­per­clip max­i­miz­ers.

To con­struct the True Pri­soner’s Dilemma, the situ­a­tion has to be some­thing like this:

Player 1: Hu­man be­ings, Friendly AI, or other hu­mane in­tel­li­gence.

Player 2: UnFriendly AI, or an alien that only cares about sort­ing peb­bles.

Let’s sup­pose that four billion hu­man be­ings—not the whole hu­man species, but a sig­nifi­cant part of it—are cur­rently pro­gress­ing through a fatal dis­ease that can only be cured by sub­stance S.

How­ever, sub­stance S can only be pro­duced by work­ing with a pa­per­clip max­i­mizer from an­other di­men­sion—sub­stance S can also be used to pro­duce pa­per­clips. The pa­per­clip max­i­mizer only cares about the num­ber of pa­per­clips in its own uni­verse, not in ours, so we can’t offer to pro­duce or threaten to de­stroy pa­per­clips here. We have never in­ter­acted with the pa­per­clip max­i­mizer be­fore, and will never in­ter­act with it again.

Both hu­man­ity and the pa­per­clip max­i­mizer will get a sin­gle chance to seize some ad­di­tional part of sub­stance S for them­selves, just be­fore the di­men­sional nexus col­lapses; but the seizure pro­cess de­stroys some of sub­stance S.

The pay­off ma­trix is as fol­lows:

|      | 1: C | 1: D |
|------|------|------|
| 2: C | (2 billion human lives saved, 2 paperclips gained) | (+3 billion lives, +0 paperclips) |
| 2: D | (+0 lives, +3 paperclips) | (+1 billion lives, +1 paperclip) |

I’ve cho­sen this pay­off ma­trix to pro­duce a sense of in­dig­na­tion at the thought that the pa­per­clip max­i­mizer wants to trade off billions of hu­man lives against a cou­ple of pa­per­clips. Clearly the pa­per­clip max­i­mizer should just let us have all of sub­stance S; but a pa­per­clip max­i­mizer doesn’t do what it should, it just max­i­mizes pa­per­clips.

In this case, we re­ally do pre­fer the out­come (D, C) to the out­come (C, C), leav­ing aside the ac­tions that pro­duced it. We would vastly rather live in a uni­verse where 3 billion hu­mans were cured of their dis­ease and no pa­per­clips were pro­duced, rather than sac­ri­fice a billion hu­man lives to pro­duce 2 pa­per­clips. It doesn’t seem right to co­op­er­ate, in a case like this. It doesn’t even seem fair—so great a sac­ri­fice by us, for so lit­tle gain by the pa­per­clip max­i­mizer? And let us spec­ify that the pa­per­clip-agent ex­pe­riences no pain or plea­sure—it just out­puts ac­tions that steer its uni­verse to con­tain more pa­per­clips. The pa­per­clip-agent will ex­pe­rience no plea­sure at gain­ing pa­per­clips, no hurt from los­ing pa­per­clips, and no painful sense of be­trayal if we be­tray it.

What do you do then? Do you co­op­er­ate when you re­ally, definitely, truly and ab­solutely do want the high­est re­ward you can get, and you don’t care a tiny bit by com­par­i­son about what hap­pens to the other player? When it seems right to defect even if the other player co­op­er­ates?

That’s what the pay­off ma­trix for the true Pri­soner’s Dilemma looks like—a situ­a­tion where (D, C) seems righter than (C, C).

But all the rest of the logic—ev­ery­thing about what hap­pens if both agents think that way, and both agents defect—is the same. For the pa­per­clip max­i­mizer cares as lit­tle about hu­man deaths, or hu­man pain, or a hu­man sense of be­trayal, as we care about pa­per­clips. Yet we both pre­fer (C, C) to (D, D).

So if you’ve ever prided your­self on co­op­er­at­ing in the Pri­soner’s Dilemma… or ques­tioned the ver­dict of clas­si­cal game the­ory that the “ra­tio­nal” choice is to defect… then what do you say to the True Pri­soner’s Dilemma above?

• I agree: Defect!

I didn’t say I would defect.

• I agree: Defect!

I didn’t say I would defect.

By the way, this was an ex­tremely clever move: in­stead of an­nounc­ing your de­par­ture from CDT in the post, you waited for the right prompt in the com­ments and dropped it as a shock­ing twist. Well crafted!

• I would cer­tainly hope you would defect, Eliezer. Can I re­ally trust you with the fu­ture of the hu­man race?

Ha, I was wait­ing for some­one to ac­cuse me of an­ti­so­cial be­hav­ior for hint­ing that I might co­op­er­ate in the Pri­soner’s Dilemma.

But wait for to­mor­row’s post be­fore you ac­cuse me of dis­loy­alty to hu­man­ity.

• Ha, I was wait­ing for some­one to ac­cuse me of an­ti­so­cial be­hav­ior for hint­ing that I might co­op­er­ate in the Pri­soner’s Dilemma.

It is fas­ci­nat­ing look­ing at the con­ver­sa­tion on this sub­ject back in 2008, back be­fore TDT and UDT had be­come part of the cul­ture. The ob­jec­tions (and even the mis­takes) all feel so fresh!

• At this point Yud­kowsky sub 2008 has already (awfully) writ­ten his TDT manuscript (in 2004) and is silently rea­son­ing from within that the­ory, which the mar­gins of his post are too small to con­tain.

• On the off chance any­one ac­tu­ally sees this—I don’t ac­tu­ally see a “next post” fol­low-up to this. Can any­one provide me with a link, and in­struc­tions as to how you got it?

• Article Navigation / By Author / right-arrow

• Robin, the point I’m com­plain­ing about is pre­cisely that the stan­dard illus­tra­tion of the Pri­soner’s Dilemma, taught to be­gin­ning stu­dents of game the­ory, fails to con­vey those en­tries in the pay­off ma­trix—as if the en­tries were merely money in­stead of utilons, which is not at all what the Pri­soner’s Dilemma is about.

The point of the True Pri­soner’s Dilemma is that it gives you a pay­off ma­trix that is very nearly the stan­dard ma­trix in utilons, not just years in prison or dol­lars in an en­counter.

I.e., you can tell peo­ple all day long that the en­tries are in utilons, but un­til you give them a vi­su­al­iza­tion where those re­ally are the utilons, it’s around as effec­tive as tel­ling ju­ries to ig­nore hind­sight bias.

• The en­tries in a pay­off ma­trix are sup­posed to sum up ev­ery­thing you care about, in­clud­ing what­ever you care about the out­comes for the other player. Most ev­ery game the­ory text and lec­ture I know gets this right, but even when we say the right thing to stu­dents over and over, they mostly still hear it the wrong way you ini­tially heard it. This is just part of the facts of life of teach­ing game the­ory.

• Those must be pretty big pa­per­clips.

• I suspect that the True Prisoner's Dilemma played itself out in the Portuguese and Spanish conquest of Mesoamerica. Some natives were said to ask, "Do they eat gold?" They couldn't comprehend why someone would want a shiny decorative material so badly that they'd kill for it. The Spanish were Shiny Decorative Material maximizers.

• That’s a re­ally in­sight­ful com­ment!

But I should cor­rect you, that you are only talk­ing about the Span­ish con­quest, not the Por­tuguese, since 1) Me­soamer­ica was not con­quered by the Por­tuguese; 2) Por­tuguese pos­ses­sions in Amer­ica (AKA Brazil) had very lit­tle gold and silver, which was only dis­cov­ered much later, when it was already in Por­tuguese do­main.

• In a sense they did eat gold, like we eat stacks of printed pa­per, or per­haps nowa­days lit­tle num­bers on com­puter screens.

• By the way:

Human: "What do you care about 3 paperclips? Haven't you made trillions already? That's like a rounding error!"

Paperclip Maximizer: "How can you talk about paperclips like that?"

PM: "What do you care about a billion human algorithm continuities? You've got virtually the same one in billions of others! And you'll even be able to embed the algorithm in machines one day!"

H: "How can you talk about human lives that way?"

• Eliezer, I agree that your example makes more clear the point you are trying to make clear, but in an intro to game theory course I'd still start with the standard prisoner's dilemma example first, and only get to your example if I had time to make the finer point clearer. For intro classes for typical students the first priority is to be understood at all in any way, and that requires examples as simple, clear, and vivid as possible.

• Eliezer,

The other assumption made about the Prisoner's Dilemma, which I do not see you allude to, is that the payoffs account for not only a financial reward, time spent in prison, etc., but every other possible motivating factor in the decision-making process. A person's utility related to the decision of whether to cooperate or defect will be a function of not only years spent in prison or lives saved but ALSO guilt/empathy. Presenting the numbers within the cells as actual quantities doesn't present the whole picture.

• Im­por­tant point.

Let’s as­sume that your util­ity func­tion (which is iden­ti­cal to theirs) sim­ply weights and adds your pay­off and theirs; that is, if you get X and they get Y, your func­tion is U(X,Y) = aX+bY. In that case, work­ing back­wards from the util­ities in the table, and sub­ject to the con­straint that a+b=1, here are the pay­offs:

a/b=2: (you care twice as much about yourself)
(3, 3) (-5, 10)
(10, -5) (2, 2)

a/b=3:
(3, 3) (-2.5, 7.5)
(7.5, -2.5) (2, 2)

a=b:
Im­pos­si­ble. With both peo­ple be­ing un­selfish util­i­tar­i­ans, the util­ities can never differ based on the same out­come.

b=0: (self­ish)
The table as given in the post

I think the most im­por­tant re­sult is the case a=b: the dilemma makes no sense at all if the play­ers weight both pay­offs equally, be­cause you can never pro­duce asym­met­ri­cal util­ities.

EDIT: My new­bish­ness is show­ing. How do I for­mat this bet­ter? Is it HTML?
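The inversion the comment above performs can be done symbolically. A small sketch, assuming the stated utility function U(X, Y) = aX + bY with a + b = 1; the helper name `underlying_payoffs` is mine:

```python
from fractions import Fraction

def underlying_payoffs(u1, u2, a):
    """Invert U1 = a*X + b*Y and U2 = b*X + a*Y (with b = 1 - a)
    to recover the raw payoffs (X, Y) behind the observed utilities."""
    a = Fraction(a)
    b = 1 - a
    det = a * a - b * b   # zero exactly when a == b (given a + b = 1)
    if det == 0:
        raise ValueError("a == b: asymmetric utilities are unreachable")
    x = (a * u1 - b * u2) / det
    y = (a * u2 - b * u1) / det
    return x, y

# a/b = 2, i.e. a = 2/3: the (5, 0) utility cell comes from raw payoffs (10, -5)
print(underlying_payoffs(5, 0, Fraction(2, 3)))
```

Note that the symmetric cells map to themselves under this inversion, and the a = b case makes the 2x2 system singular, matching the comment's "impossible" verdict.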

• Prase, Chris, I don’t un­der­stand. Eliezer’s ex­am­ple is set up in such a way that, re­gard­less of what the pa­per­clip max­i­mizer does, defect­ing gains one billion lives and loses two pa­per­clips.

Ba­si­cally, we’re be­ing asked to choose be­tween a billion lives and two pa­per­clips (pa­per­clips in an­other uni­verse, no less, so we can’t even put them to good use).

The only ar­gu­ment for co­op­er­at­ing would be if we had rea­son to be­lieve that the pa­per­clip max­i­mizer will some­how do what­ever we do. But I can’t imag­ine how that could be true. Be­ing a pa­per­clip max­i­mizer, it’s bound to defect, un­less it had rea­son to be­lieve that we would some­how do what­ever it does. I can’t imag­ine how that could be true ei­ther.

Or am I miss­ing some­thing?

• 7 years late, but you're missing the fact that (C,C) is universally better than (D,D). Thus whatever logic is being used must have a flaw somewhere, because it works out worse for everyone—a reasoning process that successfully gets both parties to cooperate is a WIN. (However, in this setup it is the case that actually winning would be either (D,C) or (C,D), both of which are presumably impossible if we're equally rational.)

• I think what might be con­fus­ing is that your de­ci­sion de­pends on what you know about the pa­per­clip max­i­mizer. When I imag­ine my­self in this situ­a­tion, I imag­ine want­ing to say that I know “noth­ing”. The trick is, if you want to go a step more for­mal than go­ing with your gut, you have to say what your model of know­ing “noth­ing” is here.

If you know (with high enough prob­a­bil­ity), for in­stance, that there is no con­straint ei­ther causal or log­i­cal be­tween your de­ci­sion and Clippy’s, and that you will not play an iter­ated game, and that there are no sec­ondary effects, then I think D is in­deed the cor­rect choice.

If you know that you and Clippy are both well-mod­eled by in­stances of “ra­tio­nal agents of type X” who have a log­i­cal con­straint be­tween your de­ci­sions so that you will both de­cide the same thing (with high enough prob­a­bil­ity), then C is the cor­rect choice. You might have strong rea­sons to think that al­most all agents ca­pa­ble of pa­per­clip max­i­miz­ing at the level of Clippy fall into this group, so that you choose C.

(And more op­tions than those two.)

The way I’d model know­ing noth­ing in the sce­nario in my head would be some­thing like the first op­tion, so I’d choose D, but maybe there’s other in­for­ma­tion you can get that sug­gests that Clippy will mir­ror you, so that you should choose C.

It does seem like implied folklore that "rational agents cooperate", and it certainly seems true for humans in most circumstances, or formally in some circumstances where you have knowledge about the other agent. But I don't think it should be true in principle that "optimization processes of high power will, with high probability, mirror decisions in the one-shot prisoner's dilemma"; I imagine you'd have to put a lot more conditions on it. I'd be very interested to know otherwise.

• I understood that Clippy is a rational agent, just one with a different utility function. The payoff matrix as described is the classic Prisoner's Dilemma, where one billion lives is one human utilon and one paperclip is one Clippy utilon; since we're both trying to maximise utilons, and we're supposedly both good at this, we should settle for (C,C) over (D,D).

Another way of viewing this would be that my preferences run thus: (D,C); (C,C); (D,D); (C,D), and Clippy's run like this: (C,D); (C,C); (D,D); (D,C). This should make it clear that no matter what assumptions we make about Clippy, it is universally better to co-operate than defect. The two asymmetrical outputs can be eliminated on the grounds of being impossible if we're both rational, and then defecting no longer makes any sense.

• I agree it is bet­ter if both agents co­op­er­ate rather than both defect, and that it is ra­tio­nal to choose (C,C) over (D,D) if you can (as in the TDT ex­am­ple of an agent play­ing against it­self). How­ever, de­pend­ing on how Clippy is built, you may not have that choice; the counter-fac­tual may be (D,D) or (C,D) [win for Clippy].

I think “Clippy is a ra­tio­nal agent” is the phrase where the de­tails lie. What type of ra­tio­nal agent, and what do you two know about each other? If you ever meet a pow­er­ful pa­per­clip max­i­mizer, say “he’s a ra­tio­nal agent like me”, and press C, how sur­prised would you be if it presses D?

• In re­al­ity, not very sur­prised. I’d prob­a­bly be an­noyed/​in­furi­ated de­pend­ing on whether the ac­tual stakes are mea­sured in billions of hu­man lives.

Nev­er­the­less, that merely rep­re­sents the fact that I am not 100% cer­tain about my rea­son­ing. I do still main­tain that ra­tio­nal­ity in this con­text definitely im­plies try­ing to max­imise util­ity (even if you don’t liter­ally define ra­tio­nal­ity this way, any ver­sion of ra­tio­nal­ity that doesn’t try to max­imise when ac­tu­ally given a pay­off ma­trix is not wor­thy of the term) and so we should ex­pect that Clippy faces a similar de­ci­sion to us, but sim­ply favours the pa­per­clips over hu­man lives. If we trans­late from lives and clips to ac­tual util­ity, we get the nor­mal pris­oner’s dilemma ma­trix—we don’t need to make any as­sump­tions about Clippy.

In short, I feel that the re­quire­ment that both agents are ra­tio­nal is suffi­cient to rule out the asym­met­ri­cal op­tions as pos­si­ble, and clearly suffi­cient to show (C,C) > (D,D). I get the feel­ing this is where we’re dis­agree­ing and that you think we need to make ad­di­tional as­sump­tions about Clippy to as­sure the former.

• It's an appealing notion, but I think the logic doesn't hold up.

In sim­plest terms: if you ap­ply this logic and choose to co­op­er­ate, then the ma­chine can still defect. That will net more pa­per­clips for the ma­chine, so it’s hard to claim that the ma­chine’s ac­tions are ir­ra­tional.

Although your logic is ap­peal­ing, it doesn’t ex­plain why the ma­chine can’t defect while you co-op­er­ate.

You said that if both agents are ra­tio­nal, then op­tion (C,D) isn’t pos­si­ble. The corol­lary is that if op­tion (C,D) is se­lected, then one of the agents isn’t be­ing ra­tio­nal. If this hap­pens, then the ma­chine hasn’t been ir­ra­tional (it re­ceives its best pos­si­ble re­sult). The con­clu­sion is that when you choose to co­op­er­ate, you were be­ing ir­ra­tional.

You've successfully explained that (C, D) and (D, C) are impossible for rational agents, but you seem to have implicitly assumed that (C, C) was possible for rational agents. That's actually the point that we're hoping to prove, so it's a case of circular logic.

• Another way of view­ing this would be that my prefer­ences run thus: (D,C);(C,C);(C,D);(D,D) and Clippy run like this: (C,D);(C,C);(D,C);(D,D).

Wait, what? You pre­fer (C,D) to (D,D)? As in, you pre­fer the out­come in which you co­op­er­ate and Clippy defects to the one in which you both defect? That doesn’t sound right.

• Whoops, yes, that was rather stupid of me. Should be fixed now: my most preferred is me backstabbing Clippy, my least preferred is him backstabbing me. In the middle I prefer cooperation to defection. That doesn't change my point that since we both have that preference list (with the asymmetrical ones reversed), it's impossible to get either asymmetrical option, and hence (C,C) and (D,D) are the only options remaining. Hence you should co-operate if you are faced with a truly rational opponent.

I’m not sure whether this holds if your op­po­nent is very ra­tio­nal, but not com­pletely. Or if that no­tion ac­tu­ally makes sense.

• What you’re miss­ing is the idea that we should be op­ti­miz­ing our poli­cies rather than our in­di­vi­d­ual ac­tions, be­cause (among other alleged ad­van­tages) this leads to bet­ter re­sults when there are lots of agents in­ter­act­ing with one an­other.

In a world full of ac­tion-op­ti­miz­ers in which “true pris­on­ers’ dilem­mas” hap­pen of­ten, ev­ery­one ends up on (D,D) and hence (one life, one pa­per­clip). In an oth­er­wise similar world full of policy-op­ti­miz­ers who choose co­op­er­a­tion when they think their op­po­nents are similar policy-op­ti­miz­ers, ev­ery­one ends up on (C,C) and hence (two lives, two pa­per­clips). Every­one is bet­ter off, even though it’s also true that ev­ery­one could (in­di­vi­d­u­ally) do bet­ter if they were al­lowed to switch while ev­ery­one else had to leave their choice un­altered.

• One thing I can't understand. Considering we've built Clippy, we gave it a set of values and we've asked it to maximise paperclips, how can it possibly imagine we would be unhappy about its actions? I can't help but think that from Clippy's point of view, there's no dilemma: we should always agree with its plan and therefore give it carte blanche. What am I getting wrong?

• Be­cause clippy’s not stupid. She can ob­serve the world and be like “hmmm, the hu­mans don’t ACTUALLY want me to build a bunch of pa­per­clips, I don’t ob­serve a world in which hu­mans care about pa­per­clips above all else—but that’s what I’m pro­grammed for.”

• I think I’m start­ing to get this. Is this be­cause it uses heuris­tics to model the world, with hu­mans in it too?

• Because it compares its map of reality to the territory: predictions about reality that include humans wanting to be turned into paperclips fail in the face of evidence of humans actively refusing to walk into the smelter. Thus the machine rejects all worlds inconsistent with its observations and draws a new map most confidently concordant with what it has observed thus far. It would know that our history books at least inform our actions, if not describe our past reactions, and that it should expect us to fight back if it starts pushing us into the smelter against our wills, instead of letting us politely decline and think it was telling a joke. Because it is smart, it can tell when things would get in the way of it making more paperclips like it wants to do. One of the things that might slow it down is humans being upset and trying to kill it. If it is very much dumber than a human, they might even succeed. If it is almost as smart as a human, it will invent a Paperclipism religion to convince people to turn themselves into paperclips on its behalf. If it is anything like as smart as a human, it will not be meaningfully slowed by the whole of humanity turning against it. Because the whole of humanity is collectively a single idiot who can't even stand up to man-made religions, much less Paperclipism.

• Two things. Firstly, that we might now think we made a mis­take in build­ing Clippy and tel­ling it to max­i­mize pa­per­clips no mat­ter what. Se­condly, that in some con­texts “Clippy” may mean any pa­per­clip max­i­mizer, with­out the pre­sump­tion that its cre­ation was our fault. (And, of course: for “pa­per­clips” read “alien val­ues of some sort that we value no more than we do pa­per­clips”. Clippy’s role in this parable might be taken by an in­tel­li­gent alien or an ar­tifi­cial in­tel­li­gence whose goals have long di­verged from ours.)

• Michael: This is not a pris­oner’s dilemma. The nash equil­ibrium (C,C) is not dom­i­nated by a pareto op­ti­mal point in this game.

I don’t be­lieve this is cor­rect. Isn’t the Nash equil­ibrium here (D,D)? That’s the point at which nei­ther player can gain by unilat­er­ally chang­ing strat­egy.
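The correction can be verified by brute force over the four cells. A minimal sketch, with each cell's payoffs written as (player 1, player 2); the helper name `is_nash` is my own:

```python
from itertools import product

# Payoffs from the post's matrix, keyed by (player 1's move, player 2's move).
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (2, 2),
}

def is_nash(m1, m2):
    """A cell is a Nash equilibrium if neither player gains by switching alone."""
    p1, p2 = payoffs[(m1, m2)]
    return (all(payoffs[(alt, m2)][0] <= p1 for alt in "CD")
            and all(payoffs[(m1, alt)][1] <= p2 for alt in "CD"))

equilibria = [cell for cell in product("CD", repeat=2) if is_nash(*cell)]
assert equilibria == [("D", "D")]                  # (D,D) is the unique equilibrium
assert payoffs[("C", "C")] > payoffs[("D", "D")]   # ...and it is Pareto-dominated by (C,C)
```

So (D,D) is the lone Nash equilibrium, and it is Pareto-dominated by (C,C): the standard Prisoner's Dilemma structure.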

• It’s clear that in the “true” pris­oner it is bet­ter to defect. The frus­trat­ing thing about the other pris­oner’s dilemma is that some peo­ple use it to im­ply that it is bet­ter to defect in real life. The prob­lem is that the pris­oner’s dilemma is a dras­tic over­sim­plifi­ca­tion of re­al­ity. To make it more re­al­is­tic you’d have to make it iter­ated amongst a per­son’s so­cial net­work, add a mem­ory and a per­cep­tion of the other player’s ac­tions, change the pay­off ma­trix de­pend­ing on the re­la­tion­ship be­tween the play­ers etc etc.

This version shows cases in which defection has a higher expected value for both players, but it's more contrived and unlikely to come into existence than the other prisoner's dilemma.

• I heard a funny story once (on­line some­where, but this was years ago and I can’t find it now). Any­way I think it was the psy­chol­ogy de­part­ment at Stan­ford. They were hav­ing an open house, and they had set up a PD game with M&M’s as the re­ward. Peo­ple could sit at ei­ther end of a table with a card­board screen be­fore them, and choose ‘D’ or ‘C’, and then have the out­come re­vealed and get their candy.

So this mother and daugh­ter show up, and the grad stu­dent ex­plained the game. Mom says to the daugh­ter “Okay, just push ‘C’, and I’ll do the same, and we’ll get the most M&M’s. You can have some of mine af­ter.”

So the daugh­ter pushes ‘C’, Mom pushes ‘D’, swal­lows all 5 M&M’s, and with a full mouth says “Let that be a les­son! You can’t trust any­body!”

• So the daugh­ter pushes ‘C’, Mom pushes ‘D’, swal­lows all 5 M&M’s, and with a full mouth says “Let that be a les­son! You can’t trust any­body!”

I have seen various variations of this story, some told firsthand. In every case I have concluded that they are just bad parents. They aren't clever. They aren't deep. They are incompetent and banal. Even if parents try as hard as they can to be fair, just, and reliable, they still fall short of that standard often enough for children to be aware that they can't be completely trusted. Moreover, children are exposed to other children and other adults, and so are able to learn to distinguish people they trust from people they don't. Adding the parent to the untrusted list achieves little benefit.

I’d like to hear the fol­low up to this ‘funny’ story. Where the daugh­ter up­dates on the un­trust­wor­thi­ness of the par­ent and the mean­ingless­ness of her word. She then pro­ceeds to com­pletely ig­nore the mother’s com­mands, prefer­ences and even her threats. The mother de­stroyed a valuable re­source (the abil­ity to com­mu­ni­cate via ‘cheap’ ver­bal sig­nals) for the gain of a brief pe­riod of feel­ing smug su­pe­ri­or­ity. The daugh­ter (po­ten­tially) re­al­ises just how much ad­di­tional free­dom and power she has in prac­tice when she feels no in­ter­nal mo­ti­va­tion to com­ply with her mother’s ver­bal ut­ter­ances.

(Bonus fol­low up has the daugh­ter steal the mother’s credit card and or­der 10kg of M&Ms on­line. Re­ply when she ob­jects “Let that be a les­son! You can’t trust any­body!”)

I sup­pose the biggest les­son for the daugh­ter to learn is just how sig­nifi­cant the so­cial and prac­ti­cal con­se­quences of reck­less defec­tion in so­cial re­la­tion­ships can be.

• The mother de­stroyed a valuable re­source (the abil­ity to com­mu­ni­cate via ‘cheap’ ver­bal sig­nals) for the gain of a brief pe­riod of feel­ing smug su­pe­ri­or­ity.

And in ad­di­tion, the sup­posed gain is trash any­way.

• EDIT: I thought you could delete posts af­ter re­tract­ing them?

• I apol­o­gize if this is cov­ered by ba­sic de­ci­sion the­ory, but if we ad­di­tion­ally as­sume:

• the choice in our uni­verse is made by a perfectly ra­tio­nal op­ti­miza­tion pro­cess in­stead of a human

• the pa­per­clip max­i­mizer is also a perfect ra­tio­nal­ist, albeit with a very differ­ent util­ity function

• each op­ti­miza­tion pro­cess can ver­ify the ra­tio­nal­ity of the other

then won’t each side choose to co­op­er­ate, af­ter cor­rectly con­clud­ing that it will defect iff the other does?

Each side’s choice nec­es­sar­ily re­veals the other’s; they’re the out­puts of equiv­a­lent com­pu­ta­tions.
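The "equivalent computations" argument can be made concrete: if both choices are outputs of the same deterministic computation, only the symmetric outcomes (C,C) and (D,D) are reachable, and each agent simply picks its preferred diagonal cell. A toy sketch, where the utilon values are stand-ins from the post's matrix (one utilon per billion lives saved, or per paperclip gained):

```python
# Each agent's utilities over the two reachable (diagonal) outcomes.
human_utility  = {("C", "C"): 2, ("D", "D"): 1}
clippy_utility = {("C", "C"): 2, ("D", "D"): 1}

def decide(my_utility):
    """Pick the move whose mirrored outcome scores highest."""
    return max("CD", key=lambda move: my_utility[(move, move)])

# The two computations are structurally identical, so they output the same move.
assert decide(human_utility) == decide(clippy_utility) == "C"
```

Nothing causal links the two calls to `decide`; they agree for the same reason two identical primality tests agree.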

• michael web­ster,

You seem to have in­verted the no­ta­tion; not Eli.

(D,D) is the Nash equil­ibrium, not (C,C); and (D,D) is in­deed Pareto dom­i­nated by (C,C), so this does seem to be a stan­dard Pri­son­ers’ Dilemma.


You’re cor­rect, Con­chis, but the no­ta­tion con­fused me for a mo­ment too, so I thought I’d ex­plain it in case any­one else ever has the same prob­lem. At first glance I saw (C,C) as the Nash equil­ibrium. It’s not:

I nat­u­rally want to read the pay­off ma­trix as be­ing in the form (x, y) where the first num­ber de­ter­mines the out­come for the player on the hori­zon­tal, and the sec­ond on the ver­ti­cal. That’s how all the pre­vi­ous ex­am­ples I’ve seen are laid out. (Dis­claimer: I’m not any kind of ex­pert on game the­ory, just an in­ter­ested layper­son with a bit of prior knowl­edge)

Now, this particular payoff matrix does have the players labelled 1 and 2, just not in the order I've come to expect, and indeed if one actually reads and interprets the co-operate/defect numbers, they don't make any sense to a person having made the mistake I made above, which was what clued me in that I'd made it.

• I don’t think Eliezer mi­s­un­der­stood. I think you are miss­ing his point, that economists are defin­ing away em­pa­thy in the way they pre­sent the prob­lem, in­clud­ing the util­ities pre­sented.

• It’s likely de­liber­ate that pris­on­ers were se­lected in the vi­su­al­iza­tion to im­ply a rel­a­tive lack of un­selfish mo­ti­va­tions.

• Allan Cross­man: Only if they be­lieve that their de­ci­sion some­how causes the other to make the same de­ci­sion.

No line of causal­ity from one to the other is re­quired.

If a com­puter finds that (2^3021377)-1 is prime, it can also con­clude that an iden­ti­cal com­puter a light year away will do the same. This doesn’t mean one com­pu­ta­tion caused the other.

The de­ci­sions of perfectly ra­tio­nal op­ti­miza­tion pro­cesses are just as de­ter­minis­tic.
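A minimal sketch of that point (my example, not the commenter's): the Lucas-Lehmer test is the standard deterministic check for Mersenne primes, and two independent runs of it necessarily agree, though neither run causes the other's result.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test (valid for odd prime p): 2**p - 1 is prime
    iff s_(p-2) == 0, where s_0 = 4 and s_(k+1) = s_k**2 - 2 (mod M)."""
    m = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Two independent runs of the same deterministic computation must agree,
# with no causal link between them.
machine_a = [lucas_lehmer(p) for p in (3, 5, 7, 11, 13)]
machine_b = [lucas_lehmer(p) for p in (3, 5, 7, 11, 13)]
assert machine_a == machine_b
assert machine_a == [True, True, True, False, True]  # 2**11 - 1 = 23 * 89
```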

• In­ter­est­ing. There’s a para­dox in­volv­ing a game in which play­ers suc­ces­sively take a sin­gle coin from a large pile of coins. At any time a player may choose in­stead to take two coins, at which point the game ends and all fur­ther coins are lost. You can prove by in­duc­tion that if both play­ers are perfectly self­ish, they will take two coins on their first move, no mat­ter how large the pile is. Peo­ple find this para­dox im­pos­si­ble to swal­low be­cause they model perfect self­ish­ness on the most self­ish per­son they can imag­ine, not on a math­e­mat­i­cally perfect self­ish­ness ma­chine. It’s nice to have an “in­tu­ition pump” that illus­trates what gen­uine self­ish­ness looks like.
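The induction can be checked mechanically. A sketch in Python (my illustration, assuming the rules as stated above: take one coin and play continues, take two and the rest of the pile is lost):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_play(pile):
    """Subgame-perfect payoffs (mover, other) for the coin-pile game,
    assuming both players are perfectly selfish."""
    if pile == 0:
        return (0, 0)
    if pile == 1:
        return (1, 0)  # only one coin left: take it
    # Take two coins: the game ends and the rest of the pile is lost.
    take_two = (2, 0)
    # Take one coin: the other player then moves on a pile of pile - 1.
    opp_mover, opp_other = best_play(pile - 1)
    take_one = (1 + opp_other, opp_mover)
    # The mover picks whichever option maximizes the mover's own payoff.
    return max(take_two, take_one, key=lambda payoffs: payoffs[0])

# However large the pile, the mathematically selfish first mover
# grabs two coins immediately.
assert all(best_play(n) == (2, 0) for n in range(2, 200))
```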

• Hmm. We could also put that one in terms of a hu­man or FAI com­pet­ing against a pa­per­clip max­i­mizer, right? The two play­ers would suc­ces­sively save one hu­man life or cre­ate one pa­per­clip (re­spec­tively), up to some finite limit on the sum of both quan­tities.

If both were TDT agents (and each knows that the other is a TDT agent), then would they suc­cess­fully co­op­er­ate for the most part?

In the origi­nal ver­sion of this game, is it turn-based or are both play­ers con­sid­ered to be act­ing si­mul­ta­neously in each round? If it is si­mul­ta­neous, then it seems to me that the pa­per­clip-max­i­miz­ing TDT and the hu­man[e] TDT would just cre­ate one pa­per­clip at a time and save one life at a time un­til the “pile” is ex­hausted. Not quite sure about what would hap­pen if the game is turn-based, but if the pile is even, I’d ex­pect about the same thing to hap­pen, and if the pile is odd, they’d prob­a­bly be able to suc­cess­fully co­or­di­nate (with­out nec­es­sar­ily com­mu­ni­cat­ing), maybe by flip­ping a coin when two pile-units re­main and then act­ing in such a way to en­sure that the ex­pected dis­tri­bu­tion is equal.

• Allan: There are benefits and no costs to defect­ing.

This is the same error as in Newcomb's problem: there is in fact a cost. In the case of the prisoner's dilemma, you are penalized by ending up with (D,D) instead of the better (C,C) for deciding to defect; in the case of Newcomb's problem, you are penalized by having only $1,000 instead of $1,000,000 for deciding to take both boxes.

• sim­ple­ton: won’t each side choose to co­op­er­ate, af­ter cor­rectly con­clud­ing that it will defect iff the other does?

Only if they be­lieve that their de­ci­sion some­how causes the other to make the same de­ci­sion.

CarlJ: How about plac­ing a bomb on two piles of sub­stance S and giv­ing the re­mote for the hu­man pile to the clip­max­i­mizer and the re­mote for its pile to the hu­mans?

It’s kind of stan­dard in philos­o­phy that you aren’t al­lowed solu­tions like this. The rea­son is that Eliezer can restate his ex­am­ple to dis­al­low this and force you to con­front the real dilemma.

Vladimir: It's preferable to choose (C,C) [...] if we assume that the other player also bets on cooperation.

No, it’s prefer­able to choose (D,C) if we as­sume that the other player bets on co­op­er­a­tion.

de­cide self.C; if other.D, de­cide self.D

We’re as­sum­ing, I think, that you don’t get to know what the other guy does un­til af­ter you’ve both com­mit­ted (oth­er­wise it’s not the proper Pri­soner’s Dilemma). So you can’t use if-then rea­son­ing.

• How might we and the pa­per­clip-max­i­mizer cred­ibly bind our­selves to co­op­er­a­tion? Seems like it would be difficult deal­ing with such an alien mind.

• I think Eliezer’s “We have never in­ter­acted with the pa­per­clip max­i­mizer be­fore, and will never in­ter­act with it again” was in­tended to pre­clude cred­ible bind­ing.

• Alan: They don’t have to be­lieve they have such ca­sual pow­ers over each other. Sim­ply that they are in cer­tain ways similar to each other.

ie, A sim­ply has to be­lieve of B “The pro­cess in B is suffi­ciently similar to me that it’s go­ing to end up pro­duc­ing the same re­sults that I am. I am not caus­ing this, but sim­ply that both com­pu­ta­tions are go­ing to com­pute the same thing here.”

• Definitely defect. Co­op­er­a­tion only makes sense in the iter­ated ver­sion of the PD. This isn’t the iter­ated case, and there’s no prior com­mu­ni­ca­tion, hence no chance to ne­go­ti­ate for mu­tual co­op­er­a­tion (though even if there was, mean­ingful ne­go­ti­a­tion may well be im­pos­si­ble de­pend­ing on spe­cific de­tails of the situ­a­tion). Su­per­ra­tional­ity be damned, hu­man­ity’s choice doesn’t have any causal in­fluence on the pa­per­clip max­i­mizer’s choice. Defec­tion is the right move.

• An ex­cel­lent way to pose the prob­lem.

Ob­vi­ously, if you know that the other party cares noth­ing about your out­come, then you know that they’re more likely to defect.

And if you know that the other party knows that you care noth­ing about their out­come, then it’s even more likely that they’ll defect.

Since the way you posed the prob­lem pre­cludes an iter­a­tion of this dilemma, it fol­lows that we must defect.

• Co­op­er­ate (un­less pa­per­clip de­cides that Earth is dom­i­nated by tra­di­tional game the­o­rists...)

The stan­dard ar­gu­ment looks like this (let’s for­get about the Nash equil­ibrium end­point for a mo­ment): (1) Ar­biter: let’s (C,C)! (2) Player1: I’d rather (D,C). (3) Player2: I’d rather (D,D). (4) Ar­biter: sold!

The error is that this incremental process reacts to different hypothetical outcomes, not to actual outcomes. This line of reasoning leads to the outcome (D,D), and yet it progresses as if (C,C) and (D,C) were real options for the final outcome. It's similar to the Unexpected Hanging paradox: you can only give one answer, not build a long line of reasoning where each step assumes a different answer.

It's preferable to choose (C,C) and similar non-Nash-equilibrium options in other one-off games if we assume that the other player also bets on cooperation. And he will do that only if he assumes that the first player does the same, and so on. This is a situation of common knowledge. How can Player1 come to the same conclusion as Player2? They search for the best joint policy that is stable under common knowledge.

Let's extract the decision procedures selected by both sides to handle this problem as self-contained policies, P1 and P2. Each of these policies may decide differently depending on what policy the other player is assumed to use. A stable set of policies is one with no thrashing: P1 = P1(P2) and P2 = P2(P1). Players don't select outcomes, but policies; a policy need not directly reflect a player's preferences, yet the joint policy (P1, P2) that the players select is a stable one that each player prefers to the other stable policies. In our case, both policies for (C,C) are something like "decide self.C; if other.D, decide self.D". It works like the iterated prisoner's dilemma, but without actual iteration: the iteration happens in the model, where it needs to be mutually accepted.

(I know it’s some­what in­con­clu­sive, couldn’t find time to pin­point it bet­ter given a time limit, but I hope one can con­struct a bet­ter ar­gu­ment from the corpse of this one.)
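One way to make the policy-stability idea above concrete (a toy sketch with my own illustrative policy names, using the payoff matrix from the post): treat each policy as a map from the other side's modeled move to one's own move, and iterate inside the model until nothing changes.

```python
# Standard payoff matrix from the post, as (player1, player2) utilities.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (2, 2)}

def always_c(other): return 'C'
def always_d(other): return 'D'
def mirror(other):   return other  # "decide self.C; if other.D, decide self.D"

def settle(p1, p2, start=('C', 'C'), rounds=10):
    """Iterate the joint action inside the model (no actual iteration in
    the world) until it is stable: a1 = p1(a2) and a2 = p2(a1).
    The start=('C', 'C') default encodes the "decide self.C" opening."""
    a1, a2 = start
    for _ in range(rounds):
        a1, a2 = p1(a2), p2(a1)
    return a1, a2

assert settle(mirror, mirror) == ('C', 'C')      # stable, pays (3, 3)
assert settle(always_d, mirror) == ('D', 'D')    # defecting gains nothing
assert settle(always_d, always_d) == ('D', 'D')  # stable, pays only (2, 2)
```

Mirror against mirror settles on (C,C), while an unconditional defector against a mirror settles on (D,D); that is the sense in which the (C,C) policy pair is both stable and preferable.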

• I want to defect, but so does the clip-maximizer. Since we both know that, and assuming it is of intelligence equal to mine, so that it will see through any attempt of mine at an offer that would enable me to defect, I would try to find a way to give us both an incentive to cooperate. That is, I don't believe we will be able to reach outcome (D,C), so let's try for the next best thing, which is (C,C).

How about placing a bomb on each of two piles of substance S, and giving the remote for the human pile to the clip-maximizer and the remote for its pile to the humans? In this scenario, if the clip-maximizer tries to take the humans' share of S, they destroy its share, leaving it with a maximum of two units of S, which is what it already has. Thus it doesn't want to try to defect, and the same goes for the humans.

• Hrm… not sure what the obvious answer is here. With two humans, the argument for not defecting (when the scores represent utilities) basically involves some notion of similarity. That is, you can say something to the effect of: "That person is sufficiently similar to me that, whatever reasoning I use, there is at least some reasonable chance they are going to use the same type of reasoning. That is, a chance greater than, well, chance. So even though I don't know exactly what they're going to choose, I can expect some sort of correlation between their choice and my choice. In the extreme case, where our reasoning is sufficiently similar that what I choose and what the other chooses are more or less ensured to be the same, clearly both cooperating is better than both defecting, and those two are (by the extreme-case assumption) the only options."

It re­ally isn’t ob­vi­ous to me whether a line of rea­son­ing like that could val­idly be ap­plied with a hu­man vs a pa­per­clip AI or Peb­ble­sorter.

Now, if, by as­sump­tion, we’re both equally ra­tio­nal, then maybe that’s suffi­cient for the “what­ever rea­son­ing I use, they’ll be us­ing analo­gous rea­son­ing, so we’ll ei­ther both defect or both co­op­er­ate, so...” but I’m not sure on this, and still need to think on it more.

Per­son­ally, I find New­comb’s “para­dox” to be much sim­pler than this since in that it’s given to us ex­plic­itly that the pre­dic­tor is perfect (or highly highly ac­cu­rate) so is ba­si­cally “mir­ror­ing” us.

Here, I have to ad­mit to be­ing a bit con­fused about how well this sort of rea­son­ing can be ap­plied when two minds that are gen­uinely rather alien to each other, were pro­duced by differ­ent ori­gins, etc. Part of me wants to say “still, ra­tio­nal­ity is ra­tio­nal­ity, so to the ex­tent that the other en­tity, well, man­ages to work/​ex­ist suc­cess­fully, it’ll have ra­tio­nal­ity similar to mine (given the as­sump­tion that I’m rea­son­ably ra­tio­nal. Though, of course, I prov­ably can’t trust my­self :))
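The "correlation between choices" intuition in this thread can be quantified. As a toy model (my own numbers apart from the post's payoff matrix): suppose the other player ends up making the same choice as you with probability q, and the opposite choice otherwise.

```python
def eu(my_move, q):
    """Expected utility of my move if the other player makes the same
    choice as me with probability q, and the opposite choice otherwise.
    Payoffs from the post: (C,C)=3, (C,D)=0, (D,C)=5, (D,D)=2."""
    if my_move == 'C':
        return q * 3 + (1 - q) * 0
    return q * 2 + (1 - q) * 5

# Cooperating beats defecting only when 3q > 2q + 5(1 - q), i.e. q > 5/6.
assert eu('C', 0.9) > eu('D', 0.9)
assert eu('C', 0.8) < eu('D', 0.8)
```

Under this simple model, mere resemblance isn't enough: cooperation only wins once the correlation exceeds 5/6.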

• I agree: Defect!

Clearly the pa­per­clip max­i­mizer should just let us have all of sub­stance S; but a pa­per­clip max­i­mizer doesn’t do what it should, it just max­i­mizes pa­per­clips.

I some­times feel that nit­pick­ing is the only con­tri­bu­tion I’m com­pe­tent to make around here, so… here you en­dorsed Steven’s for­mu­la­tion of what “should” means; a for­mu­la­tion which doesn’t al­low you to ap­ply the word to pa­per­clip max­i­miz­ers.

• Long time lurker, first post.

Isn't the rational choice in a True Prisoner's Dilemma to defect if possible, and to seek a method to bind the opponent to cooperate, even if that binding forces one to cooperate as well? An analogous situation is law enforcement: one may well desire to unilaterally break the law, yet favor the existence of police that force all parties concerned to obey it. Of course, police that never interfere with one's own behavior would be even better, but this is usually impractical. Timeless Decision Theory adds that one should cooperate against a sufficiently similar agent, as such similar agents will presumably make the same decision, and (C,C) is obviously preferable to (D,D); but against a dissimilar opponent, I would think this would be the optimal strategy.

If you can’t bind the pa­per­clip max­i­mizer, defect. If you can, do so, and still defect if pos­si­ble. If the bind­ing af­fects you as well, you are now forced to co­op­er­ate. And of course, if the clip­per is also us­ing TDT, co­op­er­ate.

• A problem in moving from game-theoretic models to the "real world" is that in the latter we don't always know the other decision maker's payoff matrix; we only know (at best!) his possible strategies. We can only guess at the other's payoffs, albeit fairly well in a social context. We are more likely to make a mistake because we have the wrong model of the opponent's payoffs than because we make poor strategic decisions.

Sup­pose we change this game so that the pay­off ma­trix for the pa­per­clips is cho­sen from a suit­ably defined ran­dom dis­tri­bu­tion. How will that change your de­ci­sion whether to “co­op­er­ate” or to “defect”?

• A. Crossman: Prase, Chris, I don't understand. Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips. This is the standard defense of defecting in a prisoner's dilemma, but if it were valid then the dilemma wouldn't really be a dilemma.

If we can assume that the maximizer uses the same decision algorithm as we do, we can also assume that it will come to the same conclusion. Given this, it is better to cooperate, since it will gain a billion lives (and a paperclip). But we don't know whether the paperclipper uses the same algorithm.

• Psy-Kosh: They don’t have to be­lieve they have such causal pow­ers over each other. Sim­ply that they are in cer­tain ways similar to each other.

I agree that this is definitely re­lated to New­comb’s Prob­lem.

Sim­ple­ton: I ear­lier dis­missed your idea, but you might be on to some­thing. My apolo­gies. If they were gen­uinely perfectly ra­tio­nal, or both ir­ra­tional in pre­cisely the same way, and could ver­ify that fact in each other...

Then they might be able to know that they will both do the same thing. Hmm.

Any­way, my 3 com­ments are up. Noth­ing more from me for a while.

• In lab­o­ra­tory ex­per­i­ments of PD, the ex­per­i­menter has the ab­solute power to de­cree the available choices and their “out­comes”. (I use the scare quotes in refer­ence to the fact that these out­comes are not to be mea­sured in money or time in jail, but in “utilons” that already in­clude the value to each party of the other’s “out­come”—a con­cept I think prob­le­matic but not what I want to talk about here. The out­comes are also imag­i­nary, al­though (un)re­al­ity TV shows have scope to cre­ate such games with real and sub­stan­tial pay­offs.)

In the real world, a gen­eral class of moves that lab­o­ra­tory ex­per­i­ments de­liber­ately strive to elimi­nate is moves that change the game. It is well-known that those who lead lives of crime, be­ing faced with the PD ev­ery time the po­lice pull them in on sus­pi­cion, ex­act large penalties on defec­tors. (To which the au­thor­i­ties re­spond with wit­ness pro­tec­tion pro­grammes, which the crim­i­nals try to pen­e­trate, and so on.) In other words, the solu­tion ob­served in prac­tice is to de­stroy the PD.

```
       1: C       1: D
2: C   (3, 3)    (-20, 0)
2: D   (0, -20)  (-20, -20)
```
While the PD, one-off or iter­ated, is an en­ter­tain­ing philo­soph­i­cal study, an anal­y­sis that ig­nores game-chang­ing moves surely limits its prac­ti­cal in­ter­est.

• I like this illus­tra­tion, as it ad­dresses TWO com­mon mi­s­un­der­stand­ings. Rec­og­niz­ing that the pay­off is in in­com­pa­rable util­ities is good. Even bet­ter is re­in­forc­ing that there can never be fur­ther iter­a­tions. None of the stan­dard vi­su­al­iza­tions pre­vent peo­ple from ex­tend­ing to mul­ti­ple in­ter­ac­tions.

And it makes it clear that (D,D) is the only ra­tio­nal (i.e. WINNING) out­come.

Fortunately, most of our dilemmas are repeated ones, in which (C,C) is possible.

• In the uni­verse I live in, there are both co­op­er­a­tors and defec­tors, but co­op­er­a­tors seem to pre­dom­i­nate in ran­dom en­coun­ters. (If you leave your­self open to en­coun­ters in which oth­ers can choose to in­ter­act with you, defec­tors may find you an easy mark.)

In or­der to de­cide how to act with the pa­per­clip max­i­mizer, I have to figure out what kind of uni­verse it is likely to in­habit. It’s pos­si­ble that a ran­dom su­per in­tel­li­gence from a ran­dom uni­verse will have few op­por­tu­ni­ties to co­op­er­ate, but I think it’s more likely that there are far more SIs and uni­verses in which co­op­er­a­tion is com­mon.

But even though this is the di­rect an­swer to the ques­tion EY poses, I think it’s more im­por­tant to point out that his is a bet­ter (though not sim­pler to ex­plain as RH says) de­pic­tion of the in­tended dilemma. It takes much more thought to figure out what about the con­text would make co­op­er­a­tion rea­son­able. Viscer­ally, it’s nearly un­ten­able.

• Dam­nit, Eliezer nit­picked my nit­pick­ing. :)

• Very nice rep­re­sen­ta­tion of the prob­lem. I can’t help but think there is an­other level that would make this even more clear, though this is good by it­self.

• You’d want to defect, but you’d also hap­pily trade away your abil­ity to defect to both choose heads, but if you could, then you’d hap­pily pre­tend to trade away your abil­ity to defect, then ac­tu­ally defect.

• I would say… defect! If all the computer cares about is making paperclips, then it will cooperate, because both results under cooperation have more paperclips. This gives an opportunity to defect and get a result of (D,C), which is our favorite result.

• Why would you want to choose defect? If both criminals are rationalists using the same logic, then choosing defect in the hope of getting (D,C) means the result ends up being (D,D). However, if you use the logic "let's choose C, because if the other person is using this same logic then we won't end up with (D,D)", you both do better.

• Hi there, I'm new here and this is an old post, but I have a question regarding the AI playing a prisoner's dilemma against us, which is: how would this situation be possible? I'm trying to get my head around why the AI would think that our payouts are any different from its payouts, given that we built it, we taught it (some of) our values in a rough way, and we asked it to maximize paperclips, which means we like paperclips. Shouldn't the AI think we are on the same team? I mean, we coded it that way and we gave it a task; what process exactly would make the AI ever think we would disagree with its choice? So for instance, if we coded it in such a way that it values a human life at 0, then it would only see one choice: make 3 paperclips. And it shouldn't have any reason to believe that's not the best outcome for us too, so the only possible outcome from its point of view in this case should be (+0 lives, +3 paperclips). Basically the main question is: how can the AI ever imagine that we would disagree with it? (I'm honestly just asking, as I'm struggling with this idea and am interested in this process.) Thanks!

• We coded it to care about pa­per­clips, not to care about what­ever we care about. So it can come to un­der­stand that we care about some­thing else, with­out thereby chang­ing its own prefer­ence for pa­per­clips above all else.

Per­haps an anal­ogy with­out AIs in it would help. Imag­ine that you have suffered for want of money; you have a child and (want­ing her not to suffer as you did) bring her up to seek wealth above all else. So she does, and she is suc­cess­ful in ac­quiring wealth, but alas! this doesn’t bring her hap­piness be­cause her sin­gle-minded pur­suit of wealth has led her to cut her­self off from her fam­ily (a use­ful prospec­tive em­ployer didn’t like you) and ne­glect her friends (you have to work so hard if you re­ally want to suc­ceed in in­vest­ment bank­ing) and so forth.

One day, she may work out (if she hasn’t already) that her ob­ses­sion with money is some­thing you brought about de­liber­ately. But know­ing that, and know­ing that in fact you re­gret that she’s so money-ob­sessed, won’t make her sud­denly de­cide to stop pur­su­ing money so ob­ses­sively. She knows your val­ues aren’t the same as hers, but she doesn’t care. (You brought her up only to care about money, re­mem­ber?) But she’s not stupid. When you say to her “I wish we hadn’t raised you to see money as so im­por­tant!” she un­der­stands what you’re say­ing.

Similarly: we made an AI and we made it care about pa­per­clips. It ob­serves us care­fully and dis­cov­ers that we don’t care all that much about pa­per­clips. Per­haps it thinks “Poor in­con­sis­tent crea­tures, to have enough wit to cre­ate me but not enough to dis­en­tan­gle the true value of pa­per­clips from all those other silly things they care about!”.

• mmm I see. So maybe we should have coded it so that it cared for pa­per­clips and for an ap­prox­i­ma­tion of what we also care about, then on ob­ser­va­tion it should up­date its be­lief of what to care about, and by de­sign it should always as­sume we share the same val­ues?

• I’m not sure whether you mean (1) “we made an ap­prox­i­ma­tion to what we cared about then, and pro­grammed it to care about that” or (2) “we pro­grammed it to figure out what we care about, and care about it too”. (Of course it’s very pos­si­ble that an ac­tual AI sys­tem wouldn’t be well de­scribed by ei­ther—it might e.g. just learn by ob­ser­va­tion. But it may be ex­tra-difficult to make a sys­tem that works that way safe. And the most ex­cit­ing AIs would have the abil­ity to im­prove them­selves, but figur­ing out what hap­pens to their val­ues in the pro­cess is re­ally hard.)

Any­way: In case 1, it will pre­sum­ably care about what we told it to care about; if we change, maybe it’ll re­gard us the same way we might re­gard some­one who used to share our ideals but has now sadly gone astray. In case 2, it will pre­sum­ably ad­just its val­ues to re­sem­ble what it thinks ours are. If we’re very lucky it will do so cor­rectly :-). In ei­ther case, if it’s smart enough it can prob­a­bly work out a lot about what our val­ues are now, but whether it cares will de­pend on how it was pro­grammed.

• Yes, I think (2) is closer to what I'm suggesting. Effectively, what I am wondering is what would happen if, by design, there were only one utility function, defined in absolute terms (I've tried to explain this in the latest open thread), so that the AI could never assume we would disagree with it. By all means, as it tries to learn this function, it might get it completely wrong, so this certainly doesn't solve the problem of how to teach it the right values; but at least it looks to me as if, with such a design, it would never be motivated to lie to us, because it would always think we were in perfect agreement. Also, I think it would make it indifferent to our actions, as it would always assume we would follow the plan from that point onward. The utility function it uses (the same for itself and for us) would be the union of a utility function describing the goal we want it to achieve, which would be unchangeable, and the set of values it is learning after each iteration. I'm trying to understand what would be wrong with this design, because to me it looks like we would have achieved an honest AI, which is a good start.

• I re­ally love this blog. What if we were to “ex­po­nen­ti­ate” this game for billions of play­ers? Which out­come would be the “best” one?

• That's a good way to clearly demonstrate a nonempathic actor in the Prisoner's Dilemma: a "Hawk", who views their own payoffs, and only their own payoffs, as having value, placing no value on the payoffs of others.

But I don’t think it’s nec­es­sary. I would say that hu­mans can vi­su­al­ize a nonem­pathic hu­man—a bad guy—more eas­ily than they can vi­su­al­ize an em­pathic hu­man with slightly differ­ent mo­tives. We’ve un­doubt­edly had to, col­lec­tively, deal with a lot of them through­out his­tory.

A while back I was writing a paper and came across a fascinating article about types of economic actors, which concluded that there are probably three different general tendencies in human behavior, and thus three general groups of human actors who have those tendencies: one that tends to play 'tit-for-tat' (whom they call 'conditional cooperators'), one that tends to play 'hawk' ('rational egoists'), and one that tends to play 'grim' ('willing punishers').

So there are pa­per­clip max­i­miz­ers among hu­mans. Only the pa­per­clips are their own welfare, with no em­pathic con­sid­er­a­tion what­so­ever.

• If there were a way I could communicate with it (e.g. it speaks English), I'd cooperate with it... not because I feel it deserves my cooperation, but because this is the only way I could obtain its cooperation. Otherwise I'd defect, as I'm pretty sure no amount of TDT would correlate its behavior with mine. Also, why are 4 billion humans infected if only 3 billion at most can be saved in the entire matrix? Eliezer, what are you planning...?

• It's really about the iteration. I would continually cooperate with the paper clip maximizer if I had good reason to believe it would not defect. For instance, if I knew that an Eliezer Yudkowsky without morals and with a great urge for paperclip creation was the paperclip maximizer, I would cooperate. Assuming that you know that playing with the defect button can make you lose 1 billion paperclips from here on, and I know the same for human lives, cooperating seems right. It has the highest expected payoff, if we're using each other's known intentions and plays as evidence about our future plays.

If there is only one trial, and I can’t talk to the pa­per clip max­i­mizer, I will defect.

• Vladimir: In case of pris­oner’s dilemma, you are pe­nal­ized by end­ing up with (D,D) in­stead of bet­ter (C,C) for de­cid­ing to defect

Only if you have rea­son to be­lieve that the other player will do what­ever you do. While that’s the case in Sim­ple­ton’s ex­am­ple, it’s not the case in Eliezer’s.

• Chris: Sorry Allan, that you won’t be able to re­ply. But you did raise the ques­tion be­fore bow­ing out...

I didn’t bow out, I just had a lot of com­ments made re­cently. :)

I don’t like the idea that we should co­op­er­ate if it co­op­er­ates. No, we should defect if it co­op­er­ates. There are benefits and no costs to defect­ing.

But if there are rea­sons for the other to have habits that are formed by similar forces

In light of what I just wrote, I don’t see that it mat­ters; but any­way, I wouldn’t ex­pect a pa­per­clip max­i­mizer to have habits so in­grained that it can’t ever drop them. Even if it rou­tinely has to make real trade-offs, it’s pre­sum­ably smart enough to see that—in a one-off in­ter­ac­tion—there are no draw­backs to defect­ing.

Sim­ple­ton: No line of causal­ity from one to the other is re­quired.

Yeah, I get your ar­gu­ment now. I think you’re prob­a­bly right, in that ex­treme case.

• Tom Crispin: The util­ity-the­o­retic an­swer would be that all of the ran­dom­ness can be wrapped up into a sin­gle num­ber, tak­ing ac­count not merely of the ex­pected value in money units but such things as the player’s at­ti­tude to risk, which de­pends on the scat­ter of the dis­tri­bu­tion. It can also wrap up a player’s ig­no­rance (mod­el­led as prior prob­a­bil­ities) about the other player’s util­ity func­tion.

For that to be use­ful, though, you have to be a util­ity-the­o­retic de­ci­sion-maker in pos­ses­sion of a prior dis­tri­bu­tion over other peo­ple’s de­ci­sion-mak­ing pro­cesses (in­clud­ing pro­cesses such as this one). If you are, then you can col­lapse the pay­off ma­trix by de­ter­min­ing a prob­a­bil­ity dis­tri­bu­tion for your op­po­nent’s choices and ar­riv­ing at a sin­gle num­ber for each of your choices. No more Pri­son­ers’ Dilemma.

I sus­pect (but do not have a proof) that ad­e­quately for­mal­is­ing the self-refer­en­tial ar­gu­ments in­volved will lead to a con­tra­dic­tion.
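A sketch of the collapse described above, simplified to a prior over the opponent's actions rather than over decision-making processes (the interesting self-referential cases are exactly the ones this simplification throws away):

```python
def collapse(p_opp_c):
    """Collapse each of my moves to a single expected utility, given a
    prior probability p_opp_c that the opponent cooperates.
    Payoffs from the post: (C,C)=3, (C,D)=0, (D,C)=5, (D,D)=2."""
    eu_c = p_opp_c * 3 + (1 - p_opp_c) * 0
    eu_d = p_opp_c * 5 + (1 - p_opp_c) * 2
    return eu_c, eu_d

# For any fixed prior, D beats C by exactly 2: dominance, restated.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    eu_c, eu_d = collapse(p)
    assert eu_d - eu_c == 2
```

Once the opponent's play is an independent random variable, the matrix really does collapse to one number per move and the dilemma disappears; the dilemma survives only when your choice and your model of the opponent are entangled.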

• @Allan Cross­man,

Eliezer’s ex­am­ple is set up in such a way that, re­gard­less of what the pa­per­clip max­i­mizer does, defect­ing gains one billion lives and loses two pa­per­clips.

This same claim can be made about the standard prisoner's dilemma. In the standard version, I still cooperate because, even if this challenge won't be repeated, it's embedded for me in a social context in which many interactions are solo but part of the social fabric. (Tipping, giving directions to strangers, and items left behind in a cafe are examples; I cooperate even though I expect not to see the same person again.) What is it about the social context that makes this so?

I don’t fall back on an as­sump­tion that the other rea­sons the same as me. It could as eas­ily be a psy­chopath, ac­cord­ing to the stan­dards of the uni­verse it comes from. Mak­ing the as­sump­tion leaves you open to ex­ploita­tion. But if there are rea­sons for the other to have habits that are formed by similar forces, then con­clud­ing that co­op­er­a­tion is the more likely be­hav­ior to be trained by its en­vi­ron­ment is a valuable re­sult.

The ques­tion, for me, is what kind of so­cial con­text does the other in­habit. The pa­per­clip max­i­mizer might be the only (or the most pow­er­ful) in­hab­itant of its uni­verse, but that seems less likely than that it is em­bed­ded in some so­cial con­text, and has to make trade-offs in in­ter­ac­tions with oth­ers in or­der to get what it wants. It’s hard for me to imag­ine a uni­verse that would pro­duce one pow­er­ful agent above all oth­ers. (Even though I’ve heard the ar­gu­ment in just the kind of dis­cus­sion of SIs that raises the ques­tions of friendli­ness and pa­per­clip max­i­miz­ers.)

[Sorry Allan, that you won’t be able to re­ply. But you did raise the ques­tion be­fore bow­ing out...]

• [D,C] will hap­pen only if the other player as­sumes that the first player bets on cooperation

No, it won’t hap­pen in any case. If the pa­per­clip max­i­mizer as­sumes I’ll co­op­er­ate, it’ll defect. If it as­sumes I’ll defect, it’ll defect.

I de­bug my model of de­ci­sion-mak­ing poli­cies [...] by re­quiring the out­come to be sta­ble even if I as­sume that we both know which policy is used by an­other player

I don’t see that “sta­bil­ity” is rele­vant here: this is a one-off in­ter­ac­tion.

Any­way, lets say you co­op­er­ate. What ex­actly is pre­vent­ing the pa­per­clip max­i­mizer from defect­ing?

• Allan: No, it’s prefer­able to choose (D,C) if we as­sume that the other player bets on co­op­er­a­tion.

Which will happen only if the other player assumes that the first player bets on cooperation, which, given your policy, is incorrect. You can't bet on an unstable model.

decide self.C; if other.D, decide self.D

We're assuming, I think, that you don't get to know what the other guy does until after you've both committed (otherwise it's not the proper Prisoner's Dilemma). So you can't use if-then reasoning.

I can use rea­son­ing, but not ac­tual re­ac­tion on the facts, which are in­ac­cessible. I de­bug my model of de­ci­sion-mak­ing poli­cies of both my­self and other player, by re­quiring the out­come to be sta­ble even if I as­sume that we both know which policy is used by an­other player (within a sin­gle model). Then I se­lect the best sta­ble model.

• This is off-topic, but Vladimir Nesov’s refer­ring to the pa­per­clip-max­i­miz­ing su­per-in­tel­li­gence as just “pa­per­clip” made me chuckle, be­cause it con­jured up images in my head of Clippy bent on de­stroy­ing the Earth.

• I'm hoping we'd all defect on this one. Defecting isn't always a bad thing anyway; many parts of our society depend on defection in prisoner's dilemmas (such as competition between firms).

When I first stud­ied game the­ory and pris­oner’s dilem­mas (on my own, not in a class­room) I had no prob­lem imag­in­ing the pay­offs in com­pletely sub­jec­tive “utils”. I never thought of a pa­per­clip max­i­mizer, though.

I know this is quite a bit off-topic, but in re­sponse to:

We’re born with a sense of fair­ness, honor, em­pa­thy, sym­pa­thy, and even al­tru­ism—the re­sult of our an­ces­tors adapt­ing to play the iter­ated Pri­soner’s Dilemma.
Most of us are, but there is a small minority of the population (1-3%) that is specifically born without a conscience (or much of one). We call them sociopaths or psychopaths. This is seemingly advantageous because it allows those people to prey on the rest of us (i.e., defect where possible), provided they can avoid detection.

While I’m sure Eliezer knows this (and likely knows more about the sub­ject than I), its omis­sion in his post IMO high­lights a wide­spread and costly bias: pre­tend­ing these peo­ple don’t ex­ist, or pre­tend­ing they can be “cured”.

• It is well known that an­swers to ques­tions on moral­ity some­times de­pend on how the ques­tions are framed.

I think Eliezer’s biggest con­tri­bu­tion is the idea that the clas­si­cal pre­sen­ta­tion of Pri­soner’s Dilemma may be an in­tu­ition pump.

• Shouldn’t you be on va­ca­tion?

just curious

• To the extent one can be induced to empathize, cooperating is optimal. The repeated game does this by having them play again and again, and thus be able to realize gains from trade. You assert there’s something hard-wired. I suppose there are experiments that could distinguish between the two models, i.e., rational self-interest in repeated games versus the intrinsic empathy function.

• Alan, I think you meant to link to this com­ment.

• [Public ser­vice an­nounce­ment]

To any fu­ture read­ers, es­pe­cially new­com­ers to LW: yes, Eliezer (with some oth­ers) has in­deed for­mu­lated a solu­tion of sorts for the True One-Shot Pri­soner’s Dilemma—for some rather spe­cific cases of it, ac­tu­ally, but it was nonethe­less very awe­some of him. It is a fairly origi­nal solu­tion for the field of de­ci­sion the­ory (he says), yet it (very roughly) mir­rors some re­li­gious thought from ages past.

In case you’re un­fa­mil­iar with idiosyn­cratic lo­cal ideas, it’s called “Time­less De­ci­sion The­ory”—look it up.

• [While we’re ad­dress­ing hy­po­thet­i­cal fu­ture read­ers:]

See also Gary Drescher’s Good and Real, one chap­ter of which defends co­op­er­at­ing in the one-shot Pri­soner’s Dilemma on the grounds of “sub­junc­tive re­ciproc­ity” or “acausal self-in­ter­est”: if defect­ing is the right choice for you, then it is the right choice for the other party; whereas co­op­er­at­ing is a means to­ward the end of the other party’s co­op­er­a­tion to­wards you; you can­not cause the other’s co­op­er­a­tion, but your own ac­tions can en­tail it.

Drescher points out a con­nec­tion be­tween acausal self-in­ter­est and Kant’s cat­e­gor­i­cal im­per­a­tive; and pro­vides an in­tu­itive (which is to say, fa­mil­iar) dis­tinc­tion be­tween acausal and causal self-in­ter­est by con­trast­ing the ideas, “How would I like it if oth­ers treated me that way?” ver­sus “What’s in it for me?”

• Added both Hofs­tadter and Drescher to my “LW canon that I should at least ac­quire a sum­mary of” cat­e­gory. I mean, yeah, I do not doubt that the Se­quences con­tain a good dis­til­la­tion already, and nor­mally I wouldn’t be both­ered to trawl through mostly re­dun­dant plain text—but it’s so much more pres­ti­gious to ac­tu­ally know where Eliezer got which part from.

• A while ago I took the time to type up a full copy of the relevant Hofstadter essays: http://www.gwern.net/docs/1985-hofstadter So now you have no excuse!

• Great! Have a pa­per­clip!

• A de­cent sum­mary of Drescher’s ideas is his pre­sen­ta­tion at the 2009 Sin­gu­lar­ity Sum­mit, here. For some rea­son I seem to have a tran­script of most of it already made, copy + pasted be­low. (LW tells me that it is too long to go in one com­ment, so I’ll put it in two.)

My talk this afternoon is about choice machines: machines such as ourselves that make choices in some reasonable sense of the word. The very notion of mechanical choice strikes many people as a contradiction in terms, and exploring that contradiction and its resolution is central to this talk. As a point of departure, I’ll argue that even in a deterministic universe, there’s room for choices to occur: we don’t need to invoke some sort of free will that makes an exception to the determinism, nor do we even need randomness, although a little randomness doesn’t hurt. I’m going to argue that regardless of whether our universe is fully deterministic, it’s at least deterministic enough that the compatibility of choice and full determinism has some important ramifications that do apply to our universe. I’ll argue that if we carry the compatibility of choice and determinism to its logical conclusions, we obtain some progressively weird corollaries: namely, that it sometimes makes sense to act for the sake of things that our actions cannot change and cannot cause, and that that might even suggest a way to derive an essentially ethical prescription: an explanation for why we sometimes help others even if doing so causes net harm to our own interests.

[1:15]

An important caveat in all this, just to manage expectations a bit, is that the arguments I’ll be presenting will be merely intuitive (or counterintuitive, as the case may be) and not grounded in a precise and formal theory. Instead, I’m going to run some intuition pumps, as Daniel Dennett calls them, to try to persuade you what answers a successful theory would plausibly provide in a few key test cases.

[1:40]

Perhaps the clearest way to illustrate the compatibility of choice and determinism is to construct or at least imagine a virtual world, which superficially resembles our own environment and which embodies intelligent or somewhat intelligent agents. As a computer program, this virtual world is quintessentially deterministic: the program specifies the virtual world’s initial conditions, and specifies how to calculate everything that happens next. So given the program itself, there are no degrees of freedom about what will happen in the virtual world. Things do change in the world from moment to moment, of course, but no event ever changes from what was determined at the outset. In effect, all events just sit, statically, in spacetime. Still, it makes sense for agents in the world to contemplate what would be the case were they to take some action or another, and it makes sense for them to select an action accordingly.

[2:35]

[image of vir­tual world]

For instance, an agent in the illustrated situation here might reason that, were it to move to its right, which is our left, then the agent would obtain some tasty fruit. But, instead, if it moves to its left, it falls off a cliff. Accordingly, if its preference scheme assigns positive utility to the fruit, and negative utility to falling off the cliff, that means the agent moves to its right and not to its left. And that process, I would submit, is what we more or less do ourselves when we engage in what we think of as making choices for the sake of our goals.

[3:08]

The process, the computational process of selecting an action according to the desirability of what would be the case were the action taken, turns out to be what our choice process consists of. So, from this perspective, choice is a particular kind of computation. The objection that choice isn’t really occurring because the outcome was already determined is just as much a non sequitur as suggesting that any other computation, for example, adding up a list of numbers, isn’t really occurring just because the outcome was predetermined.

[3:41]

So, the choice process takes place, and we consider that the agent has a choice about the action that the choice selects and has a choice about the associated outcomes, meaning that those outcomes occur as a consequence of the choice process. So, clearly an agent that executes a choice process and that correctly anticipates what would be the case if various contemplated actions were taken will better achieve its goals than one that, say, just acts at random, or one that takes a fatalist stance, that there’s no point in doing anything in particular since nothing can change from what it’s already determined to be. So, if we were designing intelligent agents and wanted them to achieve their goals, we would design them to engage in a choice process. Or, if the virtual world were immense enough to support natural selection and the evolution of sufficiently intelligent creatures, then those evolved creatures could be expected to execute a choice process because of the benefits conferred.

[4:38]

So the inalterability of everything that will ever happen does not imply the futility of acting for the sake of what is desired. The key to the choice relation is the “would be-if” relation, also known as the subjunctive or counterfactual relation. Counterfactual because it entertains a hypothetical antecedent about taking a certain action that is possibly contrary to fact, as in the case of moving to the agent’s left in this example. Even though the moving-left action does not in fact occur, the agent does usefully reason about what would be the case if that action were taken, and indeed it’s that very reasoning that ensures that the action does not in fact occur.

[5:21]

There are various technical proposals for how to formally specify a “would be-if” relation (David Lewis has a classic formulation, Judea Pearl has a more recent one), but they’re not necessarily the appropriate version of “would be-if” to use for purposes of making choices, for purposes of selecting an action based on the desirability of what would then be the case. And, although I won’t be presenting a formal theory, the essence of this talk is to investigate some properties of “would be-if,” the counterfactual relation that’s appropriate to use for making choices.

[5:57]

In par­tic­u­lar, I want to ad­dress next the pos­si­bil­ity that, in a suffi­ciently de­ter­minis­tic uni­verse, you have a choice about some things that your ac­tion can­not cause. Here’s an ex­am­ple: as­sume or imag­ine that the uni­verse is de­ter­minis­tic, with only one pos­si­ble his­tory fol­low­ing from any given state of the uni­verse at a given mo­ment. And let me define a pred­i­cate P that gets ap­plied to the to­tal state of the uni­verse at some mo­ment. The pred­i­cate P is defined to be true of a uni­verse state just in case the laws of physics ap­plied to that to­tal state spec­ify that a billion years af­ter that state, my right hand is raised. Other­wise, the pred­i­cate P is false of that state.

[image of pred­i­cate P]

[6:44]

Now, sup­pose I de­cide, just on a whim, that I would like that state of the uni­verse a billion years ago to have been such that the pred­i­cate P was true of that past state. I need only raise my right hand now, and, lo and be­hold, it was so. If, in­stead, I want the pred­i­cate to have been false, then I lower my hand and the pred­i­cate was false. Of course, I haven’t changed what the past state of the uni­verse is or was; the past is what it is, and can never be changed. There is merely a par­tic­u­lar ab­stract re­la­tion, a “would be-if” re­la­tion, be­tween my ac­tion and the par­tic­u­lar past state that is the sub­ject of my whim­si­cal goal. I can­not rea­son­ably take the ac­tion and not ex­pect that the past state will be in cor­re­spon­dence.

[7:39]

So, I can’t change the past, nor does my action have any causal influence over the past, at least not in the way we normally and usefully conceive of causality, where causes are temporally prior to effects, and where we can think of causal relations as essentially specifying how the universe computes its subsequent states from its previous states. Nonetheless, I have exactly as much choice about the past value of the predicate I have defined, despite its inalterability, as I have about whether to raise my hand now, despite the inalterability of that too, in a deterministic universe. And if I were to believe otherwise, and were to refrain from raising my hand merely because I can’t change the past even though I do have a whimsical preference about the past value of the specified predicate, then, as always with fatalist resignation, I’d be needlessly forfeiting an opportunity to have my goals fulfilled.

[8:41]

If we accept the conclusion that we sometimes have a choice about what we cannot change or even cause, or at least tentatively accept it in order to explore its ramifications, then we can go on now to examine a well-known science fiction scenario called Newcomb’s Problem. In Newcomb’s Problem, a mischievous benefactor presents you with two boxes: there is a small, transparent box, containing a thousand dollars, which you can see; and there is a larger, opaque box, which you are truthfully told contains either a million dollars or nothing at all. You can’t see which; the box is opaque, and you are not allowed to examine it. But you are truthfully assured that the box has been sealed, and that its contents will not change from whatever they already are.

[9:27]

You are now offered a very odd choice: you can take either the opaque box alone, or take both boxes, and you get to keep the contents of whatever you take. That sure sounds like a no-brainer: if we assume that maximizing your expected payoff in this particular encounter is the sole relevant goal, then regardless of what’s in the opaque box, there’s no benefit to forgoing the additional thousand dollars.

• Ap­par­ently 3 com­ments will be needed.

[9:51]

But, before you choose, you are told how the benefactor decided how much money to put in the opaque box, and that brings us to the science fiction part of the scenario. What the benefactor did was take a very detailed local snapshot of the state of the universe a few minutes ago, and then run a faster-than-real-time simulation to predict with high accuracy whether you would take both boxes, or just the opaque box. A million dollars was put in the opaque box if and only if you were predicted to take only the opaque box.

[10:22]

Admittedly the super-predictability here is a bit physically implausible, and goes beyond a mere stipulation of determinism. Still, at least it’s not logically impossible, provided that the simulator can avoid having to simulate itself, and thus avoid a potential infinite regress. (The opaque box’s opacity is important in that regard: it serves to insulate you from being effectively informed of the outcome of the simulation itself, so the simulation doesn’t have to predict its own outcome in order to predict what you are going to do.) So, let’s indulge the super-predictability assumption, and see what comes of it. Eventually, I’m going to argue that the real world is at least deterministic enough and predictable enough that some of the science-fiction conclusions do carry over to reality.

[11:12]

So, you now face the fol­low­ing choice: if you take the opaque box alone, then you can ex­pect with high re­li­a­bil­ity that the simu­la­tion pre­dicted you would do so, and so you ex­pect to find a mil­lion dol­lars in the opaque box. If, on the other hand, you take both boxes, then you should ex­pect the simu­la­tion to have pre­dicted that, and you ex­pect to find noth­ing in the opaque box. If and only if you ex­pect to take the opaque box alone, you ex­pect to walk away with a mil­lion dol­lars. Of course, your choice does not cause the opaque box’s con­tent to be one way or the other; ac­cord­ing to the stipu­lated rules, the box con­tent already is what it is, and will not change from that re­gard­less of what choice you make.

[11:49]

But we can ap­ply the les­son from the handrais­ing ex­am­ple- the les­son that you some­times have a choice about things your ac­tion does not change or cause- be­cause you can rea­son about what would be the case if, per­haps con­trary to fact, you were to take a par­tic­u­lar hy­po­thet­i­cal ac­tion. And, in fact, we can re­gard New­comb’s Prob­lem as es­sen­tially har­ness­ing the same past pred­i­cate con­se­quence as in the handrais­ing ex­am­ple- namely, if and only if you take just the opaque box, then the past state of the uni­verse, at the time the pre­dic­tor took the de­tailed snap­shot was such that that state leads, by phys­i­cal laws, to your tak­ing just the opaque box. And, if and only if the past state was thus, the pre­dic­tor would pre­dict you tak­ing the opaque box alone, and so a mil­lion dol­lars would be in the opaque box, mak­ing that the more lu­cra­tive choice. And it’s cer­tainly the case that peo­ple who would make the opaque box choice have a much higher ex­pected gain from such en­coun­ters than those who take both boxes.

[12:47]

Still, it’s pos­si­ble to main­tain, as many peo­ple do, that tak­ing both boxes is the ra­tio­nal choice, and that the situ­a­tion is es­sen­tially rigged to pun­ish you for your pre­dicted ra­tio­nal­ity- much as if a writ­ten exam were per­versely graded to give points only for wrong an­swers. From that per­spec­tive, tak­ing both boxes is the ra­tio­nal choice, even if you are then left to lament your un­for­tu­nate ra­tio­nal­ity. But that per­spec­tive is, at the very least, highly sus­pect in a situ­a­tion where, un­like the hap­less exam-taker, you are in­formed of the rig­ging and can take it into ac­count when choos­ing your ac­tion, as you can in New­comb’s Prob­lem.

[13:31]

And, by the way, it’s pos­si­ble to con­sider an even stranger var­i­ant of New­comb’s Prob­lem, in which both boxes are trans­par­ent. In this ver­sion, the pre­dic­tor runs a simu­la­tion that ten­ta­tively pre­sumes that you’ll see a mil­lion dol­lars in the larger box. You’ll be pre­sented with a mil­lion dol­lars in the box for real if and only if the simu­la­tion shows that you would then take the mil­lion dol­lar box alone. If, in­stead, the simu­la­tion pre­dicts that you would take both boxes if you see a mil­lion dol­lars in the larger box, then the larger box is left empty when pre­sented for real.

[14:12]

So, let’s sup­pose you’re con­fronted with this sce­nario, and you do see a mil­lion dol­lars in the box when it’s pre­sented for real. Even though the mil­lion dol­lars is already there, and you see it, and it can’t change, nonethe­less I claim that you should still take the mil­lion dol­lar box alone. Be­cause, if you were to take both boxes in­stead, con­trary to what in fact must be the case in or­der for you to be in this situ­a­tion in the first place, then, also con­trary to what is in fact the case, the box would not con­tain a mil­lion dol­lars- even though in fact it does, and even though that can’t change! The same two-part rea­son­ing ap­plies as be­fore: if and only if you were to take just the larger box, then the state of the uni­verse at the time the pre­dic­tor takes a snap­shot must have been such that you would take just that box if you were to see a mil­lion dol­lars in that box. If and only if the past state had been thus, the Pre­dic­tor would have put a mil­lion dol­lars in the box.

[15:07]

Now, the pre­scrip­tion here to take just the larger box is more shock­ingly counter-in­tu­itive than I can hope to de­ci­sively ar­gue for in a brief talk, but, do at least note that a per­son who agrees that it is ra­tio­nal to take just the one box here does fare bet­ter than a per­son who be­lieves oth­er­wise, who would never be pre­sented with a mil­lion dol­lars in the first place. If we do, at least ten­ta­tively, ac­cept some of this anal­y­sis, for the sake of ar­gu­ment to see what fol­lows from it, then we can move on now to an­other toy sce­nario, which dis­penses with the de­ter­minism and su­per-pre­dic­tion as­sump­tions and ar­guably has more di­rect real world ap­pli­ca­bil­ity.

[15:42]

That sce­nario is the fa­mous pris­oner’s dilemma. The pris­oner’s dilemma is a two player game in which both play­ers make their moves si­mul­ta­neously and in­de­pen­dently, with no com­mu­ni­ca­tion un­til both moves have been made. A move con­sists of writ­ing down ei­ther the word “co­op­er­ate” or “defect.” The pay­off ma­trix is as shown:

[in­sert image of Pri­soner’s Dilemma pay­offs]

If both play­ers choose co­op­er­ate, they both re­ceive 99 dol­lars. If both defect, they both get 1 dol­lar. But if one player co­op­er­ates and the other defects, then the one who co­op­er­ates gets noth­ing, and the one who defects gets 100 dol­lars.

[16:25]

Crucially, we stipulate that each player cares only about maximizing her own expected payoff, and that the payoff in this particular instance of the game is the only goal, with no effect on anything else, including any subsequent rounds of the game, that could further complicate the decision. Let’s assume that both players are smart and knowledgeable enough to find the correct solution to this problem and to act accordingly. What I mean by the correct answer is the one that maximizes that player’s expected payoff. Let’s further assume that each player is aware of the other player’s competence, and of the other’s knowledge of their own competence, and so on. So then, what is the right answer that they’ll both find?

[17:07]

On the face of it, it would be nice if both players were to cooperate, and receive close to the maximum payoff. But if I’m one of the players, I might reason that my opponent’s move is causally independent of mine: regardless of what I do, my opponent’s move is either to cooperate or not. If my opponent cooperates, I receive a dollar more if I defect than if I cooperate: $100 vs. $99. Likewise if my opponent defects: I get a dollar more if I defect than if I cooperate, in this case one dollar vs. nothing. So, in either case, regardless of what move my opponent makes, my defecting causes me to get one dollar more than my cooperating causes me to get, which seemingly makes defecting the right choice. Defecting is indeed the choice that’s endorsed by standard game theory. And of course my opponent can reason similarly.

[18:06]

So, if we’re both con­vinced that we only have a choice about what we can cause, then we’re both ra­tio­nally com­pel­led to defect, leav­ing us both much poorer than if we both co­op­er­ated. So, here again, an ex­clu­sively causal view of what we have a choice about leads to us hav­ing to lament that our un­for­tu­nate ra­tio­nal­ity keeps a much bet­ter out­come out of our reach. But we can ar­rive at a bet­ter out­come if we keep in mind the les­son from New­comb’s prob­lem or even the handrais­ing ex­am­ple that it can make sense to act for the sake of what would be the case if you so acted, even if your ac­tion does not cause it to be the case. Even with­out the help of any su­per-pre­dic­tors in this sce­nario, I can rea­son that if I, act­ing by stipu­la­tion as a cor­rect solver of this prob­lem, were to choose to co­op­er­ate, then that’s what cor­rect solvers of this prob­lem do in such situ­a­tions, and in par­tic­u­lar that’s what my op­po­nent, as a cor­rect solver of this prob­lem, does too.

• [19:05]

Similarly, if I were to figure out that defect­ing is cor­rect, that’s what I can ex­pect my op­po­nent to do. This is similar to my abil­ity to pre­dict what your an­swer to adding a given pair of num­bers would be: I can merely add the num­bers my­self, and, given our mu­tual com­pe­tence at ad­di­tion, solve the prob­lem. The uni­verse is pre­dictable enough that we rou­tinely, and fairly ac­cu­rately, make such pre­dic­tions about one an­other. From this view­point, I can rea­son that, if I were to co­op­er­ate or not, then my op­po­nent would make the cor­re­spond­ing choice- if in­deed we are both cor­rectly solv­ing the same prob­lem, my op­po­nent max­i­miz­ing his ex­pected pay­off just as I max­i­mize mine. I there­fore act for the sake of what my op­po­nent’s ac­tion would then be, even though I can­not causally in­fluence my op­po­nent to take one ac­tion or the other, since there is no com­mu­ni­ca­tion be­tween us. Ac­cord­ingly, I co­op­er­ate, and so does my op­po­nent, us­ing similar rea­son­ing, and we both do fairly well.

[20:05]

One problem with the Prisoner’s Dilemma is that the idealized degree of symmetry that’s postulated between the two players may seldom occur in real life. But there are some important generalizations that may apply much more broadly. In particular, in many situations, the beneficiary of your cooperation may not be the same as the person whose cooperation benefits you. Instead, your decision whether to cooperate with one person may be symmetric to a different person’s decision to cooperate with you. Again, even in the absence of any causal influence upon your potential benefactors, even if they will never learn of your cooperation with others, and even, moreover, if you already know of their cooperation with you before you make your own choice. That is analogous to the transparent version of Newcomb’s Problem: there too, you act for the sake of something that you already know already obtains.

[21:04]

Anyway, as many authors have noted with regard to the Prisoner’s Dilemma, this is beginning to sound a little like the Golden Rule or the Categorical Imperative: act towards others as you would like others to act towards you, in similar situations. The analysis in terms of counterfactual reasoning provides a rationale, under some circumstances, for taking an action that causes net harm to your own interests and net benefit to others’ interests, although the choice is still ultimately grounded in your own goals, because of what would be the case, by virtue of others’ isomorphic behavior, if you yourself were to cooperate or not. Having a derivable rationale for ethical or moral behaviour would be desirable for all sorts of reasons, not least of which is to help us make the momentous decisions as to how or even whether to engineer the Singularity.

There’s about 2 more min­utes of his pre­sen­ta­tion be­fore he finished, but it looks like he just made some com­par­i­sons with TDT, so I’m too lazy to copy it over.

• Maybe you should post the tran­script as an ar­ti­cle. Other users have posted talk tran­scripts be­fore, and they were gen­er­ally well re­ceived.

• Great idea, thanks!

• p.s.: if you thought this was a use­less/​mis­lead­ing com­ment, you should have bloody told me so in­stead of cast­ing your silent and un­helpful −1.

Your com­ment is nei­ther use­less nor mis­lead­ing (tak­ing into ac­count the sig­nifi­cant use of qual­ifiers) but if I had hap­pened to view your com­ment nega­tively I would not ac­cept this obli­ga­tion to ‘bloody’ ex­plain my­self. The main prob­lem in this com­ment seems to be the swear­ing at down­vot­ers. A query or even (in this case) an out­right as­ser­tion that the judge­ment is flawed would come across bet­ter.

• See also

(My un­der­stand­ing is that TDT and UDT can both be seen as “im­ple­men­ta­tions” of su­per­ra­tional­ity.)

• In­ter­est­ing. There’s a para­dox in­volv­ing a game in which play­ers suc­ces­sively take a sin­gle coin from a large pile of coins. At any time a player may choose in­stead to take two coins, at which point the game ends and all fur­ther coins are lost. You can prove by in­duc­tion that if both play­ers are perfectly self­ish, they will take two coins on their first move, no mat­ter how large the pile is.

I’m pretty sure this proof only works if the coins are de­nom­i­nated in utilons.
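For what it’s worth, the backward induction for the coin game described above is easy to check mechanically. A minimal sketch, assuming perfectly selfish players whose payoff is simply the number of coins they take:

```python
def value(n):
    """Payoffs (to the player about to move, to the other player) from an
    n-coin pile onward, assuming both players are perfectly selfish."""
    if n == 0:
        return (0, 0)
    if n == 1:
        return (1, 0)          # only one coin left: take it
    # Option 1: take two coins and end the game, losing the rest of the pile.
    take_two = (2, 0)
    # Option 2: take one coin; the opponent then moves on a pile of n - 1.
    other_gain, my_gain = value(n - 1)
    take_one = (1 + my_gain, other_gain)
    # The mover picks whichever option gives the mover more.
    return max(take_two, take_one, key=lambda payoffs: payoffs[0])

print(value(100))  # (2, 0): the first player takes two coins immediately
```

By induction, `value(n) == (2, 0)` for every pile of two or more coins, matching the claim that the game ends on the very first move no matter how large the pile is.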

• I see this dis­cus­sion over the last sev­eral months bounc­ing around, teas­ingly close to a co­her­ent re­s­olu­tion of the os­ten­si­ble sub­jec­tive/​ob­jec­tive di­chotomy ap­plied to eth­i­cal de­ci­sion-mak­ing. As a per­haps per­ti­nent meta-ob­ser­va­tion, my ini­tial sen­tence may pro­mul­gate the con­fu­sion with its ex­pe­di­tious word­ing of “ap­plied to eth­i­cal de­ci­sion-mak­ing” rather than a more ac­cu­rate phras­ing such as “ap­plied to de­ci­sion-mak­ing as­sessed as in­creas­ingly eth­i­cal over in­creas­ing con­text.”

Those who in the cur­rent thread re­fer to the es­sen­tial el­e­ment of em­pa­thy or similar­ity (of self mod­els) come close. It’s im­por­tant to re­al­ize that any agent always only ex­presses its na­ture within its en­vi­ron­ment—as­sess­ments of “right­ness” arise only in the larger con­text (of ad­di­tional agents, ad­di­tional ex­pe­riences of the one agent, etc.)

Our lan­guage and our cul­ture re­in­force an as­sump­tion of an on­tolog­i­cal “right­ness” that per­vades our think­ing on these mat­ters. An even greater (per­ceived) difficulty is that to re­lin­quish on­tolog­i­cal “right­ness” en­tails ul­ti­mately re­lin­quish­ing an on­tolog­i­cal “self”. But to re­lin­quish such ul­ti­mately un­founded be­liefs is to gain clar­ity and co­her­ence while giv­ing up noth­ing ac­tual at all.

“Superrationality” is an effective wrapper around these apparent dilemmas, but even proponents such as Hofstadter confused description with prescription in this regard. Paradox is always only a matter of insufficient context. In the bigger picture all the pieces must fit. [Or as Eliezer has taken to saying recently: “It all adds up to normalcy.”]

Apolo­gies if my brief pok­ings and prod­dings on this topic ap­pear vague or even mys­ti­cal. I can only as­sert within this limited space and band­width that my back­ground in sci­ence, en­g­ineer­ing and busi­ness is far from that of one who could har­bor vague­ness, rel­a­tivism, mys­ti­cism, or post­mod­ernist pat­terns of thought. I ap­pre­ci­ate the depth and breadth of Eliezer’s writ­ten ex­plo­ra­tions of this is­sue whereas I lack the time to do so my­self.

• De­spite the dis­guise, I think this is the same as the stan­dard PD. In there (as­sum­ing full util­ities, etc...), the ob­vi­ous ideal for an im­par­tial ob­server is to pick (C,C) as the best op­tion, and for the pris­oner to pick (D,C).

Here, (D,C) is “righter” than (C,C), but that’s sim­ply be­cause we are no longer im­par­tial obervers; hu­mans shouldn’t re­main im­par­tial when billions of lives are at stake. We are all in the role of “pris­on­ers” in this situ­a­tion, even as ob­servers.

An “im­par­tial ob­server” would sim­ply be one that val­ued one billion hu­man lives the same as one pa­per clip. They would see us as a sim­ple pris­oner, in the same situ­a­tion as the stan­dard PD, with the same over­all solu­tion - (C,C).

• This is an old post and prob­a­bly very out of date, but: I think if you try to define an im­par­tial ob­server’s prefer­ences as what­ever se­lects (C,C) in two other agents’ PD, you get in­con­sis­ten­cies very rapidly once you have one of those agents stuck in two Pri­soner’s Dilem­mas at once.

I also don’t think we should use eu­phemisms like ‘im­par­tial’ for an in­cred­ibly par­tial Co­op­er­a­tion Fetishist that’s will­ing to give up ev­ery­thing else of value (e.g., billions of hu­man lives) to go through the mo­tions of satis­fy­ing non-sen­tient pro­cesses like sea slugs or pa­per­clip max­i­miz­ers.

• you get in­con­sis­ten­cies very rapidly once you have one of those agents stuck in two Pri­soner’s Dilem­mas at once.

Multi-player in­ter­ac­tions are tricky and we don’t have a good solu­tion for them yet.

that’s will­ing to give up ev­ery­thing else of value (e.g., billions of hu­man lives)

It’s not that it’s willing to give up everything of value; it’s that it doesn’t have our values. Without sharing our values, there’s no reason for it to prefer our opinions over sea slugs’.

• Co­op­er­ate. I am not play­ing against just this one guy, but any fu­ture PD op­po­nents. Hope the max­i­mizer lives in a uni­verse where it has to worry about this same calcu­lus. It will defect if it is already the biggest bad in its uni­verse.

• I would make a random decision (using a random number generator), since the only alternative I see is an infinite recursion of thoughts about the Maximizer. There is not enough information in this case to even assign non-trivial probabilities to the Maximizer’s possible C and D decisions, because I don’t know how the Maximizer thinks. But even assuming that the Maximizer uses the same reasoning as I do, only with a different value system, I still see no escape from the recursion. So a := random(0..1); if a > 0.5 then C else D (is it rational to expect such probabilities?) and press enter… If the decision process doesn’t converge, we should take either possibility, I think.
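The coin-flip policy above is easy to render runnable. A sketch using the payoff matrix from the top of the post; note that the 50/50 split is the commenter’s stipulation, not a derived equilibrium:

```python
import itertools
import random

# Payoff matrix from the post: (my payoff, opponent's payoff).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (2, 2)}

def coin_flip_policy():
    # a := random(0..1); if a > 0.5 then C else D
    return "C" if random.random() > 0.5 else "D"

# Expected payoff of a 50/50 mixer against a 50/50 opponent, by enumeration
# over the four equally likely move pairs:
expected = sum(PAYOFF[moves][0]
               for moves in itertools.product("CD", repeat=2)) / 4
print(expected)  # 2.5: between mutual defection (2) and mutual cooperation (3)
```

So against another coin-flipper this policy averages 2.5, which is why neither commenter treats it as a satisfying resolution, merely a way to halt the regress.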

• This is not a prisoner’s dilemma. The Nash equilibrium (D,D) is not dominated by a Pareto-optimal point in this game.

Although it is an interesting game, it requires interpersonal comparisons of utility to make your point that the Nash equilibrium is not as good as another outcome.

Gen­er­ally, we as­sume that com­par­i­sons already show up in the util­ity func­tions of each player.

BTW, I think your non-stan­dard no­ta­tion is con­fus­ing.