The True Prisoner’s Dilemma

It occurred to me one day that the standard visualization of the Prisoner’s Dilemma is fake.

The core of the Prisoner’s Dilemma is this symmetric payoff matrix:

          1: C      1: D
2: C     (3, 3)    (5, 0)
2: D     (0, 5)    (2, 2)

Player 1 and Player 2 can each choose C or D. Player 1’s and Player 2’s utilities for the final outcome are given by the first and second numbers in the pair. For reasons that will become apparent, “C” stands for “cooperate” and “D” stands for “defect”.

Observe that a player in this game (regarding themselves as the first player) has this preference ordering over outcomes: (D, C) > (C, C) > (D, D) > (C, D).

D, it would seem, dominates C: If the other player chooses C, you prefer (D, C) to (C, C); and if the other player chooses D, you prefer (D, D) to (C, D). So you wisely choose D, and as the payoff table is symmetric, the other player likewise chooses D.

If only you’d both been less wise! You both prefer (C, C) to (D, D). That is, you both prefer mutual cooperation to mutual defection.
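This dominance argument can be checked mechanically. Below is a minimal Python sketch of that check (the payoff dictionary and the my_payoff helper are just illustrative names): it reads the payoffs off the matrix above, confirms that D strictly dominates C for a player, and confirms that both players nonetheless do better under (C, C) than under (D, D).

```python
# Standard Prisoner's Dilemma payoffs: payoff[(my_move, their_move)] gives
# (my_payoff, their_payoff), read off the matrix above.
payoff = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}

def my_payoff(me, them):
    return payoff[(me, them)][0]

# D dominates C: whatever the other player does, defecting pays more.
for them in ("C", "D"):
    assert my_payoff("D", them) > my_payoff("C", them)

# Yet both players do better under mutual cooperation than mutual defection.
assert payoff[("C", "C")][0] > payoff[("D", "D")][0]
assert payoff[("C", "C")][1] > payoff[("D", "D")][1]

print("D dominates C, but (C, C) Pareto-dominates (D, D).")
```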

The Prisoner’s Dilemma is one of the great foundational issues in decision theory, and enormous volumes of material have been written about it. Which makes it an audacious assertion of mine, that the usual way of visualizing the Prisoner’s Dilemma has a severe flaw, at least if you happen to be human.

The classic visualization of the Prisoner’s Dilemma is as follows: you are a criminal, and you and your confederate in crime have both been captured by the authorities.

Independently, without communicating, and without being able to change your mind afterward, you have to decide whether to give testimony against your confederate (D) or remain silent (C).

Both of you, right now, are facing one-year prison sentences; testifying (D) takes one year off your prison sentence, and adds two years to your confederate’s sentence.
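As a side note on the arithmetic, here is a small illustrative sketch (the sentences helper is hypothetical bookkeeping, assuming exactly the one-year baseline and adjustments just described) showing that the resulting prison terms reproduce the preference ordering of the abstract matrix, since fewer years is better.

```python
# Bookkeeping for the prison-sentence story: both start at one year; testifying
# (D) takes a year off your own sentence and adds two years to your confederate's.
def sentences(you, confederate):
    """Return (your_years, confederate_years) for moves "C" or "D"."""
    yours, theirs = 1, 1
    if you == "D":
        yours -= 1
        theirs += 2
    if confederate == "D":
        theirs -= 1
        yours += 2
    return yours, theirs

# Fewer years is better, so this matches (D, C) > (C, C) > (D, D) > (C, D):
for moves in [("D", "C"), ("C", "C"), ("D", "D"), ("C", "D")]:
    print(moves, "->", sentences(*moves), "years")
# (D, C) -> (0, 3)   (C, C) -> (1, 1)   (D, D) -> (2, 2)   (C, D) -> (3, 0)
```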

Or maybe you and some stranger are deciding, just once, whether to play C or D for a payoff in dollars matching the standard chart, without knowing the other player’s history and without finding out who the player was afterward.

And, oh yes—in the classic visualization you’re supposed to pretend that you’re entirely selfish, that you don’t care about your confederate criminal, or the player in the other room.

It’s this last specification that makes the classic visualization, in my view, fake.

You can’t avoid hindsight bias by instructing a jury to pretend not to know the real outcome of a set of events. And without a complicated effort backed up by considerable knowledge, a neurologically intact human being cannot pretend to be genuinely, truly selfish.

We’re born with a sense of fairness, honor, empathy, sympathy, and even altruism—the result of our ancestors adapting to play the iterated Prisoner’s Dilemma. We don’t really, truly, absolutely and entirely prefer (D, C) to (C, C), though we may entirely prefer (C, C) to (D, D) and (D, D) to (C, D). The thought of our confederate spending three years in prison does not entirely fail to move us.

In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic to the stranger who might cooperate. We aren’t entirely happy to think that we might defect and the stranger cooperate, getting five dollars while the stranger gets nothing.

We fixate instinctively on the (C, C) outcome and search for ways to argue that it should be the mutual decision: “How can we ensure mutual cooperation?” is the instinctive thought. Not “How can I trick the other player into playing C while I play D for the maximum payoff?”

For someone with an impulse toward altruism, or honor, or fairness, the Prisoner’s Dilemma doesn’t really have the critical payoff matrix—whatever the financial payoff to individuals. (C, C) > (D, C), and the key question is whether the other player sees it the same way.

And no, you can’t instruct people being initially introduced to game theory to pretend they’re completely selfish—any more than you can instruct human beings being introduced to anthropomorphism to pretend they’re expected paperclip maximizers.

To construct the True Prisoner’s Dilemma, the situation has to be something like this:

Player 1: Human beings, Friendly AI, or other humane intelligence.

Player 2: UnFriendly AI, or an alien that only cares about sorting pebbles.

Let’s suppose that four billion human beings—not the whole human species, but a significant part of it—are currently progressing through a fatal disease that can only be cured by substance S.

However, substance S can only be produced by working with a paperclip maximizer from another dimension—substance S can also be used to produce paperclips. The paperclip maximizer only cares about the number of paperclips in its own universe, not in ours, so we can’t offer to produce or threaten to destroy paperclips here. We have never interacted with the paperclip maximizer before, and will never interact with it again.

Both humanity and the paperclip maximizer will get a single chance to seize some additional part of substance S for themselves, just before the dimensional nexus collapses; but the seizure process destroys some of substance S.

The payoff matrix is as follows:

          1: C                                                 1: D
2: C     (2 billion human lives saved, 2 paperclips gained)    (+3 billion lives, +0 paperclips)
2: D     (+0 lives, +3 paperclips)                             (+1 billion lives, +1 paperclip)

I’ve chosen this payoff matrix to produce a sense of indignation at the thought that the paperclip maximizer wants to trade off billions of human lives against a couple of paperclips. Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn’t do what it should, it just maximizes paperclips.

In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly rather live in a universe where 3 billion humans were cured of their disease and no paperclips were produced, than sacrifice a billion human lives to produce 2 paperclips. It doesn’t seem right to cooperate, in a case like this. It doesn’t even seem fair—so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure—it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.

What do you do then? Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don’t care a tiny bit by comparison about what happens to the other player? When it seems right to defect even if the other player cooperates?

That’s what the payoff matrix for the true Prisoner’s Dilemma looks like—a situation where (D, C) seems righter than (C, C).

But all the rest of the logic—everything about what happens if both agents think that way, and both agents defect—is the same. For the paperclip maximizer cares as little about human deaths, or human pain, or a human sense of betrayal, as we care about paperclips. Yet we both prefer (C, C) to (D, D).
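That structural claim can be checked the same way as before. The sketch below (again with illustrative names, reading the entries as billions of lives saved and paperclips gained) confirms that in the True Prisoner’s Dilemma, D still dominates C for each side, and both sides still prefer (C, C) to (D, D).

```python
# True Prisoner's Dilemma payoffs, read off the matrix above as
# (billions of human lives saved, paperclips gained), humanity's move first.
true_payoff = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

# Humanity (first entry): defecting saves more lives whatever the maximizer does.
for clip_move in ("C", "D"):
    assert true_payoff[("D", clip_move)][0] > true_payoff[("C", clip_move)][0]

# The paperclip maximizer (second entry): defecting gains more paperclips either way.
for human_move in ("C", "D"):
    assert true_payoff[(human_move, "D")][1] > true_payoff[(human_move, "C")][1]

# Yet both sides prefer mutual cooperation to mutual defection.
assert true_payoff[("C", "C")][0] > true_payoff[("D", "D")][0]
assert true_payoff[("C", "C")][1] > true_payoff[("D", "D")][1]

print("Same structure as the standard matrix: D dominates, (C, C) beats (D, D).")
```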

So if you’ve ever prided yourself on cooperating in the Prisoner’s Dilemma… or questioned the verdict of classical game theory that the “rational” choice is to defect… then what do you say to the True Prisoner’s Dilemma above?