The True Prisoner’s Dilemma

It occurred to me one day that the standard visualization of the Prisoner’s Dilemma is fake.

The core of the Prisoner’s Dilemma is this symmetric payoff matrix:

        1: C      1: D
2: C    (3, 3)    (5, 0)
2: D    (0, 5)    (2, 2)

Player 1 and Player 2 can each choose C or D. Player 1’s and Player 2’s utilities for the final outcome are given by the first and second number in the pair, respectively. For reasons that will become apparent, “C” stands for “cooperate” and “D” stands for “defect”.

Observe that a player in this game (regarding themselves as the first player) has this preference ordering over outcomes: (D, C) > (C, C) > (D, D) > (C, D).

D, it would seem, dominates C: If the other player chooses C, you prefer (D, C) to (C, C); and if the other player chooses D, you prefer (D, D) to (C, D). So you wisely choose D, and as the payoff table is symmetric, the other player likewise chooses D.

If only you’d both been less wise! You both prefer (C, C) to (D, D). That is, you both prefer mutual cooperation to mutual defection.
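If you want the dominance argument checked mechanically, here is a minimal sketch in Python (my own illustration, not part of the original argument), encoding the payoff matrix above as a dictionary:

```python
# Payoff matrix from above: payoffs[(move1, move2)] = (Player 1's utility,
# Player 2's utility) when Player 1 plays move1 and Player 2 plays move2.
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (2, 2),
}

# D strictly dominates C for Player 1: whatever Player 2 does,
# Player 1 gets strictly more by playing D than by playing C.
assert all(payoffs[("D", other)][0] > payoffs[("C", other)][0] for other in "CD")

# By symmetry, D strictly dominates C for Player 2 as well.
assert all(payoffs[(other, "D")][1] > payoffs[(other, "C")][1] for other in "CD")

# And yet both players strictly prefer mutual cooperation to mutual defection.
assert all(payoffs[("C", "C")][i] > payoffs[("D", "D")][i] for i in (0, 1))

print("D dominates C for both players, yet (C, C) beats (D, D) for both.")
```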

The Prisoner’s Dilemma is one of the great foundational issues in decision theory, and enormous volumes of material have been written about it. Which makes it an audacious assertion of mine, that the usual way of visualizing the Prisoner’s Dilemma has a severe flaw, at least if you happen to be human.

The classic visualization of the Prisoner’s Dilemma is as follows: you are a criminal, and you and your confederate in crime have both been captured by the authorities.

Independently, without communicating, and without being able to change your mind afterward, you have to decide whether to give testimony against your confederate (D) or remain silent (C).

Both of you, right now, are facing one-year prison sentences; testifying (D) takes one year off your prison sentence, and adds two years to your confederate’s sentence.

Or maybe you and some stranger are playing the game only once, without knowing the other player’s history and without finding out who the player was afterward, deciding whether to play C or D for a payoff in dollars matching the standard chart.

And, oh yes—in the classic visualization you’re supposed to pretend that you’re entirely selfish, that you don’t care about your confederate criminal, or the player in the other room.

It’s this last specification that makes the classic visualization, in my view, fake.

You can’t avoid hindsight bias by instructing a jury to pretend not to know the real outcome of a set of events. And without a complicated effort backed up by considerable knowledge, a neurologically intact human being cannot pretend to be genuinely, truly selfish.

We’re born with a sense of fairness, honor, empathy, sympathy, and even altruism—the result of our ancestors adapting to play the iterated Prisoner’s Dilemma. We don’t really, truly, absolutely and entirely prefer (D, C) to (C, C), though we may entirely prefer (C, C) to (D, D) and (D, D) to (C, D). The thought of our confederate spending three years in prison does not entirely fail to move us.

In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic toward the stranger who might cooperate. We aren’t entirely happy to think that we might defect and the stranger cooperate, getting five dollars while the stranger gets nothing.

We fixate instinctively on the (C, C) outcome and search for ways to argue that it should be the mutual decision: “How can we ensure mutual cooperation?” is the instinctive thought. Not “How can I trick the other player into playing C while I play D for the maximum payoff?”

For someone with an impulse toward altruism, or honor, or fairness, the Prisoner’s Dilemma doesn’t really have the critical payoff matrix—whatever the financial payoff to individuals. (C, C) > (D, C), and the key question is whether the other player sees it the same way.

And no, you can’t instruct people being initially introduced to game theory to pretend they’re completely selfish—any more than you can instruct human beings being introduced to anthropomorphism to pretend they’re expected paperclip maximizers.

To construct the True Prisoner’s Dilemma, the situation has to be something like this:

Player 1: Human beings, Friendly AI, or other humane intelligence.

Player 2: UnFriendly AI, or an alien that only cares about sorting pebbles.

Let’s suppose that four billion human beings—not the whole human species, but a significant part of it—are currently progressing through a fatal disease that can only be cured by substance S.

However, substance S can only be produced by working with a paperclip maximizer from another dimension—substance S can also be used to produce paperclips. The paperclip maximizer only cares about the number of paperclips in its own universe, not in ours, so we can’t offer to produce or threaten to destroy paperclips here. We have never interacted with the paperclip maximizer before, and will never interact with it again.

Both humanity and the paperclip maximizer will get a single chance to seize some additional part of substance S for themselves, just before the dimensional nexus collapses; but the seizure process destroys some of substance S.

The payoff matrix is as follows:

        1: C                                                   1: D
2: C    (2 billion human lives saved, 2 paperclips gained)     (+3 billion lives, +0 paperclips)
2: D    (+0 lives, +3 paperclips)                              (+1 billion lives, +1 paperclip)

I’ve chosen this payoff matrix to produce a sense of indignation at the thought that the paperclip maximizer wants to trade off billions of human lives against a couple of paperclips. Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn’t do what it should, it just maximizes paperclips.

In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly rather live in a universe where 3 billion humans were cured of their disease and no paperclips were produced than sacrifice a billion human lives to produce 2 paperclips. It doesn’t seem right to cooperate, in a case like this. It doesn’t even seem fair—so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure—it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.

What do you do then? Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don’t care a tiny bit by comparison about what happens to the other player? When it seems right to defect even if the other player cooperates?

That’s what the payoff matrix for the true Prisoner’s Dilemma looks like—a situation where (D, C) seems righter than (C, C).

But all the rest of the logic—everything about what happens if both agents think that way, and both agents defect—is the same. For the paperclip maximizer cares as little about human deaths, or human pain, or a human sense of betrayal, as we care about paperclips. Yet we both prefer (C, C) to (D, D).
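To make “the same logic” concrete, here is a rough Python sketch (again my own illustration, under the assumption that humanity’s payoff is measured in billions of lives saved and the paperclip maximizer’s in paperclips gained) checking that each player’s ordinal ranking of the four outcomes in the True Prisoner’s Dilemma matches its ranking in the classic matrix:

```python
# Payoffs in each agent's own units: billions of lives for humanity (Player 1),
# paperclips for the paperclip maximizer (Player 2).
lives = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
clips = {("C", "C"): 2, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 1}

# The classic matrix from the top of the post.
classic = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (2, 2)}

def ranking(utility):
    """Outcomes sorted from worst to best under the given utility function."""
    return sorted(utility, key=utility.get)

# Humanity ranks the four outcomes exactly as Player 1 does in the classic game...
assert ranking(lives) == ranking({k: v[0] for k, v in classic.items()})
# ...and the paperclip maximizer ranks them exactly as Player 2 does.
assert ranking(clips) == ranking({k: v[1] for k, v in classic.items()})

# Same ordinal structure, so the same dominance argument (and the same tragedy
# of mutual defection) goes through unchanged.
print("The True Prisoner's Dilemma is ordinally identical to the classic one.")
```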

So if you’ve ever prided yourself on cooperating in the Prisoner’s Dilemma… or questioned the verdict of classical game theory that the “rational” choice is to defect… then what do you say to the True Prisoner’s Dilemma above?