Prisoners’ Dilemma with Costs to Modeling

We con­sider a mod­ifi­ca­tion to the open source pris­on­ers’ dilemma in which agents must pay some re­sources to model each other. We will use the modal com­bat frame­work, but where agents pay a cost pro­por­tional to the depth of boxes in their code. Even a small mod­el­ing penalty makes the FairBot-FairBot out­come no longer an equil­ibrium, since the best re­sponse to FairBot is to be Co­op­er­ateBot and not pay the mod­el­ing penalty. The best re­sponse to Co­op­er­ateBot is to be Defec­tBot, and the pure Defec­tBot-Defec­tBot out­come is a sta­ble Nash equil­ibrium. In fact, I be­lieve that Defec­tBot-Defec­tBot is the unique pure strat­egy Nash equil­ibrium.

Amaz­ingly, this turns out to be okay! For small mod­el­ing penalties, there is a mixed strat­egy equil­ibrium which mixes be­tween Co­op­er­ateBot, FairBot, and Pru­den­tBot! Both play­ers get ex­actly the same util­ity in ex­pec­ta­tion as the FairBot-FairBot out­come.

Fur­ther, if you con­sider an evolu­tion­ary sys­tem where pop­u­la­tions re­pro­duce in pro­por­tion to how well they do in pris­on­ers’ dilem­mas with each other, it ap­pears that as the mod­el­ing penalty gets small, the basin of the defect equil­ibrium also gets small, and nearly all ini­tial con­di­tions cy­cle around Co­op­er­ateBot, FairBot, and Pru­den­tBot!

This post came out of con­ver­sa­tions with Sam Eisen­stat, Abram Dem­ski, Tsvi Ben­son-Tilsen, and An­drew Critch. It is a first draft that could use a coau­thor to care­fully check ev­ery­thing, ex­pand on it, and turn it into a pa­per. If you think you could do that with min­i­mal guidance from me, let me know.


We will be us­ing the modal com­bat frame­work, and iden­ti­fy­ing with co­op­er­a­tion and with defec­tion. Agents are defined to for­mu­las that com­bine the other agent run on var­i­ous agents us­ing propo­si­tional calcu­lus and a modal op­er­a­tor . The rep­re­sents prov­abil­ity, and ev­ery in­stance of run on an agent in the for­mula must be con­tained within a . Re­call some com­mon modal agents:

Co­op­er­ateBot is defined by .

Defec­tBot is defined by .

FairBot is defined by .

Pru­den­tBot is defined by .

Th­ese 4 agents in­ter­act with each other as fol­lows: Co­op­er­ateBot co­op­er­ates with ev­ery­one. Defec­tBot defects against ev­ery­one. FairBot defects against only Defec­tBot. Pru­den­tBot defects against Co­op­er­ateBot and Defec­tBot and co­op­er­ates with it­self and FairBot.

We will say that the depth of an agent is the max­i­mum of the depth of s in its code and the depth of the agents that it calls the op­po­nent on. Co­op­er­ateBot and Defec­tBot have depth 0, FairBot has depth 1, and Pru­den­tBot has depth 2.

We will use a pris­oner’s dilemma where mu­tual co­op­er­a­tion pro­duces util­ity 2, mu­tual defic­tion pro­duces util­ity 1, and ex­ploita­tion pro­duces util­ity 3 for the ex­ploiter and 0 for the ex­ploited. Each player will also pay a penalty of times its depth.

Pure Equilibria

The best re­sponse to both Co­op­er­ateBot and Defec­tBot is Defec­tBot, since when the op­po­nent does not de­pend on you, you want to defect with the least pos­si­ble penalty.

The best re­sponse to FairBot is Co­op­er­ateBot, since you can’t ex­ploit FairBot, so you want to get mu­tual co­op­er­a­tion with the least pos­si­ble penalty.

The best re­sponse to Pru­den­tBot is FairBot, since you can’t ex­ploit Pru­den­tBot, you can’t mu­tu­ally co­op­er­ate with penalty 0, but you can mu­tu­ally co­op­er­ate with penalty 1 by be­ing FairBot. (This is as­sum­ing is at less than . Other­wise, you just want to defect to avoid the penalty.)

Thus, if the only op­tions are Co­op­er­ateBot, Defec­tBot, FairBot, and Pru­den­tBot, the unique pure strat­egy equil­ibrium is mu­tual Defec­tBot.

I be­lieve that Defec­tBot is the only pure strat­egy equil­ibrium in gen­eral. This would fol­low di­rectly from the fact that if a depth agent co­op­er­ates with an­other depth agent , then there ex­ists a depth agent which also co­op­er­ates with. I be­lieve this is true, but haven’t proven it yet. If it turns out to be false, there might be a sim­ple mod­ifi­ca­tion of the penalties that makes it true.

Mixed Equilibria

First, let’s look at Mixed Equil­ibria in which the only op­tions are the four modal agents above.

Let , , , and be the prob­a­bil­ities with which player is Defec­tBot, Co­op­er­ateBot, FairBot, and Pru­den­tBot re­spec­tively.

The util­ity of play­ing Defec­tBot is , where is the other player.

The util­ity of play­ing Co­op­er­ateBot is , where is the other player.

The util­ity of play­ing FairBot is , where is the other player.

The util­ity of play­ing Pru­den­tBot is , where is the other player.

Note that if one player is in­differ­ent be­tween Co­op­er­ateBot and Defec­tBot, this means the other player plays FairBot ex­actly of the time, but then the first player would be bet­ter off play­ing Pru­den­tBot (as long as ). Thus no player can ever ran­dom­ize be­tween Co­op­er­ateBot and Defec­tBot.

If a player doesn’t play Co­op­er­ateBot at all, the other player can’t play Pru­dent Bot, since FairBot is strictly bet­ter. Thus, if both play­ers don’t play Co­op­er­ateBot, they both play only Defec­tBot and FairBot. If this isn’t the pure defect solu­tion, it must be be­cause and for both play­ers, which is in­deed a (not very good) Nash equil­ibrium.

If ex­actly one player plays Co­op­er­ateBot at all, the first player mixes be­tween Co­op­er­ateBot and FairBot, while the other player mixes be­tween (at most) Defec­tBot, FairBot and Pru­den­tBot. The sec­ond player can’t play FairBot, since Co­op­er­ateBot would have done strictly bet­ter. Thus the first player gets 0 util­ity for play­ing Co­op­er­ateBot, con­tra­dict­ing the fact that it is im­pos­si­ble to get 0 ex­pected util­ity play­ing FairBot.

Fi­nally, we have the case where both play­ers play Co­op­er­ateBot with some prob­a­bil­ity, and thus nei­ther plays Defec­tBot at all. If ei­ther player did not play FairBot at all, then Defec­tBot would strictly dom­i­nate Co­op­er­ateBot for the other player, con­tra­dict­ing the fact that both play­ers play Co­op­er­ateBot. If ei­ther player did not play Pru­den­tBot at all, Co­op­er­ateBot would strictly dom­i­nate FairBot, con­tra­dict­ing the fact that both play­ers play FairBot.

Thus, in the only re­main­ing equil­ibria, both play­ers mix be­tween all of Co­op­er­ateBot, FairBot, and Pru­den­tBot. Since both play­ers are in­differ­ent be­tween Co­op­er­ateBot and FairBot, both play­ers must play Pru­den­tBot with prob­a­bil­ity ex­actly . Since both play­ers are in­differ­ent be­tween FairBot and Pru­den­tBot, both play­ers must play Co­op­er­ateBot with prob­a­bil­ity ex­actly . This leaves prob­a­bil­ity for FairBot, and the re­sult is in­deed a Nash equil­ibrium.

We have a to­tal of three Nash equil­ibria. All three are sym­met­ric. The first two are bad and both play­ers get the ex­pected util­ity . The last one is good and both play­ers get ex­pected util­ity .

Next, we have to show that this out­come is also an equil­ibrium in the game where both play­ers can play any modal agent. If an­other agent were to get util­ity more than against this Pru­den­tBot, Co­op­er­ateBot, FairBot com­bi­na­tion, it would clearly have to be depth , since the only depth agents are Co­op­er­ateBot and Defec­tBot, and depth Pru­den­tBot already has the perfect be­hav­ior against these 3 agents. It would also have to mu­tu­ally co­op­er­ate with FairBot, and defect against Co­op­er­ateBot.

Sup­pose a depth agent prov­ably co­op­er­ates with FairBot and defects against Co­op­er­ateBot. Note that since is depth , knows what is, and so knows that . How­ever also know that . Thus knows , so knows , con­tra­dic­tion. There­fore, no agent of depth can have the de­sired be­hav­ior, and our good Nash equil­ibrium is in fact a Nash equil­ibrium of the game where both play­ers can play any modal agent.

Evolu­tion­ary Simulations

Next, we will con­sider what hap­pens when a large pop­u­la­tion of modal agents evolves by play­ing pris­on­ers’ dilem­mas. We will con­sider a sys­tem con­sist­ing only of Defec­tBot, Co­op­er­ateBot, FairBot, and Pru­den­tBot. We will con­sider a sim­ple model where at each time step, each pop­u­la­tion grows a small amount pro­por­tional to the size of that pop­u­la­tion times the ex­pected util­ity a mem­ber in that pop­u­la­tion gets by play­ing a pris­on­ers’ dilemma with a ran­dom other agent. No­tice that in this model, a pop­u­la­tion that starts out nonzero will re­main nonzero, and a pop­u­la­tion that starts out 0 will re­main 0.

Since the above three Nash equil­ibria were sym­met­ric Nash equil­ibria, they are also equil­ibria of this evolu­tion­ary sys­tem. This is be­cause the ex­pected util­ity of be­ing each type of agent that ap­pears any nonzero amount is the same. Since this sys­tem keeps pop­u­la­tions that start at 0 at 0, there are other equil­ibria too; for ex­am­ple, any point where all the agents are the same is in equil­ibrium since it can­not change.

We can then ask about the sta­bil­ity of these equil­ibria. The pure defect one is sta­ble, since if there are only a small num­ber of agents that are not Defec­tBots, they won’t play each other enough to can­cel out their mod­el­ing penalty. The Defec­tBot, FairBot one is un­sta­ble, since the more FairBots there are, the bet­ter the FairBots do. Un­for­tu­nately, the good equil­ibrium is also un­sta­ble. This is harder to see, but a small per­tur­ba­tion will cause a very slow spiral­ing out­ward. (I checked this with simu­la­tion, but did not prove it math­e­mat­i­cally.)

How­ever, I ran a simu­la­tion that seemed to show that for a small , and for al­most all ini­tial con­di­tions that mix be­tween all four types of agents, the sys­tem does not con­verge, but in­stead goes in a large cy­cle, in which the sys­tem spends most of its time with al­most all FairBots. Even­tu­ally a very small num­ber of Co­op­er­ateBots climbs out to be­come non-neg­ligible, then as soon as the Co­op­er­ateBots are a sig­nifi­cant pro­por­tion, the Pru­den­tBots come in to ex­ploit them, then the Pru­den­tBots quickly turn into FairBots, and the cy­cle re­peats. Mean­while, the Defec­tBots are pushed down into a very small pro­por­tion of the pop­u­la­tion.

I also looked at a sec­ond model as fol­lows: The pop­u­la­tion starts with a small finite num­ber of agents of each type. At each time step, a new agent en­ters, and per­ma­nently be­comes the modal agent that max­i­mizes ex­pected util­ity against the ex­ist­ing pop­u­la­tion. In this simu­la­tion, un­less you start with very few FairBots or Pru­den­tBots, you will sim­ply switch be­tween adding FairBots, Pru­den­tBots, and Co­op­er­ateBots, never adding a new Defec­tBot.

A more care­ful study could ac­tu­ally find what the dy­nam­ics are math­e­mat­i­cally rather than de­pend­ing on simu­la­tions, and can con­sider sys­tems with more or even all modal agents. There also might be other mod­els that are more jus­tified than the one I used. I ex­pect that most ways of do­ing it will re­sult in very lit­tle defec­tion.


I like this re­sult. I just as­sumed that Defec­tBot would be the only Nash equil­ibrium, and I was sur­prised that the story had a much hap­pier end­ing. When I first be­came sus­pi­cious, I was think­ing that you could get a good equil­ibrium out of an in­finite col­lec­tion of differ­ent agents, but it turns out you only need three. You can view the Co­op­er­ateBots as defect­ing against the FairBots by re­fus­ing to pay to pun­ish bad ac­tors, and you can view the FairBots as defect­ing against the Pru­den­tBots by re­fus­ing to pay to pun­ish the non-pun­ish­ers. You might think you would need to pun­ish ar­bi­trary lev­els on non-pun­ish­ers to be in equil­ibrium, but it turns out that the Pru­den­tBots can get paid ex­actly enough to re­main com­pet­i­tive by ex­ploit­ing the Co­op­er­ateBots, and the Co­op­er­ateBots can get ex­ploited ex­actly enough to can­cel out their un­will­ing­ness to model the op­po­nent.