Prisoners’ Dilemma with Costs to Modeling

We consider a modification to the open source prisoners’ dilemma in which agents must pay some resources to model each other. We will use the modal combat framework, but where agents pay a cost proportional to the depth of boxes in their code. Even a small modeling penalty makes the FairBot-FairBot outcome no longer an equilibrium, since the best response to FairBot is to be CooperateBot and not pay the modeling penalty. The best response to CooperateBot is to be DefectBot, and the pure DefectBot-DefectBot outcome is a stable Nash equilibrium. In fact, I believe that DefectBot-DefectBot is the unique pure strategy Nash equilibrium.

Amazingly, this turns out to be okay! For small modeling penalties, there is a mixed strategy equilibrium which mixes between CooperateBot, FairBot, and PrudentBot! Both players get exactly the same utility in expectation as the FairBot-FairBot outcome.

Further, if you consider an evolutionary system where populations reproduce in proportion to how well they do in prisoners’ dilemmas with each other, it appears that as the modeling penalty gets small, the basin of the defect equilibrium also gets small, and nearly all initial conditions cycle around CooperateBot, FairBot, and PrudentBot!

This post came out of conversations with Sam Eisenstat, Abram Demski, Tsvi Benson-Tilsen, and Andrew Critch. It is a first draft that could use a coauthor to carefully check everything, expand on it, and turn it into a paper. If you think you could do that with minimal guidance from me, let me know.


We will be using the modal combat framework, identifying ⊤ with cooperation and ⊥ with defection. Agents are defined to be formulas that combine the other agent run on various agents using propositional calculus and a modal operator □. The □ represents provability, and every instance of the opponent run on an agent in the formula must be contained within a □. Recall some common modal agents:

CooperateBot is defined by CB(X) ↔ ⊤.

DefectBot is defined by DB(X) ↔ ⊥.

FairBot is defined by FB(X) ↔ □X(FB).

PrudentBot is defined by PB(X) ↔ □X(PB) ∧ □(□⊥ ∨ ¬X(DB)).

These 4 agents interact with each other as follows: CooperateBot cooperates with everyone. DefectBot defects against everyone. FairBot defects against only DefectBot. PrudentBot defects against CooperateBot and DefectBot and cooperates with itself and FairBot.

We will say that the depth of an agent is the maximum of the depth of □s in its code and the depth of the agents that it calls the opponent on. CooperateBot and DefectBot have depth 0, FairBot has depth 1, and PrudentBot has depth 2.

We will use a prisoners’ dilemma where mutual cooperation produces utility 2, mutual defection produces utility 1, and exploitation produces utility 3 for the exploiter and 0 for the exploited. Each player will also pay a penalty of ε times its depth.
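Since several calculations below just plug into this payoff table, here is a minimal sketch of the game in Python. The cooperation table is hard-coded from the behavior described above (it is not derived from the modal definitions), and all names (`coop`, `DEPTH`, `payoff`) are illustrative:

```python
# Hard-coded behavior of the four modal agents: coop[a][b] says
# whether agent a cooperates when playing agent b.
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},                          # defects against everyone
    "CB": {a: True for a in AGENTS},                           # cooperates with everyone
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},   # defects only against DefectBot
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},  # also defects against CooperateBot
}

def payoff(a, b, eps):
    """Utility of agent a in one game against agent b, minus a's modeling penalty."""
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]
```

For example, `payoff("PB", "CB", eps)` works out to 3 − 2ε: PrudentBot exploits CooperateBot, but pays for its depth 2 modeling.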

Pure Equilibria

The best response to both CooperateBot and DefectBot is DefectBot, since when the opponent does not depend on you, you want to defect with the least possible penalty.

The best response to FairBot is CooperateBot, since you can’t exploit FairBot, so you want to get mutual cooperation with the least possible penalty.

The best response to PrudentBot is FairBot, since you can’t exploit PrudentBot, you can’t mutually cooperate with penalty 0, but you can mutually cooperate with penalty ε by being FairBot. (This is assuming ε is less than 1. Otherwise, you would rather be DefectBot and avoid the penalty.)

Thus, if the only options are CooperateBot, DefectBot, FairBot, and PrudentBot, the unique pure strategy equilibrium is mutual DefectBot.
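With only four agents, these best-response claims can be brute-forced. A sketch, with the cooperation table hard-coded from the interaction behavior described earlier, an illustrative helper `best_responses`, and ε = 0.01 as an example small penalty:

```python
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},
    "CB": {a: True for a in AGENTS},
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},
}

def payoff(a, b, eps):
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]

def best_responses(b, eps):
    """All agents that maximize payoff against a fixed opponent b."""
    best = max(payoff(a, b, eps) for a in AGENTS)
    return {a for a in AGENTS if payoff(a, b, eps) == best}

# A pure equilibrium is a pair where each agent is a best response to the other.
EPS = 0.01
pure_equilibria = [(a, b) for a in AGENTS for b in AGENTS
                   if a in best_responses(b, EPS) and b in best_responses(a, EPS)]
```

The only pair that survives is mutual DefectBot.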

I believe that DefectBot-DefectBot is the only pure strategy equilibrium in general. This would follow directly from the fact that if a depth n agent A cooperates with another agent B, then there exists an agent of depth less than n which B also cooperates with. I believe this is true, but haven’t proven it yet. If it turns out to be false, there might be a simple modification of the penalties that makes it true.

Mixed Equilibria

First, let’s look at mixed equilibria in which the only options are the four modal agents above.

Let d_i, c_i, f_i, and p_i be the probabilities with which player i is DefectBot, CooperateBot, FairBot, and PrudentBot respectively.

The utility of playing DefectBot is d_j + 3c_j + f_j + p_j = 1 + 2c_j, where j is the other player.

The utility of playing CooperateBot is 2c_j + 2f_j, where j is the other player.

The utility of playing FairBot is d_j + 2c_j + 2f_j + 2p_j − ε, where j is the other player.

The utility of playing PrudentBot is d_j + 3c_j + 2f_j + 2p_j − 2ε, where j is the other player.
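These expressions just expand the payoff table against a mixed opponent. A sketch (same hard-coded table as before; a mixed strategy is a dict of probabilities, and `expected_utility` is an illustrative helper name):

```python
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},
    "CB": {a: True for a in AGENTS},
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},
}

def payoff(a, b, eps):
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]

def expected_utility(a, mix, eps):
    """Expected payoff of the pure agent a against an opponent mixing per `mix`."""
    return sum(q * payoff(a, b, eps) for b, q in mix.items())
```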

Note that if one player is indifferent between CooperateBot and DefectBot, then 2c_j + 2f_j = 1 + 2c_j, so the other player plays FairBot exactly half the time; but then the first player would be better off playing PrudentBot (as long as ε < 1/4). Thus no player can ever randomize between CooperateBot and DefectBot.

If a player doesn’t play CooperateBot at all, the other player can’t play PrudentBot, since FairBot is strictly better. Thus, if both players don’t play CooperateBot, they both play only DefectBot and FairBot. If this isn’t the pure defect solution, it must be because d_j = 1 − ε and f_j = ε for both players, which is indeed a (not very good) Nash equilibrium.

If exactly one player plays CooperateBot at all, the first player mixes between CooperateBot and FairBot, while the other player mixes between (at most) DefectBot, FairBot and PrudentBot. The second player can’t play FairBot, since CooperateBot would have done strictly better. Thus the first player gets 0 utility for playing CooperateBot, contradicting the fact that playing FairBot would guarantee expected utility at least 1 − ε > 0.

Finally, we have the case where both players play CooperateBot with some probability, and thus neither plays DefectBot at all. If either player did not play FairBot at all, then DefectBot would strictly dominate CooperateBot for the other player, contradicting the fact that both players play CooperateBot. If either player did not play PrudentBot at all, CooperateBot would strictly dominate FairBot, contradicting the fact that both players play FairBot.

Thus, in the only remaining equilibria, both players mix between all of CooperateBot, FairBot, and PrudentBot. Since both players are indifferent between CooperateBot and FairBot, both players must play PrudentBot with probability exactly ε/2. Since both players are indifferent between FairBot and PrudentBot, both players must play CooperateBot with probability exactly ε. This leaves probability 1 − 3ε/2 for FairBot, and the result is indeed a Nash equilibrium.
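As a sanity check, plugging in a concrete penalty, say ε = 0.1 (so CooperateBot, FairBot, and PrudentBot get probabilities 0.1, 0.85, and 0.05), the three supported agents come out exactly indifferent while DefectBot does strictly worse. A sketch with the hard-coded payoff table as before:

```python
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},
    "CB": {a: True for a in AGENTS},
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},
}

def payoff(a, b, eps):
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]

EPS = 0.1
# The claimed equilibrium mix: CB with probability eps, PB with eps/2,
# FB with the rest, and no DB at all.
equilibrium = {"DB": 0.0, "CB": EPS, "FB": 1 - 1.5 * EPS, "PB": EPS / 2}
utilities = {a: sum(q * payoff(a, b, EPS) for b, q in equilibrium.items())
             for a in AGENTS}
```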

We have a total of three Nash equilibria. All three are symmetric. The first two are bad, and both players get expected utility 1. The last one is good, and both players get expected utility 2 − ε.

Next, we have to show that this outcome is also an equilibrium in the game where both players can play any modal agent. If another agent were to get utility more than 2 − ε against this PrudentBot, CooperateBot, FairBot combination, it would clearly have to be depth 1, since the only depth 0 agents (up to behavior) are CooperateBot and DefectBot, and the depth 2 PrudentBot already has the best possible behavior against these 3 agents, so agents of depth 2 or more can at best tie. It would also have to mutually cooperate with FairBot, and defect against CooperateBot.

Suppose a depth 1 agent A provably cooperates with FairBot and defects against CooperateBot. Since A is depth 1, every call A makes to its opponent sits under a single □, so A(CB) is a Boolean combination of the sentences □⊤ and □⊥, and its truth value depends only on the truth value of □⊥. Now consider the Kripke semantics for GL, and take a world w all of whose successors are terminal. At a terminal world every boxed sentence holds, so FB(B) = □B(FB) holds there for every agent B, just as CB(B) = ⊤ does; in other words, from inside a single □ at w, FairBot is indistinguishable from CooperateBot, so A(FB) and A(CB) have the same truth value at w. Since A(FB) is provable, it holds at w, and thus so does A(CB). But □⊥ is false at w, just as it is in the standard model, so A(CB) must also hold in the standard model; that is, A cooperates with CooperateBot, contradiction. Therefore, no agent of depth 1 can have the desired behavior, and our good Nash equilibrium is in fact a Nash equilibrium of the game where both players can play any modal agent.

Evolutionary Simulations

Next, we will consider what happens when a large population of modal agents evolves by playing prisoners’ dilemmas. We will consider a system consisting only of DefectBot, CooperateBot, FairBot, and PrudentBot. We will consider a simple model where at each time step, each population grows a small amount proportional to the size of that population times the expected utility a member of that population gets by playing a prisoners’ dilemma with a random other agent. Notice that in this model, a population that starts out nonzero will remain nonzero, and a population that starts out 0 will remain 0.
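A minimal sketch of this dynamic, discretized with a small step size (the payoff table is hard-coded as before, and `step` is an illustrative name):

```python
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},
    "CB": {a: True for a in AGENTS},
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},
}

def payoff(a, b, eps):
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]

def step(pop, eps, dt=0.01):
    """One time step: each population share grows in proportion to its size
    times its expected utility against a random member of the population,
    then the shares are renormalized. A share that starts at 0 stays at 0."""
    u = {a: sum(pop[b] * payoff(a, b, eps) for b in AGENTS) for a in AGENTS}
    new = {a: pop[a] * (1 + dt * u[a]) for a in AGENTS}
    total = sum(new.values())
    return {a: x / total for a, x in new.items()}
```

Iterating `step` from an interior starting point is how the trajectories described below were produced.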

Since the above three Nash equilibria were symmetric, they are also equilibria of this evolutionary system. This is because the expected utility of being each type of agent that appears in any nonzero amount is the same. Since this system keeps populations that start at 0 at 0, there are other equilibria too; for example, any point where all the agents are the same is an equilibrium, since it cannot change.

We can then ask about the stability of these equilibria. The pure defect one is stable, since if there are only a small number of agents that are not DefectBots, they won’t play each other often enough to cancel out their modeling penalty. The DefectBot-FairBot one is unstable, since the more FairBots there are, the better the FairBots do. Unfortunately, the good equilibrium is also unstable. This is harder to see, but a small perturbation will cause a very slow spiraling outward. (I checked this with simulation, but did not prove it mathematically.)

However, I ran a simulation that seemed to show that for a small ε, and for almost all initial conditions that mix between all four types of agents, the system does not converge, but instead goes in a large cycle, in which the system spends most of its time with almost all FairBots. Eventually a very small number of CooperateBots climbs out to become non-negligible; then, as soon as the CooperateBots are a significant proportion, the PrudentBots come in to exploit them; then the PrudentBots quickly turn into FairBots, and the cycle repeats. Meanwhile, the DefectBots are pushed down into a very small proportion of the population.

I also looked at a second model as follows: The population starts with a small finite number of agents of each type. At each time step, a new agent enters, and permanently becomes the modal agent that maximizes expected utility against the existing population. In this simulation, unless you start with very few FairBots or PrudentBots, you will simply switch between adding FairBots, PrudentBots, and CooperateBots, never adding a new DefectBot.
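A sketch of this second model (hard-coded payoff table as before; the starting counts and ε = 0.01 are arbitrary illustrative choices, and `next_entrant` is an illustrative name):

```python
AGENTS = ["DB", "CB", "FB", "PB"]
DEPTH = {"DB": 0, "CB": 0, "FB": 1, "PB": 2}
coop = {
    "DB": {a: False for a in AGENTS},
    "CB": {a: True for a in AGENTS},
    "FB": {"DB": False, "CB": True, "FB": True, "PB": True},
    "PB": {"DB": False, "CB": False, "FB": True, "PB": True},
}

def payoff(a, b, eps):
    ca, cb = coop[a][b], coop[b][a]
    base = 2 if ca and cb else 1 if not ca and not cb else 0 if ca else 3
    return base - eps * DEPTH[a]

def next_entrant(counts, eps):
    """The modal agent a new entrant becomes: a best response to the
    current population, met uniformly at random."""
    total = sum(counts.values())
    return max(AGENTS,
               key=lambda a: sum(counts[b] * payoff(a, b, eps) for b in AGENTS) / total)

counts = {"DB": 5, "CB": 5, "FB": 5, "PB": 5}
entrants = []
for _ in range(200):
    a = next_entrant(counts, 0.01)
    counts[a] += 1
    entrants.append(a)
```

The check of interest is the one from the text: with starting counts like these, no new DefectBot ever enters.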

A more careful study could actually find what the dynamics are mathematically rather than depending on simulations, and could consider systems with more or even all modal agents. There also might be other models that are more justified than the one I used. I expect that most ways of doing it will result in very little defection.


I like this result. I just assumed that DefectBot would be the only Nash equilibrium, and I was surprised that the story had a much happier ending. When I first became suspicious, I was thinking that you could get a good equilibrium out of an infinite collection of different agents, but it turns out you only need three. You can view the CooperateBots as defecting against the FairBots by refusing to pay to punish bad actors, and you can view the FairBots as defecting against the PrudentBots by refusing to pay to punish the non-punishers. You might think you would need to punish arbitrary levels of non-punishers to be in equilibrium, but it turns out that the PrudentBots can get paid exactly enough to remain competitive by exploiting the CooperateBots, and the CooperateBots can get exploited exactly enough to cancel out their unwillingness to model the opponent.