Prisoner’s Dilemma Tournament Results

About two weeks ago I an­nounced an open com­pe­ti­tion for LessWrong read­ers in­spired by Robert Ax­elrod’s fa­mous tour­na­ments. The com­peti­tors had to sub­mit a strat­egy which would play an iter­ated pris­oner’s dilemma of fixed length: first in the round-robin tour­na­ment where the strat­egy plays a hun­dred-turn match against each of its com­peti­tors ex­actly once, and sec­ond in the evolu­tion­ary tour­na­ment where the strate­gies are ran­domly paired against each other and their gain is trans­lated in num­ber of their copies pre­sent in next gen­er­a­tion; the strat­egy with the high­est num­ber of copies af­ter gen­er­a­tion 100 wins. More de­tails about the rules were de­scribed in the an­nounce­ment. This post sum­marises the re­sults.

The Zoo of Strategies

I have re­ceived 25 con­test en­tries con­tain­ing 21 dis­tinct strate­gies. Those I have di­vided into six classes based on su­perfi­cial similar­i­ties (ex­cept the last class, which is a catch-all cat­e­gory for ev­ery­thing which doesn’t be­long any­where else, some­thing like ad­verbs within the clas­sifi­ca­tion of parts of speech or now de­funct ver­mes in the an­i­mal king­dom). The first class is formed by Tit-for-tat var­i­ants, prob­a­bly the most ob­vi­ous choice for a po­ten­tially suc­cess­ful strat­egy. Ap­par­ently so ob­vi­ous that at least one com­menter de­clared high con­fi­dence that tit-for-tat will make more than half of the strat­egy pool. That was ac­tu­ally a good ex­am­ple of mis­placed con­fi­dence, since the num­ber of re­ceived tit-for-tat var­i­ants (where I put any­thing which be­haves like tit-for-tat ex­cept for iso­lated de­vi­a­tions) was only six, two of them be­ing iden­ti­cal and thus counted as one. More­over there wasn’t a sin­gle true tit-for-tat­ter among the con­tes­tants; the clos­est we got was

A (-, -): On the first turn of each match, co­op­er­ate. On ev­ery other turn, with prob­a­bil­ity 0.0000004839, co­op­er­ate; oth­er­wise play the move that the op­po­nent played on the im­me­di­ately pre­ced­ing turn.

(In the pre­sen­ta­tion of strate­gies, the let­ter in bold serves as a unique iden­ti­fi­ca­tor. The fol­low­ing paren­the­ses in­clude the name of the strat­egy — if the au­thor has pro­vided one — and the name of the au­thor. I use the au­thor’s origi­nal de­scrip­tion of the strat­egy when pos­si­ble. If that’s too long, an ab­bre­vi­ated para­phrase is given. If I found the origi­nal de­scrip­tion am­bigu­ous, I may give a slightly re­for­mu­lated ver­sion based on sub­se­quent clar­ifi­ca­tions with the au­thor.) The au­thor of A was the only one who re­quested his/​her name should be with­held and the strat­egy is name­less, so both ar­gu­ments in the bracket are empty. The rea­son for the ob­scure prob­a­bil­ity was to make the strat­egy unique. The au­thor says:

I wish to en­ter a triv­ial vari­a­tion on the tit-for-tat strat­egy. (The triv­ial vari­a­tion is to force the strat­egy to be unique; I wish to pun­ish defec­tor­ish strate­gies by hav­ing lots of tit-for-tat-style strate­gies in the pool.)

This was per­haps a slight abuse of rules, but since I am re­spon­si­ble for failing to make the rules im­mune to abuse, I had to ac­cept the strat­egy as it is. Any­way, it turned out that the triv­ial vari­a­tion was need­less for the stated pur­pose.

The re­main­ing strate­gies from this class were more or less stan­dard with B be­ing the most ob­vi­ous choice.

B (-, Alexei): Tit-for-Tat, but always defect on last turn.

C (-, Caer­bannog): Tit-for-tat with 20% chance of for­giv­ing af­ter op­po­nent’s defec­tion. Defect on the last turn.

D (-, fubarobfusco and Dun­canS): Tit-for-tat with 10% chance of for­giv­ing.

E (-, Jem): First two turns co­op­er­ate. Later tit-for-tat with chance of for­giv­ing equal to 12x where x is equal to num­ber of op­po­nent’s defec­tions af­ter own co­op­er­a­tions. Last turn defect.

The next cat­e­gory of strate­gies I call Avengers. The Avengers play a nice strat­egy un­til the op­po­nent’s defec­tive be­havi­our reaches a speci­fied thresh­old. After that they switch to ir­re­vo­ca­ble defec­tion.

F (-, Eug­ine_Nier): Stan­dard Tit-for-Tat with the fol­low­ing mod­ifi­ca­tions: 1) Always defect on the last move. 2) Once the other player defects 5 times, switch to all defect.

G (-, rwal­lace): 1. Start with vanilla tit-for-tat. 2. When the op­po­nent has defected a to­tal of three times this match, switch to always defect for the rest of the match. 3. If the num­ber of rounds is known and fixed, defect on the last round.

H (-, Michaelos): Tit for Tat, with two ex­cep­tions: 1: Ig­nore all other con­sid­er­a­tions and always defect on the fi­nal iter­a­tion. 2: After own defec­tion and op­po­nent’s co­op­er­a­tion, 50 per­cent of the time, co­op­er­ate. The other 50 per­cent of the time, always defect for the rest of the game.

I (-, malthrin): if Op­po­nen­tDefect­edBe­fore(7) then MyMove=D else if n>98 then MyMove=D else MyMove=OpponentLastMove

J (Venge­ful Cheater, Vaniver): Set Rage = 0. Turn 1: co­op­er­ate. Turn 2-99: If the op­po­nent defected last round, set Rage to 1. (If they co­op­er­ate, no change.) If Rage is 1, defect. Else, co­op­er­ate. Turn 100: defect.

K (Grim Trig­ger, shinoteki): Co­op­er­ate if and only if the op­po­nent has not defected in the cur­rent match.

Then we have a spe­cial “class” of Defec­tBots (a name stolen from Orthonor­mal’s ex­cel­lent ar­ti­cle) con­sist­ing of only one sin­gle strat­egy. But this is by far the most pop­u­lar strat­egy — I have re­ceived it in four copies. Un­for­tu­nately for Defec­tBots, ac­cord­ing to the rules they have to be in­cluded in the pool only once. The pe­cu­liar name for the sin­gle Defec­tBot comes from wedrifid:

L (Fuck them all!, MileyCyrus and wedrifid and vespiacic and pe­ter_hur­ford): Defect.

Then, we have a rather het­eroge­nous class of Lists, whose only com­mon thing is that their play is speci­fied by a rather long list of in­struc­tions.

M (-, Je­susPetry): Turn 1: co­op­er­ate. Turns 2 to 21: tit-for-tat. Turn 22: defect. Turn 23: co­op­er­ate. Turn 24: if op­po­nent co­op­er­ated in turns 21 and 22, co­op­er­ate; oth­er­wise, do what­ever the op­po­nent did on pre­vi­ous turn. Then tit-for-tat, the sce­nario from turns 22-24 re­peats it­self in turns 35-37, 57-59, 73-75. Turns 99 and 100: defect.

N (-, FAWS): 1. Co­op­er­ate on the first turn. 2. If the op­po­nent has defected at least 3 times: Defect for the rest of the match. 3. Play tit for tat un­less speci­fied oth­er­wise. 4. If the op­po­nent has co­op­er­ated on the first 20 turns: Ran­domly choose a turn be­tween 21 and 30. Other­wise con­tinue with 11. 5. If the op­po­nent defects af­ter turn 20, but be­fore the cho­sen turn comes up: Play CDC the next three turns, then con­tinue with 12. 6. Defect on the cho­sen turn if the op­po­nent hasn’t defected be­fore. 7. If the op­po­nent defects on the same turn: Co­op­er­ate, then con­tinue with 12. 8. If the op­po­nent defects on the next turn: Co­op­er­ate, then con­tinue with 11. 9. If the op­po­nent defects on the turn af­ter that (i. e. the sec­ond turn af­ter we defected): Co­op­er­ate, then con­tinue with 12. 10. If the op­po­nent co­op­er­ates on ev­ery turn up to and in­clud­ing two turns af­ter the defec­tion: Defect un­til the op­po­nent defects, then co­op­er­ate twice and con­tinue with 11. 11. Defect on the last two turns (99 and 100). 12. Defect on the last two turns un­less the op­po­nent defected ex­actly once.

O (Se­cond Chance, benel­liott): Se­cond Chance is defined by 5 rules. When more than one rule ap­plies to the cur­rent situ­a­tion, always defer to the one with the lower num­ber. Rules: 1. Co-op­er­ate on move 1, defect on moves 98, 99 and 100. 2. If Se­cond Chance has co-op­er­ated at least 4 times (ex­clud­ing the most re­cent move) and in ev­ery case co-op­er­a­tion has been fol­lowed by defec­tion from the other strat­egy, then always defect from now on. 3. If Se­cond Chance has co-op­er­ated at least 8 times, and defected at least 10 times (ex­clud­ing the pre­vi­ous move), then calcu­late x, the pro­por­tion of the time that co-op­er­a­tion has been fol­lowed by co-op­er­a­tion and y, the pro­por­tion of the time that defec­tion has been fol­lowed by co-op­er­a­tion. If 4x < 6y + 1 then defect un­til that changes. 4. If the num­ber of times the op­pos­ing strat­egy has defected (in­clud­ing the most re­cent move) is a mul­ti­ple of 4, co-op­er­ate. 5. Do what­ever the op­pos­ing strat­egy did on the pre­vi­ous move.

The next class are CliqueBots (once more a name from the Orthonor­mal’s post). CliqueBots try to iden­tify whether their op­po­nent is a copy of them­selves and then co­op­er­ate. If they iden­tify a for­eign op­po­nent, they usu­ally be­come pretty nasty.

P (Sim­ple Iden­tity ChecK, red75): Move 1: co­op­er­ate. Moves 2 to 57: tit-for-tat. Move 58 - defect. Move 59 - if moves 0-57 were coop-coop and 58 was defect-defect then coop else defect. Moves 60-100: if my move 59 was coop then tit-for-tat else defect.

Q (EvilAlli­ance, ArisKat­saris): Start with 5 defec­tions. On later turns, if the op­po­nent had co­op­er­ated at least once in the first 5 turns or defected ever since, defect; else co­op­er­ate.

The re­main­ing strate­gies are Un­clas­sified, since they have lit­tle in com­mon. Here they come:

R (Probe & Pu­n­ish, Eneasz): 1. Co­op­er­ate for a turn. 2. Co­op­er­ate for a sec­ond turn. If op­po­nent co­op­er­ated on the sec­ond turn, con­tinue to co­op­er­ate in ev­ery sub­se­quent turn un­til the op­po­nent defects. 3. If/​when the op­po­nent defects, defect for 12 con­sec­u­tive rounds. 4. Re­turn to 1, re­peat.

S (Win-stay lose-shift, Nerzhin): It co­op­er­ates if and only if it and the op­po­nent both played the same strat­egy on the last round.

The au­thor of S says he’s adopted the strat­egy from Nowak and Sig­mund, Na­ture 364 pp. 56-58.

T (Tit for Two Tats, DataPacRat): As the name sug­gests.

U (Rack­Block­Shooter, Nic_Smith): The code is available here.

This one is a real beast, with code more that twice as long as that of its com­peti­tors. It ac­tu­ally mod­els the op­po­nent’s be­havi­our as a ran­dom pro­cess speci­fied by two pa­ram­e­ters: prob­a­bil­ity of co­op­er­a­tion when the sec­ond strat­egy (which is the Rack­Block­Shooter) also co­op­er­ates, and prob­a­bil­ity of co­op­er­a­tion when RBS defects. This was acausal: the pa­ram­e­ters de­pend on RBS’s be­havi­our in the same turn, not in the pre­ced­ing one. More­over, RBS stores whole prob­a­bil­ity dis­tri­bu­tions of those pa­ram­e­ters and cor­rectly up­dates them in an or­tho­dox Bayesian man­ner af­ter each turn. To fur­ther com­pli­cate things, if RBS thinks that the op­po­nent is it­self, it co­op­er­ates in the CliqueBot­tish style.

And fi­nally, there’s a strat­egy in­cluded by de­fault:

Z (Fully Ran­dom, de­fault): Co­op­er­ate with 50% prob­a­bil­ity.

The Round-Robin Tournament

Each strat­egy had played 21 matches where a to­tal max­i­mum of 14,700 points could the­o­ret­i­cally be won. A more rea­son­able thresh­old was 8,400 points: an award for any mem­ber of a pool con­sist­ing solely of nice strate­gies (i.e. those which never defect first). No strat­egy reached this op­ti­mum. The stand­ings are given in the fol­low­ing table (columns from left: strat­egy iden­ti­fier, matches won, matches drawn, matches lost, to­tal points).

The win­ner is malthrin’s un­named strat­egy. Se­cond and third place are oc­cu­pied by Eug­ine_Nier’s mod­ified tit-for-tat and benel­liott’s Se­cond Chance, re­spec­tively. Con­grat­u­la­tions!

Here are re­sults of all matches (view the pic­ture in full size by right-click­ing).

Is there some­thing in­ter­est­ing to say about the re­sults? Well, there were three strate­gies which performed worse than ran­dom: the Defec­tBot L, the CliqueBot Q (which effec­tively acted like a Defec­tBot) and the com­plex strat­egy U. As ex­pected, very nasty strate­gies were un­suc­cess­ful. On the other hand, to­tally nice strate­gies weren’t much suc­cess­ful ei­ther. No mem­ber of the set {A, D, K, R, S, T} got into the first five. Not defect­ing on the last turn was a mis­take, defect­ing on the last two turns seemed even a bet­ter choice. As for the some­what ar­bi­trary classes, the av­er­age gains were as such:

  1. Tit-for-tat var­i­ants 7197.4

  2. Avengers 7144.3

  3. Lists 6551.3

  4. Un­clas­sified 5963.8

  5. CliqueBots 4371.5

  6. Defec­tBots 3174.0

Another prob­a­bly ob­vi­ous point is that in a non-zero sum game it doesn’t mat­ter too much how many op­po­nents you beat in di­rect con­fronta­tion. The only un­defeated strate­gies that won all their matches ex­cept one draw, L and Q, finished last.

The Evolu­tion­ary1 Tournament

The ini­tial pop­u­la­tion con­sisted of 1,980 strate­gies, 90 of each kind. But that didn’t last for long. The very nasty strate­gies started to die out rapidly and by gen­er­a­tion 6 there were no rep­re­sen­tants of L, Q, U and Z. (I have to say I was very glad to see U ex­tinct so early be­cause with non-neg­ligible num­ber of Us in the pool the simu­la­tion run very, very slowly.) By gen­er­a­tion 40, M, N and P were also gone. Un­for­tu­nately, 100 was too low as a num­ber of gen­er­a­tions to see any in­ter­est­ing effects as­so­ci­ated with en­vi­ron­men­tal change as differ­ent strate­gies be­come ex­tinct (but see the next sec­tion for a longer simu­la­tion); the strate­gies suc­cess­ful in the round-robin tour­na­ment were gen­er­ally suc­cess­ful here as well.

The graph shows the his­tory of sub­pop­u­la­tions (TfT var­i­ants are blue, Avengers red, Lists green, CliqueBots pur­ple, Un­clas­sified strate­gies yel­low, the Defec­tBot gray and Ful­lyRan­dom is black). The fi­nal stand­ings (or­derd by num­ber of copies in gen­er­a­tion 100) are sum­marised in the fol­low­ing table.

The gold and silver medals are once again awarded to malthrin and Eug­ine_Nier. There is a change at the third place which is now oc­cu­pied by Caer­bannog’s 20%-for­giv­ing tit-for-tat.

One re­mark­able fact is the ab­solute failure of CliqueBots which were with­out doubt de­signed to suc­ceed in the evolu­tion­ary tour­na­ment. The idea was, per­haps, that CliqueBots win by co­op­er­at­ing among them­selves and pun­ish the oth­ers by defect­ing. That may be a work­ing meta-strat­egy once you are dom­i­nant in the pop­u­la­tion, but for play­ers start­ing as small minor­ity it is suici­dal. It would be a pretty bad idea even if most con­tes­tants were CliqueBots: while TfT var­i­ants gen­er­ally co­op­er­ate with each other re­gard­less of slight differ­ences be­tween their source codes, CliqueBots are los­ing points in defec­tion­ful matches even against other CliqueBots ex­cept those of their own species.

The Con­trol Group

I run the same ex­per­i­ment with read­ers of my blog and thus I have a con­trol group to match against the LW com­peti­tors. The com­par­i­son is done sim­ply by putting all strate­gies to­gether in an even larger tour­na­ment. Be­fore show­ing the re­sults, let me in­tro­duce the con­trol group strate­gies.

C1: Let p be the rel­a­tive fre­quency of op­po­nent’s co­op­er­a­tions (af­ter my co­op­er­a­tion if I have co­op­er­ated last turn /​ af­ter my defec­tion if I have defected). If the rel­a­tive fre­quency can’t be calcu­lated, let p = 12. Let Ec = 5p − 7 and Ed = 6p − 6. Co­op­er­ate with prob­a­bil­ity exp Ec /​ (exp Ec + exp Ed).

This strat­egy was in many re­spects similar to strat­egy U: math­e­mat­i­cally el­e­gant, but very un­suc­cess­ful.

C2: Defect if and only if the op­po­nent had defected more than twice or if it defected on the first turn.

C3: If the op­po­nent played the same move on the last and last but one turn, play that. Else, play the op­po­site what I have played last turn. On turns 1 and 2 play defect, co­op­er­ate.

C4: Co­op­er­ate on the first three turns, defect on the last two. Else, co­op­er­ate if the op­po­nent has co­op­er­ated at least 85% of the time.

C5: Co­op­er­ate on the first three turns, defect on the last turn. Else, co­op­er­ate with prob­a­bil­ity equal to the pro­por­tion of op­po­nent’s co­op­er­a­tions.

C6: Tit-for-tat, but co­op­er­ate af­ter the op­po­nent’s first defec­tion.

C7: On the first turn play ran­domly, on turns 2n and 2n+1 play what the op­po­nent played on turn n.

C8: Defect, if the op­po­nent defected on two out of three pre­ced­ing turns, or if I have defected on the last turn and co­op­er­ated on the last but one turn, or if I haven’t defected in pre­ced­ing 10 turns. On turns 20, 40, 60 and 80 co­op­er­ate always.

C9: Have two sub­strate­gies: 1. tit for two tats and 2. defect af­ter op­po­nent’s first defec­tion. Be­gin with 1. After each tenth turn change the sub­strat­egy if the gain in the last ten turns was be­tween 16 and 34 points. If the sub­strat­egy is changed, re­set all defec­tion coun­ters.

C10: Turns 1 and 2: co­op­er­ate. Turns 3-29: tit for tat. Turns 30-84: defect if the op­po­nent defected more than seven times, else tit for tat. Turn 85: defect. Turns 86 and 87: defect if the op­po­nent defected more than seven times, else co­op­er­ate. Turns 88-97: If the op­po­nent defected on turn 87 or more than four times dur­ing the match, defect. Else co­op­er­ate. Turns 98-100: defect.

C11: Turn 1: co­op­er­ate. Turn 2: tit for tat. Turn 100: defect. Else, if the op­po­nent defected on the last and (I have co­op­er­ated or the op­po­nent has defected on the last but one turn) defect, else co­op­er­ate.

Two round-robin tour­na­ment stand­ings are given in the fol­low­ing ta­bles (two tour­na­ments were played to illus­trate the rôle of ran­dom­ness).

You can see how the new strate­gies had changed the stand­ings. The origi­nal win­ner, I, now hov­ers around rather un­re­mark­able 15th po­si­tion, 10th among LW strate­gies. The av­er­age score for the LW strate­gies (calcu­lated from the sec­ond tour­na­ment) is 9702.9, for the con­trol group only 9296.9. How much this means that read­ing LW im­proves game-the­o­ret­i­cal skills I leave to the read­ers’ judge­ment.

In the evolu­tion­ary set­ting the strat­egy I re­cov­ered back its lead­ing po­si­tion. Here are the 100th gen­er­a­tion pop­u­la­tions and the graph:

(In the graph, LW strate­gies are green and the con­trol group strate­gies are red.)

It is rel­a­tively clear that the situ­a­tion af­ter hu­dred gen­er­a­tions is not equil­ibrium and fur­ther changes would en­sue if the simu­la­tion con­tinued. There­fore I have run one more simu­la­tion, this time last­ing for thou­sand of gen­er­a­tions. This was more in­ter­est­ing, since more strate­gies died out. The first was C10 which was gone af­ter gen­er­a­tion 125. Around gen­er­a­tion 300 there were only four sur­viv­ing strate­gies from the con­trol group (two of them moribund) and twelve from the origi­nal LW pool (two of them se­ri­ously en­dan­gered, too.) The lead­ers were I (with pop­u­la­tion 539), C4 (432) and C11 (183). The last one went through a steady slow de­cline and was soon over­taken by O, which as­sumed the third po­si­tion around gen­er­a­tion 420. At gen­er­a­tion 600, only three strate­gies re­mained in the pool: I (932), C4 (692) and O (374). Not long be­fore that the pop­u­la­tions of I and C4 peaked and started to de­cline, while O con­tinued to rise. In di­rect con­fronta­tion, O always beats both I and C4 397:390 which proved to be de­ci­sive when other strate­gies were elimi­nated. After the thou­sandth gen­er­a­tion, there were 1374 copies of O and only 625 copies of both its com­peti­tors com­bined.

Fi­nal Remarks

I have or­ganised the pris­oner’s dilemma tour­na­ment mainly for fun, but hope­fully the re­sults can illus­trate few facts about non-zero sum games. At least in the fi­nal simu­la­tion we can see how short-term suc­cess needn’t im­ply long-term sur­vival. The failure of CliqueBots shows that it isn’t much pru­dent to be too re­stric­tive about whom one co­op­er­ates with. There is one more les­son that I have learned: the strate­gies which I con­sid­ered most beau­tiful (U and C1) played poorly and were the first to be elimi­nated from the pool. Both U and C1 tried to ex­per­i­ment with the op­po­nent’s be­havi­our and use the re­sults to con­struct a work­ing model thereof. But that didn’t work in this set­ting: while those strate­gies were los­ing points ex­per­i­ment­ing, dumb me­chan­i­cal tit-for-tats were max­imis­ing their gain. There are situ­a­tions when the cost of ob­tain­ing knowl­edge is higher than the knowl­edge is worth, and this was one of such situ­a­tions. (Of course, both dis­cussed strate­gies could have used the knowl­edge they had more effi­ciently, but that wouldn’t sig­nifi­cantly help.)

I have de­liber­ately run fixed-length matches to make the com­pe­ti­tion more in­ter­est­ing. Ran­dom-length iter­ated PDs are do­main of tit-for-tats and it would be hard to de­vise a bet­ter strat­egy; in the fixed-length case the play­ers had to think about the cor­rect turn when to defect first when play­ing a nice strat­egy and the an­swer isn’t triv­ial. The most suc­cess­ful strate­gies started defect­ing on the 98th or 99th turns.

The ques­tion whether read­ing LessWrong im­proves perfor­mance in similar games re­mains un­solved as the ev­i­dence gath­ered in the tour­na­ment was very weak. The differ­ence be­tween the LW and con­trol groups wasn’t big. Other prob­lem is that I don’t know much about the con­trol group au­thors (ex­cept one of them whom I know per­son­ally); it’s not clear what part of pop­u­la­tion the con­trol group rep­re­sents. The judge­ment is more com­pli­cated by the ap­par­ent facts that not all par­ti­ci­pants were try­ing to win: the name­less au­thor of A has ex­plic­itly de­clared a differ­ent goal (namely, to pun­ish defec­tor­ish strate­gies) and it’s likely that L was in­tended to sig­nal au­thor’s cyn­i­cism (I am speci­fi­cally think­ing about one of the four Defec­tBot’s origi­na­tors) rather than to suc­ceed in the tour­na­ment. Fur­ther­more, many strate­gies were send by new users and only 11 out of 26 strat­egy au­thors have karma over 500, which means that the LW group doesn’t faith­fully rep­re­sent LessWrong ac­tive par­ti­ci­pants.

1 Evolu­tion­ary may be a mis­nomer. The tour­na­ment mod­els nat­u­ral se­lec­tion, but no changes and there­fore evolu­tion oc­curs. I have kept the des­ig­na­tion to main­tain con­sis­tency with the ear­lier post.