What counts as defection?

Thanks to Michael Den­nis for propos­ing the for­mal defi­ni­tion; to An­drew Critch for point­ing me in this di­rec­tion; to Abram Dem­ski for propos­ing non-nega­tive weight­ing; and to Alex Ap­pel, Scott Em­mons, Evan Hub­inger, philh, Ro­hin Shah, and Car­roll Wain­wright for their feed­back and ideas.

There’s a good chance I’d like to pub­lish this at some point as part of a larger work. How­ever, I wanted to make the work available now, in case that doesn’t hap­pen soon.

They can’t prove the con­spir­acy… But they could, if Steve runs his mouth.

The po­lice chief stares at you.

You stare at the table. You’d agreed (sworn!) to stay quiet. You’d even stud­ied game the­ory to­gether. But, you hadn’t un­der­stood what an ex­tra year of jail meant.

The po­lice chief stares at you.

Let Steve be the gullible ideal­ist. You have a fam­ily wait­ing for you.

Sun­light stretches across the valley, dap­pling the grass and warm­ing your bow. Your hand anx­iously runs along the bow­string. A dis­tant figure darts be­tween trees, and your stom­ach rum­bles. The day is near spent.

The stags run strong and free in this land. Carla should meet you there. Shouldn’t she? Who wants to live like a beg­gar, sub­sist­ing on scraps of lean rab­bit meat?

In your mind’s eye, you reach the stags, alone. You find one, and your ar­row pierces its bar­row. The beast bucks and bursts away; the rest of the herd fol­lows. You slump against the tree, ex­hausted, and never open your eyes again.

You can’t risk it.

Peo­ple talk about ‘defec­tion’ in so­cial dilemma games, from the pris­oner’s dilemma to stag hunt to chicken. In the tragedy of the com­mons, we talk about defec­tion. The con­cept has be­come a reg­u­lar part of LessWrong dis­course.

In­for­mal defi­ni­tion. A player defects when they in­crease their per­sonal pay­off at the ex­pense of the group.

This in­for­mal defi­ni­tion is no se­cret, be­ing echoed from the an­cient For­mal Models of Dilem­mas in So­cial De­ci­sion-Mak­ing to the re­cent Clas­sify­ing games like the Pri­soner’s Dilemma:

you can model the “defect” ac­tion as “take some value for your­self, but de­stroy value in the pro­cess”.

Given that the pris­oner’s dilemma is the bread and but­ter of game the­ory and of many parts of eco­nomics, evolu­tion­ary biol­ogy, and psy­chol­ogy, you might think that some­one had already for­mal­ized this. How­ever, to my knowl­edge, no one has.


Consider a finite $n$-player normal-form game, with player $i$ having pure action set $\mathcal{A}_i$ and payoff function $P_i : \mathcal{A}_1 \times \ldots \times \mathcal{A}_n \to \mathbb{R}$. Each player $i$ chooses a strategy $s_i \in \Delta(\mathcal{A}_i)$ (a distribution over $\mathcal{A}_i$). Together, the strategies form a strategy profile $s := (s_1, \ldots, s_n)$. $s_{-i} := (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$ is the strategy profile, excluding player $i$’s strategy. A payoff profile contains the payoffs for all players under a given strategy profile.

A utility weighting $(\alpha_j)_{j=1,\ldots,n}$ is a set of $n$ non-negative weights (as in Harsanyi’s utilitarian theorem). You can consider the weights as quantifying each player’s contribution; they might represent a perceived social agreement or be the explicit result of a bargaining process.

When all $\alpha_j$ are equal, we’ll call that an equal weighting. However, if there are “utility monsters”, we can downweight them accordingly.

We’re im­plic­itly as­sum­ing that pay­offs are com­pa­rable across play­ers. We want to in­ves­ti­gate: given a util­ity weight­ing, which ac­tions are defec­tions?

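Definition. Player $i$’s action $a \in \mathcal{A}_i$ is a defection against strategy profile $s$ and weighting $(\alpha_j)_{j=1,\ldots,n}$ if

  1. Personal gain: $P_i(a, s_{-i}) > P_i(s_i, s_{-i})$
  2. Social loss: $\sum_j \alpha_j P_j(a, s_{-i}) < \sum_j \alpha_j P_j(s_i, s_{-i})$

If such an action exists for some player $i$, strategy profile $s$, and weighting, then we say that there is an opportunity for defection in the game.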
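
The two conditions can be checked mechanically. The following is a minimal sketch, assuming payoffs are stored as a NumPy array of shape (n, |A_1|, ..., |A_n|) and that payoffs under mixed strategies are expected values. The function names and the Prisoner’s Dilemma numbers (T=5, R=3, P=1, S=0) are illustrative assumptions, not taken from the post.

```python
import numpy as np

def expected_payoffs(payoffs, profile):
    """Expected payoff of every player under a mixed strategy profile.

    payoffs: array of shape (n, |A_1|, ..., |A_n|); payoffs[j] is player j's payoff.
    profile: list of n probability vectors, one per player.
    """
    result = np.asarray(payoffs, dtype=float)
    # Contract the first remaining action axis against each player's strategy in turn.
    for strategy in profile:
        result = np.tensordot(result, np.asarray(strategy, dtype=float), axes=([1], [0]))
    return result  # shape (n,)

def is_defection(payoffs, profile, player, action, weights=None):
    """True if `action` is a defection by `player` against `profile` and `weights`."""
    n = len(profile)
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Replace the player's strategy with the pure action; everyone else is unchanged.
    pure = np.zeros(len(profile[player]))
    pure[action] = 1.0
    deviation = list(profile)
    deviation[player] = pure
    before = expected_payoffs(payoffs, profile)
    after = expected_payoffs(payoffs, deviation)
    personal_gain = after[player] > before[player]       # condition 1
    social_loss = weights @ after < weights @ before     # condition 2
    return bool(personal_gain and social_loss)

# Hypothetical Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0); action 0 = C, 1 = D.
pd = np.array([
    [[3, 0], [5, 1]],  # player 1's payoffs, indexed [a_1][a_2]
    [[3, 5], [0, 1]],  # player 2's payoffs
])
both_cooperate = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
print(is_defection(pd, both_cooperate, player=1, action=1))  # prints True
```

Passing `weights=[0, 1]` in the same call returns False: if only player 2’s payoff counts, their gain is not a social loss, which illustrates how the verdict depends on the chosen weighting.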

Re­mark. For an equal weight­ing, con­di­tion (2) is equiv­a­lent to de­mand­ing that the ac­tion not be a Kal­dor-Hicks im­prove­ment.

Figure: Payoff profiles in the Prisoner’s Dilemma. Red arrows represent defections against pure strategy profiles; player 1 defects vertically, while player 2 defects horizontally. For example, player 2 defects by switching from $C_2$ to $D_2$ because their own payoff rises while the weighted sum of payoffs falls.

Our defi­ni­tion seems to make rea­son­able in­tu­itive sense. In the tragedy of the com­mons, each player ra­tio­nally in­creases their util­ity, while im­pos­ing nega­tive ex­ter­nal­ities on the other play­ers and de­creas­ing to­tal util­ity. A spy might leak clas­sified in­for­ma­tion, benefit­ing them­selves and Rus­sia but defect­ing against Amer­ica.

Defi­ni­tion. Co­op­er­a­tion takes place when a strat­egy pro­file is main­tained de­spite the op­por­tu­nity for defec­tion.

The­o­rem 1. In con­stant-sum games, there is no op­por­tu­nity for defec­tion against equal weight­ings.

The­o­rem 2. In com­mon-pay­off games (where all play­ers share the same pay­off func­tion), there is no op­por­tu­nity for defec­tion.

Propo­si­tion 3. There is no op­por­tu­nity for defec­tion against Nash equil­ibria.

An action $a \in \mathcal{A}_i$ is a Pareto improvement over strategy profile $s$ if, for all players $j$, $P_j(a, s_{-i}) \geq P_j(s_i, s_{-i})$.

Propo­si­tion 4. Pareto im­prove­ments are never defec­tions.
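
The proofs of these four results are not reproduced here. One way to see them, offered as a sketch rather than as the original arguments, is to check which of the two defection conditions must fail in each case. The symbols $c$ (the constant total payoff) and $u$ (the shared payoff function) are introduced only for this sketch.

```latex
\begin{align*}
\text{Theorem 1:}\quad
  & \textstyle\sum_j P_j \equiv c,\ \alpha_j = \alpha
    \;\Rightarrow\;
    \sum_j \alpha_j P_j(a, s_{-i}) = \alpha c = \sum_j \alpha_j P_j(s_i, s_{-i})
  && \text{(no social loss)} \\
\text{Theorem 2:}\quad
  & P_j = u \ \forall j,\ u(a, s_{-i}) > u(s_i, s_{-i}),\ \alpha_j \geq 0
    \;\Rightarrow\;
    \textstyle\sum_j \alpha_j u(a, s_{-i}) \geq \sum_j \alpha_j u(s_i, s_{-i})
  && \text{(no social loss)} \\
\text{Proposition 3:}\quad
  & s \text{ is a Nash equilibrium}
    \;\Rightarrow\;
    P_i(a, s_{-i}) \leq P_i(s_i, s_{-i}) \ \ \forall i,\ \forall a \in \mathcal{A}_i
  && \text{(no personal gain)} \\
\text{Proposition 4:}\quad
  & P_j(a, s_{-i}) \geq P_j(s_i, s_{-i}) \ \forall j,\ \alpha_j \geq 0
    \;\Rightarrow\;
    \textstyle\sum_j \alpha_j P_j(a, s_{-i}) \geq \sum_j \alpha_j P_j(s_i, s_{-i})
  && \text{(no social loss)}
\end{align*}
```

In each line, one of the strict inequalities in the definition cannot hold, so no action qualifies as a defection.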

Game Theorems

We can prove that for­mal defec­tion ex­ists in the trifecta of fa­mous games. Feel free to skip proofs if you aren’t in­ter­ested.

In (a), the variables $T$, $R$, $P$, and $S$ stand for Temptation, Reward, Punishment, and Sucker. A $2 \times 2$ symmetric game is a Prisoner’s Dilemma when $T > R > P > S$. Unsurprisingly, formal defection is everywhere in this game.

Theorem 5. In $2 \times 2$ symmetric games, if the Prisoner’s Dilemma inequality is satisfied, defection can exist against equal weightings.

Proof. Suppose the Prisoner’s Dilemma inequality holds. Further suppose that $R > \frac{1}{2}(T+S)$. Then $2R > T+S$. Then since $T > R$ but $T + S < 2R$, both players defect from $(C_1, C_2)$ with $D_i$.

Suppose instead that $R \leq \frac{1}{2}(T+S)$. Then $T + S \geq 2R > 2P$, so $T + S > 2P$. But $P > S$, so player 1 defects from $(C_1, D_2)$ with action $D_1$, and player 2 defects from $(D_1, C_2)$ with action $D_2$. QED.
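
The case split in this proof can be checked by brute force. The sketch below enumerates every unilateral pure deviation in a symmetric $2 \times 2$ game under an equal weighting; the helper names and the two payoff settings are hypothetical, chosen so that one falls in each case of the proof.

```python
from itertools import product

def symmetric_payoffs(T, R, P, S):
    """Payoff pairs (player 1, player 2); action 0 = cooperate, 1 = defect."""
    return {
        (0, 0): (R, R),
        (0, 1): (S, T),
        (1, 0): (T, S),
        (1, 1): (P, P),
    }

def pure_defections(T, R, P, S):
    """List (player, profile, action) triples that are defections under equal weighting."""
    pay = symmetric_payoffs(T, R, P, S)
    found = []
    for profile in product((0, 1), repeat=2):
        for player in (0, 1):
            for action in (0, 1):
                deviated = list(profile)
                deviated[player] = action
                deviated = tuple(deviated)
                gain = pay[deviated][player] > pay[profile][player]  # condition 1
                loss = sum(pay[deviated]) < sum(pay[profile])        # condition 2
                if gain and loss:
                    found.append((player, profile, action))
    return found

# Case R > (T + S) / 2: both players can defect from mutual cooperation.
case_one = pure_defections(T=5, R=3, P=1, S=0)
# Case R <= (T + S) / 2: defection only happens against a profile where the other player already defects.
case_two = pure_defections(T=10, R=3, P=1, S=0)
```

With these numbers, `case_one` contains deviations from `(0, 0)` by both players, while `case_two` contains only player 1 (index 0) deviating from `(0, 1)` and player 2 (index 1) deviating from `(1, 0)`, matching the two branches of the proof.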

A $2 \times 2$ symmetric game is a Stag Hunt when $R > T \geq P > S$. In Stag Hunts, due to uncertainty about whether the other player will hunt stag, players defect and fail to coordinate on the unique Pareto optimum $(\text{Stag}_1, \text{Stag}_2)$. In (b), player 2 will defect (play $\text{Hare}_2$) when $\mathbb{P}(\text{Stag}_1)$ falls below a threshold determined by the payoffs in (b). In Stag Hunts, formal defection can always occur against mixed strategy profiles, which lines up with defection in this game being due to uncertainty.

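Theorem 6. In $2 \times 2$ symmetric games, if the Stag Hunt inequality is satisfied, defection can exist against equal weightings.

Proof. Suppose that the Stag Hunt inequality is satisfied. Let $p$ be the probability that player 1 plays $\text{Stag}_1$. We now show that player 2 can always defect against strategy profile $(p, \text{Stag}_2)$ for some value of $p$.

For defection’s first condition, we determine when $P_2(p, \text{Stag}_2) < P_2(p, \text{Hare}_2)$:

$$\begin{aligned} pR + (1-p)S &< pT + (1-p)P \\ p &< \frac{P-S}{(R-T)+(P-S)}. \end{aligned}$$

This denominator is positive ($R > T$ and $P > S$), as is the numerator. The fraction clearly falls in the open interval $(0, 1)$.

For defection’s second condition, we determine when

$$\begin{aligned} P_1(p, \text{Stag}_2) + P_2(p, \text{Stag}_2) &> P_1(p, \text{Hare}_2) + P_2(p, \text{Hare}_2) \\ 2pR + (1-p)(T+S) &> p(S+T) + (1-p)2P \\ p &> \frac{1}{2}\frac{P-S}{(R-T)+(P-S)} + \frac{1}{2}\frac{P-T}{(R-T)+(P-S)}. \end{aligned}$$

Combining the two conditions, we have

$$1 > \frac{P-S}{(R-T)+(P-S)} > p > \frac{1}{2}\frac{P-S}{(R-T)+(P-S)} + \frac{1}{2}\frac{P-T}{(R-T)+(P-S)}.$$

Since $P \leq T$, this holds for some nonempty subinterval of $[0, 1)$. QED.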
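
As a sanity check on the algebra, the sketch below compares the closed-form bounds on $p$ with a direct evaluation of the two defection conditions. The function names and the payoffs (R=4, T=3, P=2, S=0) are assumptions made for illustration; exact rationals are used so that the boundary cases are not blurred by floating-point error.

```python
from fractions import Fraction

def stag_hunt_bounds(T, R, P, S):
    """Return (lower, upper): player 2 defects with Hare exactly when lower < p < upper."""
    if not (R > T >= P > S):
        raise ValueError("payoffs do not satisfy the Stag Hunt inequality R > T >= P > S")
    denom = Fraction((R - T) + (P - S))  # positive in a Stag Hunt
    upper = Fraction(P - S) / denom                            # personal gain: p < upper
    lower = (Fraction(P - S) + Fraction(P - T)) / (2 * denom)  # social loss: p > lower
    return lower, upper

def hare_is_defection(p, T, R, P, S):
    """Check both conditions directly for player 2 switching from Stag to Hare."""
    stag_own = p * R + (1 - p) * S
    hare_own = p * T + (1 - p) * P
    stag_total = 2 * p * R + (1 - p) * (T + S)
    hare_total = p * (T + S) + (1 - p) * 2 * P
    return hare_own > stag_own and hare_total < stag_total

# Hypothetical payoffs satisfying R > T >= P > S.
lower, upper = stag_hunt_bounds(T=3, R=4, P=2, S=0)
print(lower, upper)  # prints 1/6 2/3
```

For these payoffs, player 2 defects by hunting hare whenever player 1 hunts stag with probability strictly between 1/6 and 2/3.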

A $2 \times 2$ symmetric game is Chicken when $T > R \geq S > P$. In (b), defection only occurs when $\mathbb{P}(\text{Turn}_1)$ is sufficiently high: when player 1 is very likely to turn, player 2 is willing to trade a bit of total payoff for personal payoff.

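Theorem 7. In $2 \times 2$ symmetric games, if the Chicken inequality is satisfied, defection can exist against equal weightings.

Proof. Assume that the Chicken inequality is satisfied. This proof proceeds similarly to that of Theorem 6. Let $p$ be the probability that player 1’s strategy places on $\text{Turn}_1$.

For defection’s first condition, we determine when $P_2(p, \text{Turn}_2) < P_2(p, \text{Ahead}_2)$:

$$\begin{aligned} pR + (1-p)S &< pT + (1-p)P \\ p &> \frac{P-S}{(R-T)+(P-S)}. \end{aligned}$$

The inequality flips in the first equation because of the division by $(R-T)+(P-S)$, which is negative ($T > R$ and $S > P$). $S > P$, so $p > 0$; this reflects the fact that $(\text{Ahead}_1, \text{Turn}_2)$ is a Nash equilibrium, against which defection is impossible (proposition 3).

For defection’s second condition, we determine when

$$\begin{aligned} P_1(p, \text{Turn}_2) + P_2(p, \text{Turn}_2) &> P_1(p, \text{Ahead}_2) + P_2(p, \text{Ahead}_2) \\ 2pR + (1-p)(T+S) &> p(S+T) + (1-p)2P \\ p &< \frac{1}{2}\frac{P-S}{(R-T)+(P-S)} + \frac{1}{2}\frac{P-T}{(R-T)+(P-S)}. \end{aligned}$$

The inequality again flips because $(R-T)+(P-S)$ is negative. When $R \leq \frac{1}{2}(T+S)$, we have $p < 1$, in which case defection does not exist against a pure strategy profile.

Combining the two conditions, we have

$$\frac{P-S}{(R-T)+(P-S)} < p < \frac{1}{2}\frac{P-S}{(R-T)+(P-S)} + \frac{1}{2}\frac{P-T}{(R-T)+(P-S)}, \qquad p \leq 1.$$

Because $T > S$, the lower bound is strictly below the upper bound, and because $T > R$, the lower bound is less than 1:

$$\frac{P-S}{(R-T)+(P-S)} < \frac{1}{2}\frac{P-S}{(R-T)+(P-S)} + \frac{1}{2}\frac{P-T}{(R-T)+(P-S)}, \qquad \frac{P-S}{(R-T)+(P-S)} < 1.$$

So some $p \in (0, 1]$ satisfies both conditions. QED.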
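
The same kind of numeric check works for Chicken. This is a sketch under assumed payoffs, with hypothetical function names; it shows one setting where the upper bound lies below 1 (so defection needs a mixed profile) and one where it exceeds 1 (so defection also exists against the pure profile where both players turn).

```python
from fractions import Fraction

def chicken_bounds(T, R, P, S):
    """Return (lower, upper): player 2 defects by going ahead when lower < p < upper."""
    if not (T > R >= S > P):
        raise ValueError("payoffs do not satisfy the Chicken inequality T > R >= S > P")
    denom = Fraction((R - T) + (P - S))  # negative in Chicken, so inequalities flip
    lower = Fraction(P - S) / denom                            # personal gain: p > lower
    upper = (Fraction(P - S) + Fraction(P - T)) / (2 * denom)  # social loss: p < upper
    return lower, upper

def ahead_is_defection(p, T, R, P, S):
    """Check both conditions directly for player 2 switching from Turn to Ahead."""
    turn_own = p * R + (1 - p) * S
    ahead_own = p * T + (1 - p) * P
    turn_total = 2 * p * R + (1 - p) * (T + S)
    ahead_total = p * (T + S) + (1 - p) * 2 * P
    return ahead_own > turn_own and ahead_total < turn_total

# Hypothetical payoffs with R <= (T + S) / 2: defection only against mixed profiles.
mixed_only = chicken_bounds(T=4, R=2, P=0, S=1)
# Hypothetical payoffs with R > (T + S) / 2: the upper bound exceeds 1, so p = 1 also works.
includes_pure = chicken_bounds(T=4, R=3, P=0, S=1)
print(mixed_only, includes_pure)
```

Since $p$ is a probability, any upper bound above 1 simply means that every $p$ in (lower, 1] admits a defection.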



This bit of ba­sic the­ory will hope­fully al­low for things like prin­ci­pled clas­sifi­ca­tion of poli­cies: “has an agent learned a ‘non-co­op­er­a­tive’ policy in a multi-agent set­ting?”. For ex­am­ple, the em­piri­cal game-the­o­retic analy­ses (EGTA) of Leibo et al.’s Multi-agent Re­in­force­ment Learn­ing in Se­quen­tial So­cial Dilem­mas say that ap­ple-har­vest­ing agents are defect­ing when they zap each other with beams. In­stead of us­ing a qual­i­ta­tive met­ric, you could choose a de­sired non-zap­ping strat­egy pro­file, and then use EGTA to clas­sify for­mal defec­tions from that. This ap­proach would still have a free pa­ram­e­ter, but it seems bet­ter.

I had vague pre-the­o­retic in­tu­itions about ‘defec­tion’, and now I feel more ca­pa­ble of rea­son­ing about what is and isn’t a defec­tion. In par­tic­u­lar, I’d been con­fused by the differ­ence be­tween power-seek­ing and defec­tion, and now I’m not.