COEDT Equilibria in Games

From Jes­sica’s ear­lier post about con­di­tional or­a­cles, the ques­tion was asked: what is the equil­ibrium con­cept for games with more than one COEDT agent?

This post will (par­tially) an­swer that ques­tion, and provide a link to a tool to vi­su­al­ize 2-player 2-move COEDT equil­ibria sets.

To be­gin with, we will re­view how con­di­tional or­a­cles work. A con­di­tional or­a­cle is a map­ping from (where is some set of finitely many prob­a­bil­is­tic or­a­cle ma­chines). A con­di­tional or­a­cle dis­tri­bu­tion is a prob­a­bil­ity dis­tri­bu­tion over con­di­tional or­a­cles. A con­di­tional or­a­cle dis­tri­bu­tion weakly leads to , if

From the last post, all re­flec­tive con­di­tional or­a­cle dis­tri­bu­tions weakly lead to them­selves.

Two things must be noted. First, the space that the equil­ibrium set lies in is ac­tu­ally cor­re­lated strat­egy space (a prob­a­bil­ity dis­tri­bu­tion over all joint out­comes, which in a 2-player 2-move game, is a tetra­he­dron). This is be­cause the same or­a­cle is se­lected from the or­a­cle dis­tri­bu­tion for all the al­gorithms to use, which opens the door to cor­re­lated out­comes, in­stead of both play­ers in­de­pen­dently draw­ing an or­a­cle from the or­a­cle dis­tri­bu­tion. Se­cond, the full con­cept of an equil­ibrium pro­duced by a re­flec­tive COD (in the sense of a nec­es­sary and suffi­cient con­di­tion) is difficult to char­ac­ter­ize, be­cause it in­volves a bunch of fiddly de­tails of the util­ities of var­i­ous prob­a­bil­ity-zero ac­tions and whether it is pos­si­ble to limit to them in the ap­pro­pri­ate way. There­fore, the fol­low­ing defi­ni­tion doesn’t char­ac­ter­ize all COEDT equil­ibria, just the fully-mixed ones that lie in the in­te­rior of the space in­stead of on a bound­ary (be­cause that cor­re­sponds to hav­ing no out­comes oc­cur with prob­a­bil­ity zero).

Let be the ‘th player, be the util­ity func­tion of the ‘th player, be the strat­egy set of the ‘th player, be the set of out­comes, be the move the ’th player plays as dic­tated by some , and be a prob­a­bil­ity dis­tri­bu­tion over .

Defi­ni­tion 1: A fully-mixed con­di­tional equil­ibria is a prob­a­bil­ity dis­tri­bu­tion with nonzero prob­a­bil­ity over all , s.t.

In short, all play­ers are in­differ­ent be­tween all of the moves that they could play.

The­o­rem 1: All fully mixed con­di­tional equil­ibria in a game have a re­flec­tive con­di­tional or­a­cle dis­tri­bu­tion which re­sults in the equil­ibrium be­ing played.

The proof will be deferred to the end, along with The­o­rem 2:

The­o­rem 2: All con­di­tional or­a­cle dis­tri­bu­tions which re­sult in nonzero prob­a­bil­ity for all out­comes, when used, which aren’t fully-mixed con­di­tional equil­ibria, aren’t re­flec­tive.

Due to these the­o­rems, in the 2-player 2-move case, all con­di­tional equil­ibria ex­cept for some points on the bound­ary may be found by plot­ting an “in­differ­ence sur­face” (where a player is in­differ­ent be­tween both moves) for both play­ers in the unit tetra­he­dron, and the in­ter­sec­tion of the in­differ­ence sur­faces in the in­te­rior of the tetra­he­dron is the set of fully-mixed con­di­tional equil­ibria.

Adam Scherlis has kindly coded a tool to do this, us­ing the Wolfram CDF player. The .cdf file is here.

Now for pic­tures.

This is Pri­soner’s Dilemma. Both D,D and C,C are pos­si­ble equil­ibria.

This is Chicken. There is a line of pos­si­ble equil­ibria where both play­ers get 4 util­ity in ex­pec­ta­tion. The two swerve/​straight out­comes are also pos­si­ble equil­ibria, be­cause there are points ar­bi­trar­ily close to them where the play­ers pick swerve and straight, so there is a se­quence of or­a­cles which limit to swerve/​straight, so the dis­tri­bu­tion which cor­re­sponds to swerve/​straight lies in . This is an in­stance of dis­con­nected points on the bound­ary which are pro­duced by re­flec­tive con­di­tional or­a­cle dis­tri­bu­tions, which aren’t fully-mixed, so they aren’t an in­stance of the equil­ibrium con­cept that we defined. Note that the in­tu­itively de­sir­able out­come where both play­ers go straight with 10% prob­a­bil­ity to get an ex­pected util­ity of 4.05 is not an equil­ibrium, de­spite be­ing the op­ti­mal strat­egy if you know your op­po­nent will se­lect the same prob­a­bil­ity dis­tri­bu­tion over moves as you, be­cause it in­cen­tivizes both play­ers to go straight if they know that the other player plays that strat­egy, and so isn’t sta­ble.

This is Stag Hunt. Again, there are many equil­ibria where both play­ers earn 1 util­ity in ex­pec­ta­tion, and a dis­con­nected equil­ibrium point where both play­ers co­op­er­ate on hunt­ing the stag.

The­o­rem 1 Proof:

For all out­comes, there is an or­a­cle that pro­duces it when used, by map­ping the ap­pro­pri­ate queries made by all the play­ers to or , re­spec­tively. There­fore, all fully-mixed dis­tri­bu­tions over out­comes have fully-mixed or­a­cle dis­tri­bu­tions which pro­duce that dis­tri­bu­tion when used. Given a fully-mixed dis­tri­bu­tion , let be an ar­bi­trary fully-mixed or­a­cle dis­tri­bu­tion which pro­duces it. As­sume the im­ple­men­ta­tion of COEDT where 1 means to take an ac­tion and 0 means to defer to a COEDT al­gorithm that chooses among the ac­tion set with­out ac­tion . is the ’th in­stance in this chain, for player .(to im­ple­ment choice among more than two ac­tions)

Be­cause all or­a­cle queries are of the form , by start­ing with the fi­nal COEDT al­gorithm in the chain, be­cause ex­pected util­ity is the same for both ac­tions (by be­ing a fully-mixed con­di­tional equil­ibrium), there is no con­straint on the or­a­cle dis­tri­bu­tion. Work­ing back­ward from the max­i­mum to , for all the COEDT al­gorithms, be­cause ex­pected util­ity is the same for tak­ing an ac­tion and defer­ring, there is no con­straint on the or­a­cle dis­tri­bu­tion pro­duced.

Be­cause pro­duces no con­straints on what must be, weakly leads to all , and in par­tic­u­lar, weakly leads to . By the defi­ni­tion of from the pre­vi­ous post, and be­ing fully-mixed, , so is a re­flec­tive con­di­tional or­a­cle dis­tri­bu­tion.

The­o­rem 2 Proof:

As be­fore, given a fully-mixed dis­tri­bu­tion , se­lect an ar­bi­trary which pro­duces it. By the same rea­son­ing as be­fore (back­wards in­duc­tion), when we get to the first ac­tion with higher or lower ex­pected util­ity than the other ac­tions that have been seen so far, due to the differ­ence in ex­pected util­ity, we get a con­straint on , namely that , or . Be­cause has some of its prob­a­bil­ity mass com­posed of or­a­cles which per­mit later ac­tions to be taken (be­cause is fully-mixed) , it vi­o­lates this con­di­tion, so does not weakly lead to it­self. All re­flec­tive weakly lead to them­selves, from the pre­vi­ous post, so can­not be re­flec­tive.