# Cooperative Oracles

The long-awaited continuation of this sequence.

Credit to Sam, Tsvi, Nisan, Scott, me, and Giles (who helped make the figures)

It may or may not become a paper, but if it doesn't, it's best for it to not be locked in a folder of drafts. This will also recap material from previous posts and papers.

## Refresher on Reflective Oracles

Let T denote an arbitrary probabilistic Turing machine with oracle access, and let T^O refer to the same machine equipped with the oracle O. It is important to remember that T refers to a randomized algorithm, while T^O refers to the algorithm equipped with a particular oracle, so it doesn't make sense to ask what the output of T is, only what the output of T^O is.

Reflective oracles, as originally defined here, are oracles which accurately answer queries about whether a probabilistic Turing machine with access to the same oracle, T^O, outputs a 1 with probability greater than or less than some rational p, by outputting 1 for "yes", 0 for "no", and randomizing between the two if the probability of outputting 1 is exactly p. This is more expressive than it may seem at first, because it is possible to query the oracle about a probabilistic oracle machine that runs T^O and checks whether the output has some arbitrary computable property.

It is also possible to use reflective oracles to make the probability distribution over the output of one algorithm vary continuously with respect to the probability distribution over the output of another algorithm, via repeated oracle calls and randomization, as demonstrated here.

Reflective oracles can be thought of as akin to halting oracles, except that they accurately answer questions about the class of probabilistic Turing machines with access to the same oracle, which halting oracles cannot do. If the diagonalization argument against the existence of a halting oracle is applied to a reflective oracle, where an algorithm invokes the oracle on itself and outputs the bit which it is the least likely to output, a reflective oracle will randomize between outputting 0 and 1, but for any ε > 0, if the oracle is asked whether the algorithm outputs 1 with probability greater than 1/2 − ε, it will respond "yes".
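As a toy illustration of the diagonalization fixed point (my own sketch, not from the original papers): summarize the oracle's answer to the self-referential query as a single probability a, and scan for the values of a that are reflectively consistent with the behavior they induce.

```python
def p_output_one(a):
    """P(the diagonal machine outputs 1), if the oracle answers "yes" to
    "do I output 1 with probability > 1/2?" with probability a.
    The machine outputs whichever bit the oracle says is less likely."""
    return 1.0 - a

def is_reflective(a):
    """Is answer probability a accurate about the behavior it induces?"""
    p = p_output_one(a)
    if p > 0.5:
        return a == 1.0   # must answer "yes" with certainty
    if p < 0.5:
        return a == 0.0   # must answer "no" with certainty
    return True           # p is exactly 1/2: any randomization is allowed

# scan a grid of candidate answer probabilities
consistent = [a / 100 for a in range(101) if is_reflective(a / 100)]
```

Only a = 1/2 survives: the oracle randomizes evenly, and the machine then really does output 1 with probability exactly 1/2, so the randomized answer is accurate.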

This ability to accurately answer questions about themselves makes reflective oracles well-suited for studying the unbounded computing power limit of game theory, and situations where an agent reasons about a surrounding environment which contains itself. In particular, if a set of agents in a game all use a reflective oracle to perfectly predict the behavior of all other agents in the game, and then select the optimal action, a Nash equilibrium will be played in the game. This will be called the shortsighted best response algorithm.

To begin, there is a canonical way to translate any game theory problem into the setting of probabilistic oracle machines. As a concrete example, take a game of Prisoner's Dilemma, with an exchange of source code between agents. Let A and B refer to the source code of the two players in a Prisoner's Dilemma game. Then the game may be thought of as the two probabilistic oracle machines A^O and B^O calling the reflective oracle on each other, and outputting 0 for "defect" and 1 for "cooperate". In general, any game can be expressed by writing each player as a pair of two algorithms, one of which outputs a strategy, and one of which takes no input and computes the utility via oracle calls. By fixing a suitable input for the algorithm which generates the strategy, which represents the information a player has, all strategy algorithms become inputless probabilistic oracle machines which make oracle calls to other players.

More formally, let a player i be a pair (S_i, U_i) of a probabilistic oracle machine S_i, and a probabilistic oracle machine U_i which halts with probability 1 regardless of the choice of oracle, and outputs 0 or 1. S_i is interpreted as the algorithm which produces the strategy of the player, and U_i is interpreted as the utility function. Fixing an oracle O, the probability of U_i^O outputting 1 is the utility of the player i.
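Concretely, a player can be sketched as follows. This is a hypothetical encoding of my own: the oracle is summarized as a map from player names to cooperation probabilities, and payoffs are normalized into [0, 1] so that a utility reads as "the probability that the utility machine outputs 1".

```python
from dataclasses import dataclass
from typing import Callable, Dict

# hypothetical: an "oracle" summarized as each player's P(strategy outputs 1)
Oracle = Dict[str, float]

@dataclass
class Player:
    name: str
    strategy: Callable[[Oracle], float]  # S_i: P(strategy machine outputs 1)
    utility: Callable[[Oracle], float]   # U_i: P(utility machine outputs 1)

# A Prisoner's Dilemma player: payoffs normalized into [0, 1]
# (defect vs. cooperator = 1.0, mutual cooperation = 0.6,
#  mutual defection = 0.2, cooperate vs. defector = 0.0)
def pd_utility(me, opp):
    def u(o: Oracle) -> float:
        p, q = o[me], o[opp]  # cooperation probabilities
        return 1.0 * (1 - p) * q + 0.6 * p * q + 0.2 * (1 - p) * (1 - q)
    return u

# both players mirror each other's cooperation probability
alice = Player("alice", lambda o: o["bob"], pd_utility("alice", "bob"))
bob   = Player("bob",   lambda o: o["alice"], pd_utility("bob", "alice"))

# under the mutual-cooperation oracle, both receive utility 0.6
oracle = {"alice": 1.0, "bob": 1.0}
```

Note that this oracle is a fixed point of both strategies: each player cooperates with exactly the probability the oracle reports for them.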

There are many different variants of a reflective oracle, which are essentially equivalent. There is the standard single-bit reflective oracle, presented here, but multi-bit reflective oracles may also be defined. Here, I use the semimeasure formulation of reflective oracles, which accurately answers any question about prefixes of a Turing machine's output, and assume that the outputs of the Turing machine form a semimeasure over the set of finite bitstrings.

Instead of the queries consisting of a pair, they consist of a triple (i, s, p), asking about the probability of the S_i component of player i halting with a bitstring that has s as a prefix. Let P(s ⊑ S_i^O) be the probability that S_i^O halts with a bitstring that has s as a prefix, and let P(s ⋢ S_i^O) be the probability that S_i^O halts with a bitstring that doesn't have s as a prefix. Let Q(i, s, p) be the probability of the oracle returning a 1 in response to the query (i, s, p). Then one of the defining properties of a reflective oracle is that, for all bitstrings s, rational probabilities p, and players i:

P(s ⊑ S_i^O) > p ⟹ Q(i, s, p) = 1, and P(s ⋢ S_i^O) > 1 − p ⟹ Q(i, s, p) = 0

If a probabilistic oracle machine halts with a certain probability distribution, it will be accurately captured by this variant of reflective oracle, but nonhalting probability mass may be assigned to an arbitrary distribution over bitstrings, as long as the result is a semimeasure.

## Cooperative Oracles

In the particular case where the two agents in a Prisoner's Dilemma both use shortsighted best response, the set of fixed points is the set of Nash equilibria, so they will mutually defect.
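A grid search makes this concrete. In this sketch of my own (toy payoffs normalized into [0, 1], the oracle summarized as each player's cooperation probability), the fixed points of paired shortsighted best responses are exactly the Nash equilibria, and the only one is mutual defection.

```python
# Prisoner's Dilemma payoffs for (my move, their move); 1 = cooperate
PAYOFF = {(1, 1): 0.6, (1, 0): 0.0, (0, 1): 1.0, (0, 0): 0.2}

def expected_utility(my_coop, their_coop):
    """Expected payoff when each side cooperates with the given probability."""
    return sum(PAYOFF[(m, t)]
               * (my_coop if m else 1 - my_coop)
               * (their_coop if t else 1 - their_coop)
               for m in (0, 1) for t in (0, 1))

def best_response(their_coop):
    """Shortsighted best response: perfectly predict the opponent via the
    oracle, then pick the cooperation probability with the best payoff."""
    return max((0.0, 1.0), key=lambda c: expected_utility(c, their_coop))

# fixed points: pairs (p, q) where each strategy is what the other predicted
grid = [i / 20 for i in range(21)]
fixed = [(p, q) for p in grid for q in grid
         if best_response(q) == p and best_response(p) == q]
```

Since defection strictly dominates, `best_response` ignores its argument and always defects, so the only fixed point is (0.0, 0.0): mutual defection.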

However, different algorithms may be used by the players. As long as each player's set of responses to everyone else's strategies is nonempty, convex, and has closed graph, Kakutani's fixed point theorem guarantees the existence of an "equilibrium set" where all players respond to knowledge of all other players' strategies with the exact strategy that all other players predicted. This is the same as the fair set from here.

If we consider two agents in the Prisoner's Dilemma implementing the algorithm "cooperate with the same probability as my opponent cooperates", there is a continuum of reflective oracles, one for each probability of cooperation in [0, 1]. Intuitively, mutual cooperation should be selected. This splits the problem of formalizing Functional Decision Theory in the unbounded case into the problem of finding an algorithm for the agent to use, and the problem of finding an oracle which causes a Pareto optimal element of the equilibrium set to be played.

A naive implementation, where we select a reflective oracle that is Pareto optimal for all players, fails. This is because every utility function has an opposing one, which makes all outcomes Pareto optimal: there is no outcome that makes one player better off without making another player worse off. In the Prisoner's Dilemma, if there is a third player, a jailer who wants to maximize jail time but cannot otherwise influence the outcome of the game, all outcomes are Pareto optimal, because there is no choice which makes a prisoner better off that doesn't also leave the jailer worse off, and vice-versa. Therefore, the optimality notion must somehow restrict to the set of players who are actually participating in a game, instead of the set of all players.

To clarify what it means to talk about the set of players who participate in a game, we can look at the structure of which players make oracle calls to other players. If there is a game with finitely many players, the utility functions of all players should make oracle calls about the output of all players, to compute a utility.

Let the dependency set of player i be the set of all players that either the strategy algorithm or the utility function of i may query the oracle about, as well as i itself. Let the relation ≥ be the minimal relation such that if j is in i's dependency set, i ≥ j. The transitive closure of this relation, ⪰, induces a poset over equivalence classes of players.

Minimal elements in this poset correspond to sets of players whose strategies and utility functions only depend on each other. Players in a self-contained game have this property. A non-minimal equivalence class in this poset corresponds to a game where the utility functions of some participants are affected by what happens in a different game they cannot influence. Due to these properties, ⪰ is a natural choice for defining the boundaries of a game, and for formalizing the desideratum that our notion of optimality should not demand Pareto optimality for spectators.
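The dependency structure can be sketched as a reachability computation (hypothetical player names, using the Prisoner's Dilemma with a jailer from above):

```python
# dependency sets: which players each player's strategy or utility machine
# may query the oracle about (including the player itself)
deps = {
    "prisoner_A": {"prisoner_A", "prisoner_B"},
    "prisoner_B": {"prisoner_A", "prisoner_B"},
    "jailer":     {"jailer", "prisoner_A", "prisoner_B"},
}

def depends_on(i, j):
    """Transitive closure: does i depend on j, possibly indirectly?"""
    seen, frontier = set(), {i}
    while frontier:
        k = frontier.pop()
        if k == j:
            return True
        if k not in seen:
            seen.add(k)
            frontier |= deps[k]
    return False

# equivalence classes: players that mutually depend on each other
classes = {frozenset(j for j in deps
                     if depends_on(i, j) and depends_on(j, i))
           for i in deps}

# a class is minimal when its members depend only on each other
minimal = [set(c) for c in classes
           if all(not depends_on(i, j) or j in c for i in c for j in deps)]
```

The prisoners form a minimal class (a self-contained game), while the jailer's class is non-minimal: his utility depends on a game he cannot influence.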

Now, the notion of stratified Pareto optimality may be defined. Let U_O(i) be the utility of the player i, upon being equipped with the oracle O. A reflective oracle O′ is a stratified Pareto improvement on another reflective oracle O, for a player i, iff:

(1): U_{O′}(i) > U_O(i)

(2): For all players j with i ⪰ j: U_{O′}(j) ≥ U_O(j)

(3): For all players j with j ⋡ i: U_{O′}(j) = U_O(j)

The first condition is that a stratified Pareto improvement for i must make i better off.

The second condition is that all players that i depends on, even indirectly, must be unaffected or better off. In the case of a finite and minimal equivalence class of players under ⪰, this says that all other players in the game must be unaffected or better off, so all stratified Pareto improvements are Pareto improvements for this game. In the case of the jailer in the Prisoner's Dilemma, it says that both prisoners must receive the same or greater utility, because the jailer's utility function makes an oracle query about their output.

The third condition is that a change in oracles should only affect players whose strategy algorithm or utility function depends on the outcome of a game, and leave all other players unaffected. This condition is used to forbid cases where selecting a better oracle in a game of Stag Hunt makes the players in a totally unrelated game of Chicken worse off, because the same oracle is used by all players.

A point is stratified Pareto optimal (SPO) for a set of players S if there are no stratified Pareto improvements for any player in S. In the particular case of a Prisoner's Dilemma with a jailer, a stratified Pareto optimum will involve both prisoners selecting a Pareto optimal outcome in their equilibrium set, and ignoring the jailer.
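Stated as a predicate, the three conditions look like this. This is my own sketch with toy utilities; geq(a, b) stands for the transitive relation "a depends on b, or a equals b".

```python
def is_spi(u_old, u_new, geq, i):
    """Is u_new a stratified Pareto improvement over u_old for player i?"""
    if not u_new[i] > u_old[i]:                        # (1) i is better off
        return False
    if any(u_new[j] < u_old[j]                         # (2) players i depends
           for j in u_old if geq(i, j)):               #     on are not hurt
        return False
    return all(u_new[j] == u_old[j]                    # (3) players that don't
               for j in u_old if not geq(j, i))        #     depend on i are
                                                       #     untouched

# Prisoner's Dilemma with a jailer; DEPS is already transitively closed
DEPS = {"A": {"A", "B"}, "B": {"A", "B"}, "jailer": {"A", "B", "jailer"}}
geq = lambda a, b: b in DEPS[a]

u_defect = {"A": 0.2, "B": 0.2, "jailer": 0.8}
u_coop   = {"A": 0.6, "B": 0.6, "jailer": 0.4}

# moving to mutual cooperation is an SPI for prisoner A: the jailer loses
# utility, but the jailer depends on A, so condition 3 exempts him
improvement_for_A = is_spi(u_defect, u_coop, geq, "A")
# reverting is no SPI for the jailer: condition 2 protects both prisoners
improvement_for_jailer = is_spi(u_coop, u_defect, geq, "jailer")
```

So the jailer cannot veto the prisoners' improvement, and cannot claim an "improvement" of his own that hurts them.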

However, even stratified Pareto optima aren't enough in all cases, as shown by the chain of infinitely many players detailed here.

Let an almost stratified Pareto optimum (ASPO) for a set of players S be a reflective oracle that, for each player in S, lies in the closure of the set of oracles that are stratified Pareto optimal for that player. In the case of the infinitely descending chain of players, an almost stratified Pareto optimal reflective oracle would report that all players had probability 0 of outputting 1, because that is the limit of a sequence of SPO oracles for a player, which move the "first player which outputs 1" arbitrarily far back in the sequence.

Theorem 1 (weak form): There exists a reflective oracle that is almost stratified Pareto optimal for all players.

A proof sketch for Theorem 1 is as follows:

Given an arbitrary finite set of players, F, there is a finite poset of equivalence classes of players in F under ⪰. Given some initial set of reflective oracles, it is possible to construct a subset that is Pareto optimal for all players in a minimal equivalence class C, then a Pareto optimal subset of that for a minimal equivalence class in the poset with C removed, and so on. This process repeats until there is a set of SPO reflective oracles for all players in F. Now that existence has been shown, take the closure of the set of SPO reflective oracles for F. The resulting set is a compact set of ASPO reflective oracles for F. This argument works for all finite F. Because there is a compact and nonempty set of ASPO reflective oracles for every finite set of players, by compactness (the finite intersection property), there must be a compact and nonempty set of ASPO reflective oracles for all players.

This is pretty much the old proof of "there's an ASPO point for all players"; the only difference is that this proof takes place in reflective-oracle space, instead of the space of the actions of all players. Let a weak cooperative oracle be a reflective oracle with this property.

Let a well-founded equivalence class be an equivalence class of players under ⪰ composed of finitely many players, such that players within it only make oracle calls to players in well-founded equivalence classes. Minimal equivalence classes under ⪰, of finitely many players, are well-founded. It is possible to strengthen Theorem 1 to:

Theorem 1: There exists a reflective oracle that is ASPO for all players, and SPO for all players in a well-founded equivalence class.

This is proved by constructing a suitable set of reflective oracles that are SPO for all players in a well-founded equivalence class, and then using this as the initial set in the previous proof sketch. The resulting set of ASPO reflective oracles is a subset, and inherits this property.

An oracle of this type is called a strong cooperative oracle. Almost all games of interest in game theory involve finitely many players playing against each other. This is the condition for a minimal finite equivalence class under ⪰, all of which are well-founded. Therefore, a strong cooperative oracle is an SPO oracle for any reasonable game that doesn't have non-well-founded chains of infinitely many players.

However, the proof of the existence of a cooperative oracle involves the infinite-dimensional space of reflective oracles, so it is not immediately linked to practical game theory. By imposing several natural conditions, it is possible to link the two settings. (Also, it is unclear whether the final condition of strategy continuity is necessary.)

Theorem 2: In a game of finitely many players, which all have finitely many possible outputs, have oracle access to each other and no other players, halt with probability 1 given oracle access to the other players, and have continuous strategies with respect to everyone's probability distribution over outputs, all points in the equilibrium set have reflective oracles which result in that point being played in the game.

Corollary 2: If the conditions of Theorem 2 hold, then for any point on the Pareto frontier of the equilibrium set, there is a strong cooperative oracle which results in that point being played in the game.

Therefore, the equilibrium selection problem is solved by the use of a special type of reflective oracle which results in Pareto optimal outcomes within the equilibrium set whenever it is used, aka a strong cooperative oracle.

However, there is still an open question about which algorithms result in something "FDT-like" when used with a cooperative oracle. To make partial progress towards a result, we will now look at the Ultimatum Game. In the Ultimatum Game, there is a pool of money, such as 100 dollars, and player A specifies a split of money between player A and player B, and then player B can either choose to accept the offer, or reject it, in which case both players get 0 dollars.

Imagine player B is implementing the strategy "I will reject the money unless I get 90 dollars or more". If player A would offer a 10/90 split in response to this strategy, and this is known to player B, this provides an incentive for player B to implement this extortionate strategy. Similarly, viewing the game from the perspective of player B, accepting a split of money where 90 dollars goes to player A incentivises player A to offer a 90/10 split.

Additionally, if the strategies of player A and player B both act to acquire 90 dollars, by proposing a 90/10 split, and rejecting all offers short of a 10/90 split, the game outcome will be A offering a 90/10 split, and B rejecting it, so both players walk away with no money. Extortionate strategies fail when they go against each other.

Is there a strategy that does not incentivize extortion? There is, in a certain sense, as long as there is some notion of a "fair outcome", such as the Nash bargaining solution. As an example, in the Ultimatum Game, if a 50/50 split is considered to be a fair outcome by player B, they could implement the strategy "if the money I am offered, x, is less than 50 dollars, reject the offer with probability p, where p = 1 − 50/(100 − x)". Against this strategy, the optimal offer is a 50/50 split of money, because, in expectation, money is lost by offering any split that is more self-favoring than 50/50. Each marginal dollar that goes towards player A in the initial split is compensated by an increase in the probability of player B rejecting the offer, making the expected value of taking one more dollar from the split zero or negative. Therefore, extortion beyond the "fair outcome" is disincentivized.
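With a concrete rejection curve, this can be checked directly. The formula below is my reconstruction from the marginal-dollar argument: player B accepts an offer of x < 50 with probability 50/(100 − x), which pins player A's expected take at exactly the fair amount.

```python
def accept_prob(x, fair=50, pot=100):
    """P(player B accepts an offer of x dollars to B), assuming B caps
    A's expected take at the fair amount."""
    return 1.0 if x >= fair else fair / (pot - x)

def expected_take_A(x, pot=100):
    """Player A's expected dollars when offering x of the pot to B."""
    return (pot - x) * accept_prob(x)

def expected_take_B(x):
    return x * accept_prob(x)

# shading the offer below 50/50 gains A nothing: the expected take is a
# flat 50 dollars for every x <= 50, so extortion is exactly disincentivized
takes = [expected_take_A(x) for x in range(0, 51)]
```

Note that B's own expected take, x · 50/(100 − x), still rises with x, so a mismatch in "fair points" degrades gracefully rather than guaranteeing rejection.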

Another advantage of this strategy is that it is robust against different agents having different notions of a "fair outcome". If player A thinks that a 60/40 split is a fair outcome, and player B thinks that a 50/50 split is a fair outcome, and player A offers the 60/40 split, then player B will reject the offer with probability 1/6, leading to an expected gain of 50 dollars for player A and about 33 dollars for player B, instead of a guaranteed rejection of the offer.

Red is the strategy of player A, blue is the strategy of player B, the display is over utility-space, and player B is implementing a coercion-proof strategy.

However, this class of strategies does not have an elegant way of a-priori selecting the "fair point" that it will attempt to enforce. Intuitively, in the Ultimatum Game, there's a tradeoff where enforcing a strict demand, like a 30/70 split, means that there are more possible opponents which will reject the offer, while enforcing a weak demand, like a 70/30 split, means that there are more possible opponents which will exploit this. The Nash bargaining solution is a candidate "fair point", but it is unknown whether there is an algorithm that would naturally attain it under appropriate circumstances.

The insight of "be increasingly vengeful when the opposing players cause you to lose utility" will likely be a component of the true solution, and generalizes to other games such as the Prisoner's Dilemma and Chicken, but it is not a complete account, as will be shown in the next section. Humans also seem to act according to this strategy to some degree, instead of following the Homo Economicus strategy of accepting any offer greater than 0 dollars.

In the Prisoner's Dilemma, a natural choice for an extortion-proof strategy is "cooperate with the same probability that the opponent cooperates". This leads to mutual cooperation against a copy of itself, when equipped with a cooperative oracle, as seen in Figure 2.

Prisoner's Dilemma: mutual cooperation is the Pareto-optimal point in the equilibrium set where the strategies intersect.

If the probability of player A cooperating is p, then if player B implements the strategy "cooperate with probability p − ε, for some ε > 0", mutual defection will occur, as that is the only fixed point. This appears suboptimal, but permitting exploitation, even in a mild form, incentivizes opponents to modify their strategy to an exploiting strategy. If player B uses shortsighted best response, then mutual defection will occur. This is because player B will always play defect, because it is a dominant strategy, and player A will play defect as retaliation. If player B always cooperates, then player A will cooperate as well. This shows that implementing an extortion-proof strategy is not enough. A satisfactory strategy should lead to defection against a player which cooperates regardless of opponent.
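Reading the fixed points off a grid (my own sketch) shows both claims at once: the mirror strategy supports a continuum of consistent oracles, while shading the mirror by ε collapses everything to mutual defection.

```python
def fixed_points(f, g, n=100, tol=1e-9):
    """Pairs (p, q) on an (n+1)-point grid with p = f(q) and q = g(p):
    oracles where each player behaves exactly as the other predicted."""
    pts = [i / n for i in range(n + 1)]
    return [(p, q) for p in pts for q in pts
            if abs(p - f(q)) < tol and abs(q - g(p)) < tol]

mirror = lambda q: q                    # cooperate as much as my opponent
shaded = lambda q: max(0.0, q - 0.1)    # cooperate slightly less (eps = 0.1)

both_mirror = fixed_points(mirror, mirror)       # a whole continuum
mirror_vs_shaded = fixed_points(mirror, shaded)  # only mutual defection
```

Against the shaded strategy, any positive cooperation probability is inconsistent (it would have to equal itself minus ε), so the probability ratchets down to zero.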

A similar strategy applies to the game of Chicken. The strategy is "play Straight with the same probability that the opponent plays Straight, unless the probability of them playing Swerve is greater than a set threshold, in which case, play Straight." This leads to crashing into an opponent which always goes straight, which incentivizes the opponent to avoid strategies such as tossing their steering wheel out the window. If two agents of this type play against each other, then, as shown in Figure 3, both get an expected utility that is a Pareto improvement on the mixed-strategy Nash equilibrium, where both players get 2 expected utility. If this player plays against a player B using shortsighted best response, a cooperative oracle will result in the outcome where player A goes straight and player B swerves, for an expected utility of 3 and 2, respectively.

The game of Chicken.

Consider the cooperation game where player A has a preference for going to a movie, and player B has a preference for going home to play a board game. They also have a preference for going to the same event, and going to a movie leads to nonzero utility even if the moviegoer is alone.

If both players implement the strategy "consult the oracle, and go to whichever event the other player has a higher probability of going to, with a rapid change of strategy around 50 percent", then the Pareto frontier of the equilibrium set consists of the two points where both players coordinate on an action. The choice of cooperative oracle determines which action is selected, and coordination occurs because both players are using the same oracle. This is similar to the choice of reflective oracle determining which Nash equilibrium is played in a game.
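One hypothetical smoothing of "rapid change of strategy around 50 percent" is a steep logistic response curve (the steepness value below is my own choice, not from the post). Both coordination points are then attracting fixed points, and the shared oracle picks between them.

```python
from math import exp

def go_to_movie(other_p, steepness=50.0):
    """P(go to the movie), rising sharply once the other player is more
    likely than not to go -- a hypothetical continuous threshold."""
    return 1.0 / (1.0 + exp(-steepness * (other_p - 0.5)))

def iterate(p0, rounds=50):
    """Repeatedly respond to the predicted behavior; both players use the
    same response curve, so a symmetric fixed point suffices."""
    p = p0
    for _ in range(rounds):
        p = go_to_movie(p)
    return p

# starting beliefs on either side of 50 percent converge to the two
# coordination points: both go to the movie, or both stay home
movie_point = iterate(0.6)   # close to 1.0
home_point  = iterate(0.4)   # close to 0.0
```

The 50-percent point is also a fixed point, but an unstable one, which is why the oracle's choice effectively lands on one of the two coordination outcomes.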

A cooperation game. Of the 3 points in the equilibrium set, the choice of cooperative oracle ensures that the outcome is coordination on an activity.

Looking at these examples, there are two further directions for designing a strategy which effectively makes use of a cooperative oracle. The first direction is developing a general criterion which allows a player to exploit some exploitable opponents. In the Prisoner's Dilemma, against an opponent which cooperates with 10 percent greater probability than you cooperate, the desired action is to cooperate with 90 percent probability. The second direction for improvement is finding a single decision rule, like shortsighted best response for Nash equilibria, that produces an exploitation-proof strategy incentivizing opponents to play at a Pareto optimal point, instead of customizing a strategy of that type for each game separately.

(Proofs were omitted because they're quite long. I can post them if there is interest.)

• How did you create your visualizations of game payoffs? I really like the way they show the relevant details of payoffs, equilibrium points, and the strategies of the players.

• Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix, and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.

• The definition of stratified Pareto improvement doesn't seem right to me. You are trying to solve the problem that there are too many Pareto optimal outcomes. So, you need to make the notion of Pareto improvement weaker. That is, you want more changes to count as Pareto improvements, so that fewer outcomes count as Pareto optimal. However, the definition you gave is strictly stronger than the usual definition of Pareto improvement, not strictly weaker (because condition 3 has equality instead of inequality). What it seems like you need is dropping condition 3 entirely.

The definition of almost stratified Pareto optimum also doesn't make sense to me. What problem are you trying to solve? The closure of a set can only be larger than the set. Also, the closure of an empty set is empty. So, on the one hand, any stratified Pareto optimum is in particular an almost stratified Pareto optimum. On the other hand, if there exists an almost stratified Pareto optimum, then there exists a stratified Pareto optimum. So, you neither refine the definition of an optimum nor make existence easier.

• It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto optimality.

The definition of an almost stratified Pareto optimum was adapted from this post, and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for all players. Something isn't stratified Pareto optimal in a vacuum, it's stratified Pareto optimal for a particular player. There's no oracle that's stratified Pareto optimal for all players, but if you take the closure of everyone's SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.

• "the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality."

Why? Condition 3 implies that U_{RO,j} ≤ U_{RO′,j}. So, together with condition 2, we get that U_{RO,j} ≤ U_{RO′,j} for any j. That precisely means that this is a Pareto improvement in the usual sense.

• The game fixes each player's utility function, and then we fix all strategies, yes? It doesn't seem right to bundle one strategy and one utility function, which aren't fixed at the same time, into one "player".

• I'm not sure what you mean by fixing the utility function occurring before fixing the strategy. In the problem setup of a game, you specify a utility function machine and a strategy machine for everyone, and there isn't any sort of time or order on this (there's just a set of pairs of probabilistic oracle machines), and you can freely consider things such as "what happens when we change some player's strategies/utility function machines".

• I expected that we want to ask questions such as which strategy in a game dominates all others, which requires that the utility functions are known but the strategies aren't.

Should the dependency relation care about oracle queries from strategies? After all, a strategy could preserve its behavior while removing a dependency by simulating what it was going to ask the oracle about, deferring all new queries to the oracle again. Or is this ruled out by interesting query simulations sometimes diverging? Worse, you can make an arbitrary utility function part of the Pareto optimization by asking the oracle about a player with that particular utility and a trivial strategy.

• The basic reason for the dependency relation to care about oracle queries from strategies is that, when you have several players all calling the oracle on each other, there's no good way to swap out the oracle calls with the computation. The trick you describe does indeed work, and is a reason to not call any more Turing machines than you need to, but there are several things it doesn't solve. For instance, if you are player 1, and your strategy depends on oracle calls to players 2 and 3, and the same applies to the other two players, you may be able to swap out an oracle call to player 2 with player 2's actual code (which calls players 1 and 3), but you can't unpack any more oracle calls into their respective computations without hitting an infinite regress.