Decision theory and zero-sum game theory, NP and PSPACE

(Cross-posted from my blog)

At a rough level:

  • De­ci­sion the­ory is about mak­ing de­ci­sions to max­i­mize some ob­jec­tive func­tion.

  • Zero-sum game the­ory is about mak­ing de­ci­sions to op­ti­mize some ob­jec­tive func­tion while some­one else is mak­ing de­ci­sions to min­i­mize this ob­jec­tive func­tion.

Th­ese are quite differ­ent.

De­ci­sion the­ory and NP

De­ci­sion the­ory roughly cor­re­sponds to the NP com­plex­ity class. Con­sider the fol­low­ing prob­lem:

Given a set of items, each of which has a in­te­ger-val­ued value and weight, does there ex­ist a sub­set with to­tal weight less than and to­tal value at least ?

(It turns out that find­ing a solu­tion is not much harder than de­ter­min­ing whether there is a solu­tion; if you know how to tell whether there is a solu­tion to ar­bi­trary prob­lems of this form, you can in par­tic­u­lar tell if there is a solu­tion that uses any par­tic­u­lar item.)

This is the knap­sack prob­lem, and it is in NP. Given a can­di­date solu­tion, it is easy to check whether it ac­tu­ally is a solu­tion: you just count the val­ues and the weights. Since this solu­tion would con­sti­tute a proof that the an­swer to the ques­tion is “yes”, and a solu­tion ex­ists when­ever the an­swer is “yes”, this prob­lem is in NP.

The fol­low­ing is a gen­eral form for NP prob­lems:

where is a speci­fi­ca­tion of a cir­cuit (say, made of AND, OR, and NOT gates) that out­puts a sin­gle Boolean value. That is, the prob­lem is to de­cide whether there is some as­sign­ment of val­ues to that out­puts true on. This is a var­i­ant of the Boolean satis­fi­a­bil­ity prob­lem.

In de­ci­sion the­ory (and in NP), all op­ti­miza­tion is in the same di­rec­tion. The only quan­tifier is .

Zero-sum game the­ory and PSPACE

Zero-sum game the­ory roughly cor­re­sponds to the PSPACE com­plex­ity class. Con­sider the fol­low­ing prob­lem:

Given a speci­fi­ca­tion of a Rev­ersi game state (on an ar­bi­trar­ily-large square board), does there ex­ists a policy for the light player that guaran­tees a win?

(It turns out that win­ning the game is not much harder than de­ter­min­ing whether there is a win­ning policy; if you know how to tell whether there is a solu­tion to ar­bi­trary prob­lems of this form, then in par­tic­u­lar you can tell if dark can win given a start­ing move by light.)

This prob­lem is in PSPACE: it can be solved by a Tur­ing ma­chine us­ing a polyno­mial amount of space. This Tur­ing ma­chine works through the min­i­max al­gorithm: it simu­lates all pos­si­ble games in a back­track­ing fash­ion.

The fol­low­ing is a gen­eral form for PSPACE prob­lems:

where is a speci­fi­ca­tion of a cir­cuit (say, made of AND, OR, and NOT gates) that out­puts a sin­gle Boolean value. That is, the prob­lem is to de­ter­mine whether it is pos­si­ble to set the val­ues in­ter­leaved with an op­po­nent set­ting the val­ues such that, no mat­ter how the op­po­nent acts, is true. This is a var­i­ant of the quan­tified Boolean for­mula prob­lem. (In­ter­pret­ing a log­i­cal for­mula con­tain­ing and as a game is stan­dard; see game se­man­tics).

In zero-sum game the­ory, all op­ti­miza­tion is in one of two com­pletely op­po­site di­rec­tions. There is liter­ally no differ­ence be­tween some­thing that is good for one player and some­thing that is bad for the other. The op­pos­ing quan­tifiers and , rep­re­sent­ing de­ci­sions by the two op­po­nents, are in­ter­leaved.

Differ­ent cog­ni­tive modes

The com­par­i­son to com­plex­ity classes sug­gests that there are two differ­ent cog­ni­tive modes for de­ci­sion the­ory and zero-sum game the­ory, as there are two differ­ent types of al­gorithms for NP-like and PSPACE-like prob­lems.

In de­ci­sion the­ory, you plan with no re­gard to any op­po­nents in­terfer­ing with your plans, al­low­ing you to plan on ar­bi­trar­ily long time scales. In zero-sum game the­ory, you plan on the as­sump­tion that your op­po­nent will in­terfere with your plans (your s are in­ter­leaved with your op­po­nent’s s), so you can only plan as far as your op­po­nent lacks the abil­ity to in­terfere with these plans. You must have a short OODA loop, or your op­po­nent’s in­terfer­ence will make your plans use­less.

In de­ci­sion the­ory, you can mostly run on naïve ex­pected util­ity anal­y­sis: just do things that seem like they will work. In zero-sum game the­ory, you must screen your plans for defen­si­bil­ity: they must be re­sis­tant to pos­si­ble at­tacks. Com­pare farm­ing with bor­der defense, me­chan­i­cal en­g­ineer­ing with com­puter se­cu­rity.

High-re­li­a­bil­ity en­g­ineer­ing is an in­ter­me­di­ate case: de­signs must be se­lected to work with high prob­a­bil­ity across a va­ri­ety of con­di­tions, but there is nor­mally no in­tel­li­gent op­ti­miza­tion power work­ing against the de­sign. One could think of na­ture as an “ad­ver­sary” se­lect­ing some con­di­tion to test the de­sign against, and rep­re­sent this se­lec­tion by a uni­ver­sal quan­tifier; how­ever, this is qual­i­ta­tively differ­ent from a true ad­ver­sary, who ap­plies in­ten­tional op­ti­miza­tion to break a de­sign rather than hap­haz­ard se­lec­tion of con­di­tions.

Conclusion

Th­ese two types of prob­lems do not cover all re­al­is­tic situ­a­tions an agent might face. De­ci­sion prob­lems in­volv­ing agents with differ­ent but not com­pletely op­posed ob­jec­tive func­tions are differ­ent, as are zero-sum games with more than two play­ers. But re­al­is­tic situ­a­tions share some prop­er­ties with each of these, and I sus­pect that there might ac­tu­ally be a dis­crete dis­tinc­tion be­tween cog­ni­tive modes for NP-like de­ci­sion the­ory prob­lems and PSPACE-like zero-sum games.

What’s the up­shot? If you want to know what is go­ing on, one of the most im­por­tant ques­tions (per­haps the most im­por­tant ques­tion) is: what kind of game are you play­ing? Is your situ­a­tion more like a de­ci­sion the­ory prob­lem or a zero-sum game? To what ex­tent is op­ti­miza­tion by differ­ent agents go­ing in the same di­rec­tion, op­pos­ing di­rec­tions, or or­thog­o­nal di­rec­tions? What would have to change for the na­ture of the game to change?


Thanks to Michael Vas­sar for draw­ing my at­ten­tion to the dis­tinc­tion be­tween de­ci­sion the­ory and zero-sum game the­ory as a dis­tinc­tion be­tween two cog­ni­tive modes.

Re­lated: The Face of the Ice