Single player extensive-form games as a model of UDT

This post was inspired by Benja’s SUDT post. I’m going to describe another simplified model of UDT which is equivalent to Benja’s proposal, and is based on standard game theory concepts as described in this Wikipedia article.

First let’s define a “single player extensive-form game with chance moves and imperfect information”:

  1. A “single player extensive-form game” is a tree of nodes. Each leaf node carries a utility value. A play of the game starts at the root and ends at some leaf node.

  2. Some non-leaf nodes are “chance nodes”, with probabilities assigned to branches going out of that node. All other non-leaf nodes are “decision nodes”, where the player can choose which branch to take. (Thanks to badger for helping me fix an error in this part!)

  3. “Imperfect information” means the decision nodes are grouped into “information sets”. The player doesn’t know which node they’re currently at, only which information set it belongs to.

  4. “Imperfect recall” is a special case of imperfect information, where knowing the current information set doesn’t even allow the player to figure out which information sets were previously visited, like in the Absent-Minded Driver problem.

  5. We will assume that the player can use “behavioral strategies”, which make an independent random choice at each node, rather than “mixed strategies”, which randomize over the set of pure strategies for the entire game. See Piccione and Rubinstein’s paper for more on this difference. (Thanks to Coscott for pointing out that assumption!)

  6. The behavioral strategy with the highest expected utility will be taken as the solution of the game. (See the code sketch right after this list.)
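
To make the model concrete, here is a minimal Python sketch. All the names in it (Leaf, Chance, Decision, expected_utility, best_behavioral_strategy) are made up for this post, and the brute-force solver assumes exactly two actions per information set, so treat it as an illustration rather than a serious implementation:

```python
from itertools import product

# Hypothetical node types for the model above: a leaf carries a utility,
# a chance node carries (probability, child) pairs, and a decision node
# carries an information-set label plus a dict mapping actions to children.
class Leaf:
    def __init__(self, utility):
        self.utility = utility

class Chance:
    def __init__(self, branches):  # branches: list of (prob, child) pairs
        self.branches = branches

class Decision:
    def __init__(self, info_set, children):  # children: dict action -> child
        self.info_set = info_set
        self.children = children

def expected_utility(node, strategy):
    # strategy maps info_set -> dict of action -> probability. Behavioral
    # strategies randomize independently at each node, so expected utility
    # is a straightforward recursion over the tree.
    if isinstance(node, Leaf):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * expected_utility(child, strategy)
                   for p, child in node.branches)
    action_probs = strategy[node.info_set]
    return sum(action_probs[a] * expected_utility(child, strategy)
               for a, child in node.children.items())

def best_behavioral_strategy(root, info_sets, actions=("a", "b"), steps=101):
    # Brute force: assume two actions per information set and grid-search
    # the probability of the first action in each set.
    grid = [i / (steps - 1) for i in range(steps)]
    best_eu, best_strategy = float("-inf"), None
    for probs in product(grid, repeat=len(info_sets)):
        strategy = {s: {actions[0]: p, actions[1]: 1 - p}
                    for s, p in zip(info_sets, probs)}
        eu = expected_utility(root, strategy)
        if eu > best_eu:
            best_eu, best_strategy = eu, strategy
    return best_eu, best_strategy
```

The games below are small enough that even this kind of brute force finds their solutions instantly.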

Now let’s try using that to solve some UDT problems:

Absent-Minded Driver is the simplest case, since it’s already discussed in the literature as a game of the above form. It’s strange that not everyone agrees that the best strategy is indeed the best, but let’s skip that and move on.
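
For concreteness, here is the computation with the standard payoffs from Piccione and Rubinstein’s paper (exiting at the first intersection pays 0, exiting at the second pays 4, driving past both pays 1). Both intersections sit in one information set, so a behavioral strategy is a single probability p of continuing:

```python
# Absent-Minded Driver with the standard payoffs: exit first = 0,
# exit second = 4, drive past both = 1. One information set covers
# both intersections, so the strategy is one number p = P(continue).
def amd_eu(p):
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1  # = 4p - 3p^2

best_p = max((i / 1000 for i in range(1001)), key=amd_eu)
print(best_p, amd_eu(best_p))  # p = 2/3 and EU = 4/3, up to grid precision
```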

Psy-Kosh’s non-anthropic problem is more tricky, because it has multiple players. We will model it as a single-player game anyway, putting the decision nodes of the different players in sequence and grouping them together into information sets in the natural way. The resulting game tree is complicated, but the solution is the same as UDT’s. As a bonus, we see that our model does not need any kind of anthropic probabilities, because it doesn’t specify or use the probabilities of individual nodes within an information set.
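
If I’m remembering the payoffs right (a fair coin picks either nine deciders or one; a unanimous “yea” pays $1000 in the nine-decider branch and $100 in the one-decider branch; a unanimous “nay” pays $700 either way), then comparing the two pure strategies already shows the point. Treat these numbers as assumptions:

```python
# Psy-Kosh's non-anthropic problem, folded into a single player. The
# payoffs below are as I recall them -- treat them as assumptions.
# All deciders share one information set, so compare pure strategies:
eu_yea = 0.5 * 1000 + 0.5 * 100  # unanimous "yea": 550
eu_nay = 0.5 * 700 + 0.5 * 700   # unanimous "nay": 700
print(max(("yea", eu_yea), ("nay", eu_nay), key=lambda x: x[1]))
# "nay" wins, matching UDT. Note that nothing here used the probability
# of being any particular decider -- no anthropic probabilities needed.
```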

Wei Dai’s coordination problem is similar to the previous one, but with multiple players choosing different actions based on different information. If we use the same trick of folding all players into one, and group the decision nodes into information sets in the natural way, we get the right solution again. It’s nice to see that our model automatically solves problems that require Wei’s “explicit optimization of global strategy”.
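
I won’t reproduce Wei’s exact payoffs here, but a toy stand-in with the same shape shows the mechanism: the two players see different things, so the folded single-player game has two information sets, and the best global strategy is asymmetric:

```python
from itertools import product

# Toy coordination game (a stand-in, not Wei Dai's exact payoffs):
# player 1 sees "A" and player 2 sees "B", giving two information sets
# in the folded game. Utility is 10 if the actions differ, else 0.
def eu(strategy):  # strategy: dict info_set -> action
    return 10 if strategy["saw A"] != strategy["saw B"] else 0

best = max((dict(zip(("saw A", "saw B"), acts))
            for acts in product(("left", "right"), repeat=2)), key=eu)
print(best, eu(best))  # an asymmetric strategy, found by optimizing globally
```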

Counterfactual Mugging is even more tricky, because writing it as an extensive-form game requires including a decision node for Omega’s simulation of the player. Some people are okay with that, and our model gives the right solution. But others feel that it leads to confusing questions about the nature of observation. For example, what if Omega used a logical coin, and the player could actually check which way the coin came up by doing a long calculation? Paying up is probably the right decision, but our model here doesn’t have enough detail.
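
With the usual stakes (assumed here: pay $100 on the losing branch, receive $10,000 on the winning branch iff the simulation would pay), the real decision node and the simulated one share a single information set, and the numbers come out as expected:

```python
# Counterfactual Mugging with the usual stakes (assumed numbers). The
# real decision node and Omega's simulation share one information set,
# so a single behavioral probability q = P(pay) governs both.
def eu(q):
    losing_branch = q * -100     # coin went against you: pay up with prob q
    winning_branch = q * 10000   # coin favored you: Omega pays iff the
                                 # simulated player would have paid
    return 0.5 * losing_branch + 0.5 * winning_branch

print(eu(1.0), eu(0.0))  # always pay: 4950, never pay: 0 -- so pay up
```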

Finally, Agent Simulates Predictor is the kind of problem that cannot be captured by our model at all, because logical uncertainty is the whole point of ASP.

It’s instructive to see the difference between the kinds of UDT problems that fit our model and those that require something more. It would also be easy to implement the model as a computer program and solve some UDT problems automatically. (Though the exercise wouldn’t have much scientific value, because extensive-form games are a well-known idea.) In this way it’s a little similar to Patrick’s work on modal agents, which made certain problems solvable on the computer by using modal logic instead of enumerating proofs. Now I wonder if other problems that involve logical uncertainty could also be solved by some simplified model?