Decision Theories: A Less Wrong Primer

(Figure: alpha-beta pruning, from Wikipedia)

Summary: If you’ve been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.

What is a decision theory?

This is going to sound silly, but a decision theory is an algorithm for making decisions.0 The inputs are an agent’s knowledge of the world, and the agent’s goals and values; the output is a particular action (or plan of actions). Actually, in many cases the goals and values are implicit in the algorithm rather than given as input, but it’s worth keeping them distinct in theory.

For example, we can think of a chess program as a simple decision theory. If you feed it the current state of the board, it returns a move, which advances the implicit goal of winning. The actual details of the decision theory include things like writing out the tree of possible moves and countermoves, and evaluating which possibilities bring it closer to winning.
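The tree-of-moves idea can be sketched in a few lines. This is a toy minimax search, not any real chess engine: the game here is a made-up two-ply tree where interior nodes map moves to successor states and leaves carry a score for the player to move at the root.

```python
def minimax(state, maximizing):
    # Leaf nodes carry a score for the maximizing player;
    # interior nodes map each legal move to a successor state.
    if not isinstance(state, dict):
        return state
    values = [minimax(s, not maximizing) for s in state.values()]
    return max(values) if maximizing else min(values)

def best_move(state):
    # The "decision theory": pick the move whose subtree scores highest,
    # assuming the opponent replies as well as possible.
    return max(state, key=lambda m: minimax(state[m], maximizing=False))

# A tiny hypothetical game tree: after our move "a" or "b",
# the opponent picks the reply that is worst for us.
tree = {"a": {"c": 3, "d": 5}, "b": {"e": 2, "f": 9}}
print(best_move(tree))  # "a": it guarantees 3, while "b" only guarantees 2
```

A real chess program adds alpha-beta pruning (as in the figure above) and a heuristic evaluation at a depth cutoff, but the decision procedure is the same shape.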

Another example is an E. coli bacterium. It has two basic options at every moment: it can use its flagella to swim forward in a straight line, or to change direction by randomly tumbling. It can sense whether the concentration of food or toxin is increasing or decreasing over time, and so it executes a simple algorithm that randomly changes direction more often when things are “getting worse”. This is enough control for bacteria to rapidly seek out food and flee from toxins, without needing any sort of advanced information processing.
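That algorithm is simple enough to write down whole. A minimal sketch in Python; the specific tumble probabilities (0.1 when the gradient is improving, 0.6 when it’s worsening) are invented for illustration, not measured values.

```python
import random

def chemotaxis_step(prev_conc, curr_conc, rng=random):
    # One decision: keep swimming straight, or tumble to a new random
    # direction. The entire "decision theory" is a single comparison:
    # tumble more often when the food concentration is falling.
    # (The 0.1 and 0.6 rates are made up for illustration.)
    p_tumble = 0.1 if curr_conc > prev_conc else 0.6
    return "tumble" if rng.random() < p_tumble else "run"
```

Iterating this biased random walk is what lets the bacterium climb a food gradient without ever representing the gradient explicitly.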

A human being is a much more complicated example which combines some aspects of the two simpler ones: we mentally model consequences in order to make many decisions, and we also follow heuristics that evolved to work well without explicitly modeling the world.1 We can’t model anything quite as complicated as the way human beings actually make decisions, but we can study simple decision theories on simple problems; and the results of this analysis are often more effective than the raw intuitions of human beings (who evolved to succeed in small savannah tribes, not to negotiate a nuclear arms race). But the standard model used for this analysis, Causal Decision Theory, has a serious drawback of its own, and the suggested replacements are important for a number of things that Less Wrong readers might care about.

What is Causal Decision Theory?

Causal decision theory (CDT to all the cool kids) is a particular class of decision theories with some nice properties. It’s straightforward to state, mathematically tractable, adaptable to any utility function, and it gives good answers on many problems. We’ll describe how it works in a fairly simple but general setup.

Let X be an agent who shares a world with some other agents (Y1 through Yn). All these agents are going to privately choose actions and then perform them simultaneously, and the actions will have consequences. (For instance, they could be playing a round of Diplomacy.)

We’ll assume that X has goals and values represented by a utility function: for every consequence C, there’s a number U(C) representing just how much X prefers that outcome, and X views equal expected utilities with indifference: a 50% chance of utility 0 and a 50% chance of utility 10 is no better or worse than a 100% chance of utility 5, for instance. (If these assumptions sound artificial, remember that we’re trying to make this as mathematically simple as we can in order to analyze it. I don’t think it’s as artificial as it seems, but that’s a different topic.)

X wants to maximize its expected utility. If there were no other agents, this would be simple: model the world, estimate how likely each consequence is to happen given this action or that, calculate the expected utility of each action, then perform the action with the highest expected utility. But if there are other agents around, the outcomes depend on their actions as well as on X’s, and if X treats that uncertainty like ordinary uncertainty, the Ys may have an opportunity to exploit X.
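The 1-player case just described is short enough to write out directly. A minimal sketch; the three actions and their outcome distributions below are invented for illustration.

```python
def expected_utility(outcomes):
    # outcomes: a list of (probability, utility) pairs for one action.
    return sum(p * u for p, u in outcomes)

def choose(actions):
    # The 1-player decision procedure: take the action whose outcome
    # distribution has the highest expected utility.
    return max(actions, key=lambda a: expected_utility(actions[a]))

# The indifference example from earlier: a 50/50 gamble between
# utilities 0 and 10 is worth exactly as much as a certain 5.
assert expected_utility([(0.5, 0), (0.5, 10)]) == expected_utility([(1.0, 5)])

# A made-up menu of actions; "hedge" wins with expected utility 6.
actions = {
    "safe":   [(1.0, 5)],
    "gamble": [(0.5, 0), (0.5, 10)],
    "hedge":  [(0.5, 4), (0.5, 8)],
}
print(choose(actions))  # "hedge"
```

The hard part, of course, is hidden in producing those probabilities: that’s where the world-modeling lives.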

This is a Difficult Problem in general; a full discussion would involve Nash equilibria, but even that doesn’t fully settle the matter: there can be more than one equilibrium! Also, X can sometimes treat another agent as predictable (like a fixed outcome or an ordinary random variable) and get away with it.

CDT is a class of decision theories, not a specific decision theory, so it’s impossible to specify with full generality how X will decide if X is a causal decision theorist. But there is one key property that distinguishes CDT from the decision theories we’ll talk about later: a CDT agent assumes that X’s decision is independent of the simultaneous decisions of the Ys; that is, X could decide one way or another and everyone else’s decisions would stay the same.

Therefore, there is at least one case where we can say what a CDT agent will do in a multi-player game: some strategies are dominated by others. For example, if X and Y are both deciding whether to walk to the zoo, and X will be happiest if X and Y both go, but X would still be happier at the zoo than at home even if Y doesn’t come along, then X should go to the zoo regardless of what Y does. (Presuming that X’s utility function is focused on being happy that afternoon.) This criterion is enough to “solve” many problems for a CDT agent, and in zero-sum two-player games the solution can be shown to be optimal for X.
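The dominance criterion can be checked mechanically. In this sketch the payoff numbers are invented, but they are ordered exactly as the zoo example requires: X prefers the zoo whatever Y does.

```python
def dominates(payoff, a, b, their_actions):
    # Strategy a dominates b if it does at least as well against every
    # opponent action, and strictly better against at least one.
    diffs = [payoff[(a, t)] - payoff[(b, t)] for t in their_actions]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# X's utilities, indexed by (X's action, Y's action); numbers invented.
zoo_game = {
    ("zoo", "zoo"): 10, ("zoo", "home"): 6,
    ("home", "zoo"): 3, ("home", "home"): 3,
}
assert dominates(zoo_game, "zoo", "home", ["zoo", "home"])
```

A CDT agent can safely discard any dominated strategy, since by assumption its choice can’t influence what the others do.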

What’s the problem with Causal Decision Theory?

There are many simplifications and abstractions involved in CDT, but that assumption of independence turns out to be the key one. In practice, people put a lot of effort into predicting what other people might decide, sometimes with impressive accuracy, and then base their own decisions on that prediction. This wrecks the independence of decisions, and so it turns out that in a non-zero-sum game, it’s possible to “beat” the outcome that CDT gets.

The classic thought experiment in this context is called Newcomb’s Problem. X meets a very smart and honest alien, Omega, that has the power to accurately predict what X would do in various hypothetical situations. Omega presents X with two boxes: a clear one containing $1,000 and an opaque one containing either $1,000,000 or nothing. Omega explains that X can either take the opaque box alone (this is called one-boxing) or both boxes (two-boxing), but there’s a trick: Omega predicted in advance what X would do, and put $1,000,000 into the opaque box only if X was predicted to one-box. (This is a little devious, so take some time to ponder it if you haven’t seen Newcomb’s Problem before, or read here for a fuller explanation.)

If X is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already, and whether the opaque box is full or empty, X is better off taking both. (That is, two-boxing is a dominant strategy over one-boxing.) So X two-boxes, and walks away with $1,000 (since Omega easily predicted that this would happen). Meanwhile, X’s cousin Z (not a CDT agent) decides to one-box, and finds the box full with $1,000,000. So it certainly seems that one could do better than CDT in this case.
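Both halves of this argument can be written out against the same payoff table. The dollar amounts are the ones from the problem; modeling Omega as perfectly accurate (prediction always matches the actual choice) is the simplifying assumption.

```python
# X's payoff, indexed by (Omega's prediction, X's actual choice).
PAYOFF = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 1_001_000,
    ("two-box", "one-box"): 0,
    ("two-box", "two-box"): 1_000,
}

def cdt_choice():
    # CDT holds Omega's (already-made) prediction fixed; two-boxing is
    # better under either prediction, so it dominates one-boxing.
    if all(PAYOFF[(p, "two-box")] > PAYOFF[(p, "one-box")]
           for p in ("one-box", "two-box")):
        return "two-box"
    return "one-box"

def realized_payoff(choice):
    # With a perfectly accurate Omega, the prediction matches the choice.
    return PAYOFF[(choice, choice)]

print(cdt_choice(), realized_payoff(cdt_choice()))  # two-box 1000
print(realized_payoff("one-box"))                   # 1000000
```

The dominance reasoning inside `cdt_choice` is valid given its premise; the trouble is that against an accurate predictor, the premise (the prediction is independent of the choice) is false.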

But is this a fair problem? After all, we can always come up with problems that trick the rational agent into making the wrong choice, while a dumber agent lucks into the right one. Having a very powerful predictor around might seem artificial, although the problem would look much the same if Omega had a 90% success rate rather than 100%. One reason that this is a fair problem is that the outcome depends only on what action X is simulated to take, not on what process produced the decision.

Besides, we can see the same behavior in another famous game theory problem: the Prisoner’s Dilemma. X and Y are collaborating on a project, but they have different goals for it, and either one has the opportunity to achieve their own goal a little better at the cost of significantly impeding their partner’s goal. (The options are called cooperation and defection.) If they both cooperate, they get a utility of +50 each; if X cooperates and Y defects, then X winds up at +10 but Y gets +70, and vice versa; but if they both defect, then they each wind up at +30.2

If X is a CDT agent, then defecting dominates cooperating as a strategy, so X will always defect in the Prisoner’s Dilemma (as long as there are no further ramifications; the Iterated Prisoner’s Dilemma can be different, because X’s current decision can influence Y’s future decisions). Even if you knowingly pair up X with a copy of itself (with a different goal but the same decision theory), it will defect, even though it could prove that the two decisions will be identical.

Meanwhile, its cousin Z also plays the Prisoner’s Dilemma: Z cooperates when it’s facing an agent that has the same decision theory, and defects otherwise. This gets Z a strictly better outcome than X gets. (Z isn’t optimal, though; I’m just showing that you can find a strict improvement on X.)3
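The comparison between X and Z can be made concrete with the payoffs above. Recognizing a partner’s decision theory by a simple label, as in this sketch, is a huge simplification of the real problem (proving that another agent will decide the same way you do), but it shows the shape of the result.

```python
# One player's utility, indexed by (own move, partner's move);
# C = cooperate, D = defect. Payoffs as given in the text.
PD = {("C", "C"): 50, ("C", "D"): 10, ("D", "C"): 70, ("D", "D"): 30}

def cdt_move(partner_theory):
    # Defection dominates (70 > 50 and 30 > 10), so CDT always
    # defects, even against an exact copy of itself.
    return "D"

def z_move(partner_theory):
    # Cousin Z: cooperate exactly when facing the same decision theory.
    return "C" if partner_theory == "Z" else "D"

cdt_vs_cdt = PD[(cdt_move("CDT"), cdt_move("CDT"))]  # 30 each
z_vs_z = PD[(z_move("Z"), z_move("Z"))]              # 50 each
z_vs_cdt = PD[(z_move("CDT"), cdt_move("Z"))]        # 30: Z is never exploited
assert z_vs_z > cdt_vs_cdt and z_vs_cdt == cdt_vs_cdt
```

Z matches X’s outcome against everyone X plays, and strictly beats it against copies of itself, which is what "strict improvement" means here.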

What decision theories are better than CDT?

I realize this post is pretty long already, but it’s way too short to outline the advanced decision theories that have been proposed and developed recently by a number of people (including Eliezer, Gary Drescher, Wei Dai, Vladimir Nesov and Vladimir Slepnev). Instead, I’ll list the features that we would want an advanced decision theory to have:

  1. The decision theory should be formalizable at least as well as CDT is.

  2. The decision theory should give answers that are at least as good as CDT’s. In particular, it should always get the right answer in 1-player games and find a Nash equilibrium in zero-sum two-player games (when the other player is also able to do so).

  3. The decision theory should strictly outperform CDT on the Prisoner’s Dilemma: it should elicit mutual cooperation from some agents that CDT elicits mutual defection from, it shouldn’t cooperate when its partner defects, and (arguably) it should defect if its partner would cooperate regardless.

  4. The decision theory should one-box on Newcomb’s Problem.

  5. The decision theory should be reasonably simple, without a bunch of ad-hoc rules. We want to solve problems involving prediction of actions in general, not just the special cases.

There are now a couple of candidate decision theories (Timeless Decision Theory, Updateless Decision Theory, and Ambient Decision Theory) which seem to meet these criteria. Interestingly, formalizing any of these tends to deeply involve the mathematics of self-reference (Gödel’s Theorem and Löb’s Theorem) in order to avoid the infinite regress inherent in simulating an agent that’s simulating you.

But for the time being, we can massively oversimplify and outline them. TDT considers your ultimate decision as the cause of both your action and other agents’ valid predictions of your action, and tries to pick the decision that works best under that model. ADT uses a kind of diagonalization to predict the effects of different decisions without having the final decision throw off the prediction. And UDT considers the decision that would be the best policy for all possible versions of you to employ, on average.

Why are advanced decision theories important for Less Wrong?

There are a few reasons. Firstly, there are those who think that advanced decision theories are a natural base on which to build AI. One reason for this is something I briefly mentioned: even CDT allows for the idea that X’s current decisions can affect Y’s future decisions, and self-modification counts as a decision. If X can self-modify, and if X expects to deal with situations where an advanced decision theory would outperform its current self, then X will change itself into an advanced decision theory (with some weird caveats: for example, if X started out as CDT, its modification will only care about other agents’ decisions made after X self-modified).

More relevantly to rationalists, the bad choices that CDT makes are often held up as examples of why you shouldn’t try to be rational, or why rationalists can’t cooperate. But instrumental rationality doesn’t need to be synonymous with causal decision theory: if there are other decision theories that do strictly better, we should adopt those instead! So figuring out advanced decision theories, even if we can’t implement them on real-world problems, helps us see that the ideal of rationality isn’t going to fall flat on its face.

Finally, advanced decision theory could be relevant to morality. If, as many of us suspect, there’s no basis for human morality apart from what goes on in human brains, then why do we feel there’s still a distinction between what-we-want and what-is-right? One answer is that if we feed what-we-want into an advanced decision theory, then just as cooperation emerges in the Prisoner’s Dilemma, many kinds of patterns that we take as basic moral rules emerge as the equilibrium behavior. The idea is developed more substantially in Gary Drescher’s Good and Real, and (before there was a candidate for an advanced decision theory) in Douglas Hofstadter’s concept of superrationality.

It’s still at the speculative stage, because it’s difficult to work out what interactions between agents with advanced decision theories would look like (in particular, we don’t know whether bargaining would end in a fair split or in a Xanatos Gambit Pileup of chicken threats, though we think and hope it’s the former). But it’s at least a promising approach to the slippery question of what ‘right’ could actually mean.

And if you want to understand this on a slightly more technical level... well, I’ve started a sequence.

Next: A Semi-Formal Analysis, Part I (The Problem with Naive Decision Theory)


0. Rather confusingly, decision theory is the name for the study of decision theories.

1. Both patterns appear in our conscious reasoning as well as our subconscious thinking: we care about consequences we can directly foresee and also about moral rules that don’t seem attached to any particular consequence. However, just as the simple “program” for the bacterium was constructed by evolution, our moral rules are there for evolutionary reasons as well, perhaps even for reasons that have to do with advanced decision theory...

Also, it’s worth noting that we’re not consciously aware of all of our values and goals, though at least we have a better idea of them than E. coli does. This is a problem for the idea of representing our usual decisions in terms of decision theory, though we can still hope that our approximations are good enough (e.g. that our real values regarding the Cold War roughly corresponded to our estimates of how bad a nuclear war or a Soviet world takeover would be).

2. Eliezer once pointed out that our intuitions on most formulations of the Prisoner’s Dilemma are skewed by our notions of fairness, and a more outlandish example might serve better to illustrate how a genuine PD really feels. For an example where people are notorious for not caring about each other’s goals, consider aesthetics: people who love one form of music often really feel that another popular form is a waste of time. One might feel that if the works of Artist Q suddenly disappeared from the world, it would objectively be a tragedy; while if the same happened to the works of Artist R, then it’s no big deal and R’s fans should be glad to be freed from that dreck.

We can use this aesthetic intolerance to construct a more genuine Prisoner’s Dilemma without inviting aliens or anything like that. Say X is a writer and Y is an illustrator, and they have very different preferences for how a certain scene should come across, so they’ve worked out a compromise. Now, both of them could cooperate and get a scene that both are OK with; or X could secretly change the dialogue in hopes of getting his idea to come across; or Y could draw the scene differently in order to get her idea of the scene across. But if they both “defect” from the compromise, then the scene becomes confusing to readers. If both X and Y prefer their own idea to the compromise, prefer the compromise to the muddle, and prefer the muddle to their partner’s idea, then this is a genuine Prisoner’s Dilemma.

3. I’ve avoided mentioning Evidential Decision Theory, the “usual” counterpart to CDT; it’s worth noting that EDT one-boxes on Newcomb’s Problem but gives the wrong answer on a classic one-player problem (the Smoking Lesion), which the advanced decision theories handle correctly. It’s also far less amenable to formalization than the others.