Decision Theory is multifaceted

Related: Conceptual Problems with UDT and Policy Selection, Formalising decision theory is hard


Target audience: anyone who is interested in decision theory. The post is pretty general and not really technical; some familiarity with counterfactual mugging can be useful, but overall not much background knowledge is required.


The post develops the claim that identifying the correct solution to some decision problems might be intricate, if not impossible, when certain details about the specific scenario are not given. First I show that, in counterfactual mugging, some important elements in the problem description and in a possible formalisation are actually underspecified. Next I describe issues related to the concept of perfect prediction and briefly discuss whether they apply to other decision scenarios involving predictors. Then I present some advantages and disadvantages of the formalisation of agents as computer programs. A summary with bullet points concludes.

Missing parts of a “correct” solution

I focus on the version of the problem with cards and two humans since, to me, it feels more grounded in reality (a game that could actually be played), but what I say also applies to the version with a coin toss and Omega.

What makes the problem interesting is the conflict between these two intuitions:

  • Before Player A looks at the card, the best strategy seems to be never showing the card, because it is the strategy that makes Player A lose the least in expectation, given the uncertainty about the value of the card (50/50 high or low).

  • After Player A sees a low card, showing it seems a really good idea, because that action gives Player A a loss of 0, which is the best possible result considering that the game is played only once and never again. Thus, the incentive not to reveal the card seems to disappear after Player A knows that the card is low.

[In the other version, the conflict is between paying before the coin toss and refusing to pay after learning that the coin landed tails.]

One attempt at formalising the problem is to represent it as a tree (a formalisation similar to the following one is considered here). The root is a 50/50 chance node representing the possible values of the card. Then Player A chooses between showing and not showing the card; each action leads to a leaf with a value which indicates the loss for Player A. The peculiarity of counterfactual mugging is that some payoffs depend on actions taken in a different subtree.

[The tree of the other version is a bit different, since the player has a choice only when the coin lands tails; anyway, the payoff in the heads case is “peculiar” in the same sense as in the card version, since it depends on the action taken when the coin lands tails.]

With this representation, it is easy to see that we can assign an expected value (EV) to each deterministic policy available to the player: we start from the root of the tree, then we follow the path prescribed by the policy until we reach a payoff, which is assigned a weight according to the chance nodes that we have run into.

Therefore it is possible to order the policies according to their expected values and determine which one gives the lowest expected loss [or, in the other version, the highest EV] with respect to the root of the tree. This is the formalism behind the first of the two intuitions presented before.
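To make this concrete, here is a minimal sketch of the EV computation for the coin-toss version. The payoff numbers (pay $100 on tails, receive $10,000 on heads if the policy would have paid) are the commonly used illustrative values, not taken from this post:

```python
# Toy expected-value calculation for the coin-toss version of
# counterfactual mugging. Payoffs are illustrative assumptions:
# paying costs $100 on tails; on heads, Omega pays $10,000 only
# if the policy is one that pays on tails.

def expected_value(policy_pays: bool) -> float:
    """EV of a deterministic policy, computed from the root (the coin toss)."""
    # The "peculiar" payoff: the heads leaf depends on what the
    # policy would do in the *tails* subtree.
    heads_payoff = 10_000 if policy_pays else 0
    tails_payoff = -100 if policy_pays else 0
    return 0.5 * heads_payoff + 0.5 * tails_payoff

evs = {pays: expected_value(pays) for pays in (True, False)}
best_policy = max(evs, key=evs.get)
print(evs)          # {True: 4950.0, False: 0.0}
print(best_policy)  # True: "always pay" maximises EV from the root
```

Ranking policies this way is exactly the first intuition: the comparison is made at the root, before the coin is observed.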

On the other hand, one could object that it is far from trivial that the correct thing to do is to minimise expected loss from the root of the tree. In fact, in the original problem statement, the card is low [tails], so the relevance of the payoffs in the other subtree, where the card is high [heads], is not clear, and the focus should be on the decision node with the low card, not on the root of the tree. This is the formalism behind the second intuition.

Even though the objection related to the second intuition sounds reasonable, I think one could point to other, more important issues underlying the problem statement and formalisation. Why is there a root in the first place, and what does it represent? What do we mean when we say that we minimise loss “from the start”?

These questions are more complicated than they seem: let me elaborate on them. Suppose that the advice of maximising EV “from the start” is generally correct from a decision theory point of view. It is not clear how we should apply that advice in order to make correct decisions as humans, or to create an AI that makes correct decisions. Should we maximise value...

  1. ...from the instant in which we are “making the decision”? This seems to bring us back to the second intuition, where we want to show the card once we have seen it is low.

  2. ...from our first conscious moment, or from when we started collecting data about the world, or maybe from the moment which the first data point in our memory is about? In the case of an AI, this would correspond to the moment of the “creation” of the AI, whatever that means, or maybe to the first instant which the data we put into the AI points to.

  3. ...from the very first moment since the beginning of space-time? After all, the universe we are observing could be one possible outcome of a random process, analogous to the 50/50 high/low card [or the coin toss].

Regarding point 1, I have mentioned the second intuition, but other interpretations could be closer to the first intuition instead. The root could represent the moment in which we settle on our policy, and this is what we would mean by “making the decision”.

Then, however, other questions should be answered about policy selection. Why and when should we change policy? If selecting a policy is what constitutes a decision, what exactly is the role of actions, or how is changing policy fundamentally different from other actions? It seems we are treating policies and actions as concepts belonging to two different levels in a hierarchy: if this is a correct model, it is not clear to me why we do not use further levels, or why we need exactly two levels, especially when thinking in terms of embedded agency.

Note that giving precise answers to the questions in the previous paragraph could help us find a criterion to distinguish fair problems from unfair ones, which would be useful to compare the performance of different decision theories, as pointed out in the conclusion of the paper on FDT. Considering as fair all problems in which the outcome depends only on the agent’s behaviour in the dilemma at hand (p. 29) is not a satisfactory criterion when all the issues outlined before are taken into account: the lack of clarity about the role of the root, decision nodes, policies and actions makes the “borders” of a decision problem blurred, and leaves the agent’s behaviour as an underspecified concept.

Moreover, resolving the ambiguities in the expression “from the start” could also explain why it seems difficult to apply updatelessness to game theory (see the sections “Two Ways UDT Hasn’t Generalized” and “What UDT Wants”).


A weird scenario with perfect prediction

So far, we have reasoned as if Player B, who determines the loss of Player A by choosing the value that best represents his belief that the card is high, can perfectly guess the strategy that Player A adopts. Analogously, in the version with the coin toss, Omega is capable of perfectly predicting what the decision maker does when the coin lands tails, because that information is necessary to determine the payoff in case the coin lands heads.

However, I think that the concept of perfect prediction also deserves further investigation: not because it is an implausible idealisation of a highly accurate prediction, but because it can lead to strange conclusions, if not downright contradictions, even in very simple settings.

Consider a human who is going to choose exactly one of two options: M or N. Before the choice, a perfect predictor analyses the human and writes the letter (M or N) corresponding to the predicted choice on a piece of paper, which is given to the human. Now, what exactly prevents the human from reading the piece of paper and choosing the other option instead?

From a slightly different perspective: assume there exists a human, facing a decision between M and N, who is capable of reading a piece of paper containing only one letter, M or N, and choosing the opposite. This seems quite a weak assumption. Is a “perfect predictor” that writes the predicted option on a piece of paper and gives it to the human… always wrong?

Note that allowing probabilities doesn’t help: a human capable of always choosing M when reading a prediction like “probability p of choosing M, probability 1-p of choosing N” seems as plausible as the previous human, but again would make the prediction always wrong.
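The deterministic part of this argument can be stated in a few lines of code. This is only a restatement of the example above, with the two options as strings:

```python
# Diagonalisation sketch: an agent that reads the predictor's written
# output and picks the other option. Whatever letter the predictor
# writes on the paper, its prediction fails.

OPTIONS = ("M", "N")

def contrarian(paper: str) -> str:
    """The human from the example: read the paper, choose the opposite."""
    return "N" if paper == "M" else "M"

def predictor_is_wrong(written_prediction: str) -> bool:
    """The prediction fails iff the agent's actual choice differs from it."""
    return contrarian(written_prediction) != written_prediction

# Quantify over everything the predictor could possibly write:
print(all(predictor_is_wrong(p) for p in OPTIONS))  # True
```

The point is that no function from announced predictions to outcomes can be correct here, because the agent's choice is itself a function of the announcement.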

Other predictions

Unlike the previous example, Newcomb’s and other problems involve decision makers who are not told about the prediction outcome. However, the difference might not be as clear-cut as it first appears. If the decision maker regards some information, maybe elements of the deliberation process itself, as evidence about the imminent choice, the decision maker will also have information about the prediction outcome, since the predictor is known to be reliable. To what extent is this information about the prediction outcome different from the piece of paper in the previous example? What exactly can be considered evidence about one’s own future choices? The answer seems to be related to the details of the prediction process and how it is carried out.

It may be useful to consider how a prediction is implemented as a specific program. In this paper by Critch, an algorithm plays the prisoner’s dilemma by cooperating if it successfully predicts that the opponent will cooperate, and defecting otherwise. Here the “prediction” consists in a search for proofs, up to a certain length, that the other algorithm outputs Cooperate when given the agent itself as input. Thanks to a bounded version of Löb’s theorem, this specific prediction implementation allows the algorithm to cooperate when playing against itself.
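Critch’s proof search is not something one can run in a few lines, so as a hedged stand-in here is a toy in the same spirit: a bot that “predicts” the opponent by simulating it with a bounded budget, with an optimistic base case when the budget runs out. This is not the bounded-Löb construction itself, only an illustration of how a self-referential prediction can terminate and still yield mutual cooperation; the names `mirror_bot` and `defect_bot` are mine:

```python
# Toy self-referential "prediction" via depth-limited simulation.
# NOT Critch's proof-search algorithm: the budget-zero base case of
# returning "C" crudely plays the role that bounded Löb's theorem
# plays in the real construction.

def mirror_bot(opponent, budget: int = 3) -> str:
    """Cooperate iff the opponent is predicted (within budget) to cooperate."""
    if budget == 0:
        return "C"  # optimistic assumption when the simulation budget is exhausted
    prediction = opponent(mirror_bot, budget - 1)  # simulate opponent vs. me
    return "C" if prediction == "C" else "D"

def defect_bot(opponent, budget: int = 3) -> str:
    """An unconditional defector, for comparison."""
    return "D"

print(mirror_bot(mirror_bot))  # "C": cooperates with itself
print(mirror_bot(defect_bot))  # "D": defects against the defector
```

Self-play terminates because each nested simulation spends budget, and the chain of optimistic base cases propagates cooperation back up.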

Results of this kind (open-source game theory / program equilibrium) could be especially relevant in a future in which important policy choices are made by AIs that interact with each other. Note, however, that no claim is made about the rationality of the algorithm’s overall behaviour: it is debatable whether its decision to cooperate against a program that always cooperates is correct.

Moreover, seeing decision makers as programs can be confusing and less precise than one would intuitively think, because it is still unclear how to properly formalise concepts such as action, policy and decision-making procedure, as discussed previously. If actions in certain situations correspond to program outputs given certain inputs, does policy selection correspond to program selection? If so, why is policy selection not an action like the other ones? And, related to what I said before about using a hierarchy of exactly two levels, why don’t we also “select” the code fragment that does policy selection?

In general, approaches that use some kind of formalism tend to be more precise than purely philosophical approaches, but there are some disadvantages as well. Focusing on low-level details can make us lose sight of the bigger picture and limit lateral thinking, which can be a great source of insight for finding alternative solutions in certain situations. In a blackmail scenario, besides the decision to pay or not, we could consider what factors caused the leakage of sensitive information, or the exposure of something we care about, to adversarial agents. Another example: in a prisoner’s dilemma, the equilibrium can shift to mutual cooperation thanks to the intervention of an external actor that makes the payoffs for defection worse (the chapter on game theory in Algorithms to Live By gives a nice presentation of this equilibrium shift and related concepts).
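The equilibrium shift can be checked mechanically. The payoff numbers below are the usual textbook prisoner’s dilemma values, not taken from the post, and the size of the fine is an arbitrary assumption:

```python
# Equilibrium shift in a symmetric prisoner's dilemma when an external
# actor fines defection. payoffs[(my_action, their_action)] is my payoff.

def best_response(payoffs, opponent_action):
    """My payoff-maximising action ("C" or "D") against a fixed opponent action."""
    return max("CD", key=lambda a: payoffs[(a, opponent_action)])

def is_equilibrium(payoffs, profile):
    """A profile (a, b) is a Nash equilibrium if each side best-responds to the other."""
    a, b = profile
    return best_response(payoffs, b) == a and best_response(payoffs, a) == b

pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
fine = 4  # assumed penalty the external actor imposes on defection
pd_fined = {(a, b): p - (fine if a == "D" else 0) for (a, b), p in pd.items()}

print(is_equilibrium(pd, ("D", "D")))        # True: mutual defection, originally
print(is_equilibrium(pd_fined, ("C", "C")))  # True: the fine shifts the equilibrium
```

With the fine in place, defection no longer dominates, so the intervention changes the solution without changing anything about the players’ decision theories.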

We may also take into account that, for efficiency reasons, predictions in practice might be made with methods different from close-to-perfect physical or algorithmic simulation, and the specific method used could be relevant for an accurate analysis of the situation, as mentioned before. In the case of human interaction, it is sometimes possible to infer something about one’s future actions by reading facial expressions; but this also means that a predictor can be tricked if one is capable of masking one’s intentions by keeping a poker face.


Summary

  • The claim that a certain decision is correct because it maximises utility may require further explanation, since every decision problem sits in a context which might not be fully captured in the problem formalisation.

  • Perfect prediction leads to seemingly paradoxical situations. It is unclear whether these problems underlie other scenarios involving prediction. This does not mean the concept must be rejected; but our current understanding of prediction might lack critical details. Certain problems may require clarification of how the prediction is made before a solution is claimed as correct.

  • The use of precise mathematical formalism can resolve some ambiguities. At the same time, interesting solutions to certain situations may lie “outside” the original problem statement.

Thanks to Abram Demski, Wolfgang Schwarz and Caspar Oesterheld for extensive feedback.

This work was supported by CEEALAR.



Appendix

There are biases in favour of the there-is-always-a-correct-solution framework. Uncovering the right solution in decision problems can be fun, and finding the one Decision Theory to solve them all can be appealing.

On “wrong” solutions

Many of the reasons provided in this post also explain why it is tricky to determine what a certain decision theory does in a problem, and whether a given solution is wrong. But I want to provide another reason, namely the following informal...

Conjecture: for any decision problem that you believe CDT/EDT gets wrong, there exists a paper or book in which a particular version of CDT/EDT gives the solution that you believe is correct, and/or a paper or book that argues that the solution you believe is correct is actually wrong.

Here’s an example about Newcomb’s problem.