# Decision Theory is multifaceted

# Target

Anyone who is interested in decision theory. The post is pretty general and not really technical; some familiarity with counterfactual mugging can be useful, but overall not much background knowledge is required.

# Outline

The post develops the claim that identifying the correct solution to some decision problems might be intricate, if not impossible, when certain details about the specific scenario are not given. First I show that, in counterfactual mugging, some important elements in the problem description and in a possible formalisation are actually underspecified. Next I describe issues related to the concept of perfect prediction and briefly discuss whether they apply to other decision scenarios involving predictors. Then I present some advantages and disadvantages of the formalisation of agents as computer programs. A summary with bullet points concludes.

# Missing parts of a “correct” solution

I focus on the version of the problem with cards and two humans since, to me, it feels more grounded in reality—a game that could actually be played—but what I say applies also to the version with a coin toss and Omega.

What makes the problem interesting is the conflict between these two intuitions:

• Before Player A looks at the card, the best strategy seems to never show the card, because it is the strategy that makes Player A lose the least in expectation, given the uncertainty about the value of the card (50/50 high or low).

• After Player A sees a low card, showing it seems a really good idea, because that action gives Player A a loss of 0, which is the best possible result considering that the game is played only once and never again. Thus, the incentive to not reveal the card seems to disappear after Player A knows that the card is low.

[In the other version, the conflict is between paying before the coin toss and refusing to pay after knowing the coin landed tails.]

One attempt at formalising the problem is to represent it as a tree (a formalisation similar to the following one is considered here). The root is a 50/50 chance node representing the possible values of the card. Then Player A chooses between showing and not showing the card; each action leads to a leaf with a value which indicates the loss for Player A. The peculiarity of counterfactual mugging is that some payoffs depend on actions taken in a different subtree.

[The tree of the other version is a bit different, since the player has a choice only when the coin lands tails; anyway, the payoff in the heads case is “peculiar” in the same sense as in the card version, since it depends on the action taken when the coin lands tails.]

With this representation, it is easy to see that we can assign an expected value (EV) to each deterministic policy available to the player: we start from the root of the tree, then we follow the path prescribed by the policy until we reach a payoff, which is assigned a weight according to the chance nodes that we’ve run into.

Therefore it is possible to order the policies according to their expected values and determine which one gives the lowest expected loss [or, in the other version, the highest EV] with respect to the root of the tree. This is the formalism behind the first of the two intuitions presented before.
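As a concrete illustration, here is a minimal sketch of this calculation for the coin-toss version, assuming the usual payoffs (pay 100 when tails; receive 10000 when heads if predicted to pay when tails); the code and names are mine, not part of any standard formalisation.

```python
# Minimal sketch: EV of each deterministic policy in the coin-toss version,
# computed from the root (the 50/50 chance node). Assumed payoffs: pay 100
# on tails; receive 10000 on heads iff predicted to pay on tails.

def expected_value(pays_on_tails: bool) -> float:
    heads_payoff = 10_000 if pays_on_tails else 0  # depends on the *other* subtree
    tails_payoff = -100 if pays_on_tails else 0
    return 0.5 * heads_payoff + 0.5 * tails_payoff

for policy in (True, False):
    print(f"pays on tails = {policy}: EV from the root = {expected_value(policy)}")
# pays on tails = True: EV from the root = 4950.0
# pays on tails = False: EV from the root = 0.0
```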

On the other hand, one could object that it is far from trivial that the correct thing to do is to minimise expected loss from the root of the tree. In fact, in the original problem statement, the card is low [tails], so the relevance of the payoffs in the other subtree—where the card is high [heads]—is not clear, and the focus should be on the decision node with the low card, not on the root of the tree. This is the formalism behind the second intuition.

Even though the objection related to the second intuition sounds reasonable, I think one could point to other, more important issues underlying the problem statement and formalisation. Why is there a root in the first place and what does it represent? What do we mean when we say that we minimise loss “from the start”?

These questions are more complicated than they seem: let me elaborate on them. Suppose that the advice of maximising EV “from the start” is generally correct from a decision theory point of view. It is not clear how we should apply that advice in order to make correct decisions as humans, or to create an AI that makes correct decisions. Should we maximise value...

1. ...from the instant in which we are “making the decision”? This seems to bring us back to the second intuition, where we want to show the card once we’ve seen it is low.

2. ...from our first conscious moment, or from when we started collecting data about the world, or maybe from the moment which the first data point in our memory is about? In the case of an AI, this would correspond to the moment of the “creation” of the AI, whatever that means, or maybe to the first instant which the data we put into the AI points to.

3. ...from the very first moment since the beginning of space-time? After all, the universe we are observing could be one possible outcome of a random process, analogous to the 50/50 high/low card [or the coin toss].

Regarding point 1, I’ve mentioned the second intuition, but other interpretations could be closer to the first intuition instead. The root could represent the moment in which we settle our policy, and this is what we would mean by “making the decision”.

Then, however, other questions should be answered about policy selection. Why and when should we change policy? If selecting a policy is what constitutes a decision, what exactly is the role of actions, or how is changing policy fundamentally different from other actions? It seems we are treating policies and actions as concepts belonging to two different levels in a hierarchy: if this is a correct model, it is not clear to me why we do not use further levels, or why we need exactly two levels, especially when thinking in terms of embedded agency.

Note that giving precise answers to the questions in the previous paragraph could help us find a criterion to distinguish fair problems from unfair ones, which would be useful to compare the performance of different decision theories, as pointed out in the conclusion of the paper on FDT. Considering as fair all the problems in which the outcome depends only on the agent’s behaviour in the dilemma at hand (p. 29) is not a satisfactory criterion when all the issues outlined before are taken into account: the lack of clarity about the role of the root, decision nodes, policies and actions makes the “borders” of a decision problem blurred, and leaves the agent’s behaviour as an underspecified concept.

Moreover, resolving the ambiguities in the expression “from the start” could also explain why it seems difficult to apply updatelessness to game theory (see the sections “Two Ways UDT Hasn’t Generalized” and “What UDT Wants”).

# Predictors

## A weird scenario with perfect prediction

So far, we’ve reasoned as if Player B—who determines the loss of Player A by choosing the value that best represents his belief that the card is high—can perfectly guess the strategy that Player A adopts. Analogously, in the version with the coin toss, Omega is capable of perfectly predicting what the decision maker does when the coin lands tails, because that information is necessary to determine the payoff in case the coin lands heads.

However, I think that the concept of perfect prediction also deserves further investigation: not because it is an implausible idealisation of a highly accurate prediction, but because it can lead to strange conclusions, if not downright contradictions, even in very simple settings.

Consider a human who is going to choose exactly one of two options: M or N. Before the choice, a perfect predictor analyses the human and writes the letter (M or N) corresponding to the predicted choice on a piece of paper, which is given to the human. Now, what exactly prevents the human from reading the piece of paper and choosing the other option instead?

From a slightly different perspective: assume there exists a human, facing a decision between M and N, who is capable of reading a piece of paper containing only one letter, M or N, and choosing the opposite—this seems quite a weak assumption. Is a “perfect predictor” that writes the predicted option on a piece of paper and gives it to the human… always wrong?

Note that allowing probabilities doesn’t help: a human capable of always choosing M when reading a prediction like “probability p of choosing M, probability 1-p of choosing N” seems as plausible as the previous human, but again would make the prediction always wrong.
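To make the diagonalisation explicit, here is a minimal sketch of the deterministic case (the function is hypothetical, just the argument in code form):

```python
# The "contrarian" human from the example: whatever letter the predictor
# writes on the paper, the human picks the other option.

def contrarian(paper: str) -> str:
    return "N" if paper == "M" else "M"

for written in ("M", "N"):
    choice = contrarian(written)
    print(f"predictor writes {written!r}, human chooses {choice!r}")
    assert choice != written  # any announced prediction is wrong
```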

## Other predictions

Unlike the previous example, Newcomb’s and other problems involve decision makers who are not told about the prediction outcome. However, the difference might not be as clear-cut as it first appears. If the decision maker regards some information—maybe elements of the deliberation process itself—as evidence about the imminent choice, the DM will also have information about the prediction outcome, since the predictor is known to be reliable. To what extent is this information about the prediction outcome different from the piece of paper in the previous example? What exactly can be considered evidence about one’s own future choices? The answer seems to be related to the details of the prediction process and how it is carried out.

It may be useful to consider how a prediction is implemented as a specific program. In this paper by Critch, an algorithm plays the prisoner’s dilemma by cooperating if it successfully predicts that the opponent will cooperate, and defecting otherwise. Here the “prediction” consists in a search for proofs, up to a certain length, that the other algorithm outputs Cooperate when given the predicting algorithm itself as input. Thanks to a bounded version of Löb’s theorem, this specific prediction implementation allows the algorithm to cooperate when playing against itself.
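For intuition only, here is a toy sketch of the cooperate-iff-predicted-to-cooperate pattern, in which I replace the proof search with bounded simulation on a fuel budget—loudly not Critch’s construction, and the names are mine. The fact that mutual simulation bottoms out in defection is precisely the regress that the proof-theoretic route via Löb’s theorem avoids.

```python
# Toy stand-in for the cooperate-iff-predicted-cooperation pattern.
# Loud simplification: "prediction" here is bounded simulation with a
# fuel budget (NOT Critch's bounded proof search), defaulting to Defect
# when the budget runs out.

def fairbot(opponent, fuel: int = 10) -> str:
    if fuel <= 0:
        return "Defect"  # budget exhausted: no cooperation "proof" found
    # "Predict" the opponent by running it against this very algorithm.
    return "Cooperate" if opponent(fairbot, fuel - 1) == "Cooperate" else "Defect"

def cooperate_bot(opponent, fuel: int = 10) -> str:
    return "Cooperate"

print(fairbot(cooperate_bot))  # Cooperate
print(fairbot(fairbot))        # Defect: mutual simulation bottoms out,
                               # which is why proof search is needed
```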

Results of this kind (open-source game theory / program equilibrium) could be especially relevant in a future in which important policy choices are made by AIs that interact with each other. Note, however, that no claim is made about the rationality of the algorithm’s overall behaviour—it is debatable whether its decision to cooperate against a program that always cooperates is correct.

Moreover, seeing decision makers as programs can be confusing and less precise than one would intuitively think, because it is still unclear how to properly formalise concepts such as action, policy and decision-making procedure, as discussed previously. If actions in certain situations correspond to program outputs given certain inputs, does policy selection correspond to program selection? If so, why is policy selection not an action like the other ones? And—related to what I said before about using a hierarchy of exactly two levels—why don’t we also “select” the code fragment that does policy selection?

In general, approaches that use some kind of formalism tend to be more precise than purely philosophical approaches, but there are some disadvantages as well. Focusing on low-level details can make us lose sight of the bigger picture and limit lateral thinking, which can be a great source of insight for finding alternative solutions in certain situations. In a blackmail scenario, besides the decision to pay or not, we could consider what factors caused the leakage of sensitive information, or the exposure of something we care about, to adversarial agents. Another example: in a prisoner’s dilemma, the equilibrium can shift to mutual cooperation thanks to the intervention of an external actor that makes the payoffs for defection worse (the chapter on game theory in Algorithms to Live By gives a nice presentation of this equilibrium shift and related concepts).

We may also take into account that, for efficiency reasons, predictions in practice might be made with methods different from close-to-perfect physical or algorithmic simulation, and the specific method used could be relevant for an accurate analysis of the situation, as mentioned before. In the case of human interaction, sometimes it is possible to infer something about one’s future actions by reading facial expressions; but this also means that a predictor can be tricked if one is capable of masking their own intentions by keeping a poker face.

# Summary

• The claim that a certain decision is correct because it maximises utility may require further explanation, since every decision problem sits in a context which might not be fully captured in the problem formalisation.

• Perfect prediction leads to seemingly paradoxical situations. It is unclear whether these problems underlie other scenarios involving prediction. This does not mean the concept must be rejected; but our current understanding of prediction might lack critical details. Certain problems may require clarification of how the prediction is made before a solution is claimed as correct.

• The use of precise mathematical formalism can resolve some ambiguities. At the same time, interesting solutions to certain situations may lie “outside” the original problem statement.

Thanks to Abram Demski, Wolfgang Schwarz and Caspar Oesterheld for extensive feedback.

This work was supported by CEEALAR.

## Biases

There are biases in favour of the there-is-always-a-correct-solution framework. Uncovering the right solution in decision problems can be fun, and finding the Decision Theory to solve them all can be appealing.

## On “wrong” solutions

Many of the reasons provided in this post also explain why it’s tricky to determine what a certain decision theory does in a problem, and whether a given solution is wrong. But I want to provide another reason, namely the following informal...

Conjecture: for any decision problem that you believe CDT/EDT gets wrong, there exists a paper or book in which a particular version of CDT/EDT gives the solution that you believe is correct, and/or a paper or book that argues that the solution you believe is correct is actually wrong.

Here’s an example about Newcomb’s problem.

• Hey Michael, I agree that it is important to look very closely at problems like Counterfactual Mugging and not accept solutions that involve handwaving.

Suppose the predictor knows that if it writes M on the paper you’ll choose N, and if it writes N on the paper you’ll choose M. Further, if it writes nothing you’ll choose M. That isn’t a problem, since regardless of what it writes it would have predicted your choice correctly. It just can’t write down the choice without making you choose the opposite.

I was quite skeptical of paying in Counterfactual Mugging until I discovered the Counterfactual Prisoner’s Dilemma, which addresses the problem of why you should care about counterfactuals given that they aren’t factual by definition.

Ideally you’d start doing something like UDT from the beginning of time, but humans don’t know UDT when they are born, so you’d have to adjust it to take this into account by treating these initial decisions as independent of your UDT policy.

• Hi Chris!

Suppose the predictor knows that if it writes M on the paper you’ll choose N, and if it writes N on the paper you’ll choose M. Further, if it writes nothing you’ll choose M. That isn’t a problem, since regardless of what it writes it would have predicted your choice correctly. It just can’t write down the choice without making you choose the opposite.

My point in the post is that the paradoxical situation occurs when the prediction outcome is communicated to the decision maker. We have a seemingly correct prediction—the one that you wrote about—that ceases to be correct after it is communicated. And later in the post I discuss whether this problematic feature of prediction extends to other scenarios, leaving the question open. What did you want to say exactly?

I was quite skeptical of paying in Counterfactual Mugging until I discovered the Counterfactual Prisoner’s Dilemma, which addresses the problem of why you should care about counterfactuals given that they aren’t factual by definition.

I’ve read the problem, and the analysis I did for (standard) counterfactual mugging applies to your version as well.

The first intuition is that, before knowing the toss outcome, the DM wants to pay in both cases, because that gives the highest utility (9900) in expectation.

The second intuition is that, after the DM knows (wlog) the outcome is heads, he doesn’t want to pay anymore in that case—and wants to be someone who pays when tails is the outcome, thus getting 10000.
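To spell out the numbers behind both intuitions, here is a quick enumeration of the four policies, assuming the payoffs of your formulation (pay 100 when asked; receive 10000 if predicted to pay in the counterfactual branch):

```python
# Enumerate the four deterministic policies in Counterfactual Prisoner's
# Dilemma: pay 100 when asked; receive 10000 if predicted to pay in the
# other (counterfactual) branch.

for pay_heads in (True, False):
    for pay_tails in (True, False):
        heads = (-100 if pay_heads else 0) + (10_000 if pay_tails else 0)
        tails = (-100 if pay_tails else 0) + (10_000 if pay_heads else 0)
        print(f"(pay on heads={pay_heads}, pay on tails={pay_tails}): "
              f"heads -> {heads}, tails -> {tails}")
# (pay, pay) guarantees 9900 either way (first intuition); after learning
# the coin came up heads, (don't pay on heads, pay on tails) yields 10000
# in that branch (second intuition).
```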

• Well, you can only predict conditional on what you write; you can’t predict unconditionally. However, once you’ve fixed what you’ll write in order to make a prediction, you can’t then change what you’ll write in response to that prediction.

Actually, it isn’t about utility in expectation. If you are the kind of person who pays, you gain $9900; if you aren’t, you gain $100. This is guaranteed utility, not expected utility.

• The fact that it is “guaranteed” utility doesn’t make a significant difference: my analysis still applies. After you know the outcome, you can avoid paying in that case and get 10000 instead of 9900 (second intuition).

• “After you know the outcome, you can avoid paying in that case and get 10000 instead of 9900 (second intuition)”—No you can’t. The only way to get 10,000 is to pay if the coin comes up the opposite way to how it actually comes up. And that’s only a 50/50 chance.

• If the DM knows the outcome is heads, why can’t he not pay in that case and decide to pay in the other case? In other words: why can’t he adopt the policy (not pay when heads; pay when tails), which leads to 10000?

• If you pre-commit to that strategy (heads don’t pay, tails pay) it provides 10000, but it only works half the time.

If you decide, after you see the coin, not to pay in that case, then this will lead to the strategy (not pay, not pay), which provides 0.

• It seems you are arguing for the position that I called “the first intuition” in my post. Before knowing the outcome, the best you can do is (pay, pay), because that leads to 9900.

On the other hand, as in standard counterfactual mugging, you could be asked: “You know that, this time, the coin came up tails. What do you do?”. And here the second intuition applies: the DM can decide to not pay (in this case) and to pay when heads. Omega recognises the intent of the DM, and gives 10000.

Maybe you are not even considering the second intuition because you take for granted that the agent has to decide one policy “at the beginning” and stick to it, or, as you wrote, “pre-commit”. One of the points of the post is that it is unclear where this assumption comes from, and what exactly it means. It’s possible that my reasoning in the post was not clear, but I think that if you reread the analysis you will see the situation from both viewpoints.

• I am considering the second intuition. Acting according to it results in you receiving $0 in Counterfactual Prisoner’s Dilemma, instead of losing $100. This is because if you act updatefully when it comes up heads, you have to also act updatefully when it comes up tails. If this still doesn’t make sense, I’d encourage you to reread the post.

• Omega, a perfect predictor, flips a coin. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails and you were told it was tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads and you were told it was heads.

Here there is no question, so I assume it is something like: “What do you do?” or “What is your policy?”

That formulation is analogous to standard counterfactual mugging, stated in this way:

Omega flips a coin. If it comes up heads, Omega will give you 10000 in case you would pay 100 when tails. If it comes up tails, Omega will ask you to pay 100. What do you do?

According to these two formulations, the correct answer seems to be the one corresponding to the first intuition.

Now consider instead this formulation of counterfactual PD:

Omega, a perfect predictor, tells you that it has flipped a coin, and it has come up heads. Omega asks you to pay 100 (here and now) and gives you 10000 (here and now) if you would pay in case the coin landed tails. Omega also explains that, if the coin had come up tails—but note that it hasn’t—Omega would tell you such and such (symmetrical situation). What do you do?

The answer of the second intuition would be: I refuse to pay here and now, and I would have paid in case the coin had come up tails. I get 10000.

And this formulation of counterfactual PD is analogous to this formulation of counterfactual mugging, where the second intuition refuses to pay.

Omega, a perfect predictor, flips a coin and tells you how it came up. If it comes up heads, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up tails. If it comes up tails, Omega asks you for $100, then pays you $10,000 if it predicts you would have paid if it had come up heads. In this case it was heads.