Why you must maximize expected utility

This post explains the von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I’m writing this post in preparation for a sequence on updateless anthropics, but I’m hoping that it will also be independently useful.

The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don’t know what that means, I’ll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?

A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump: how, at time 11:59, you will want to pay a penny to get option B instead of option A, and then at 12:01, you will want to pay a penny to switch back. Either that, or the game will have ended and the option won’t have made a difference.

When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently always gave better results. But couldn’t you avoid the problem by violating the axiom only in situations where it doesn’t give anyone an opportunity to money-pump you? I’m not saying that would be elegant, but is there a reason it would be irrational?

It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don’t know any source that comes close to explaining the reason, the way I see it; hence, this post.

I’ll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually imply that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.


Epistemic rationality is about figuring out what’s true; instrumental rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent direction in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won’t ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem doubtful that which plan is better at 12:01 could genuinely depend on my opportunity to choose at 11:59.

So how do we formalize the notion of a coherent direction in which you can steer the future?


Setting the stage

Decision theory asks what you would do if faced with choices between different sets of options, and then places restrictions on how you can act in one situation, depending on how you would act in others. This is another thing that has always bothered me: If we are talking about choices between different lotteries with small prizes, it makes some sense that we could invite you to the lab and run ten sessions with different choices, and you should probably act consistently across them. But if we’re interested in the big questions, like how to save the world, then you’re not going to face a series of independent, analogous scenarios. So what is the content of asking what you would do if you faced a set of choices different from the one you actually face?

The real point is that you have bounded computational resources, and you can’t actually visualize the exact set of choices you might face in the future. A perfect Bayesian rationalist could just figure out what they would do in any conceivable situation and write it down in a giant lookup table, which means that they only face a single one-time choice between different possible tables. But you can’t do that, and so you need to figure out general principles to follow. A perfect Bayesian is like a Carnot engine: it’s what a theoretically perfect engine would look like, so even though you can at best approximate it, it still has something to teach you about how to build a real engine.

But decision theory is about what a perfect Bayesian would do, and it’s annoying to have our practical concerns intrude into our ideal picture like that. So let’s give our story some local color and say that you aren’t a perfect Bayesian, but you have a genie (that is, a powerful optimization process; that is, an AI) which is. (That, too, is physically impossible: AIs, like humans, can only approximate perfect Bayesianism. But we are still idealizing.) Your genie is able to comprehend the set of possible giant lookup tables it must choose between; you must write down a formula, to be evaluated by the genie, that chooses the best table from this set, given the available information. (An unmodified human won’t actually be able to write down an exact formula describing their preferences, but we might be able to write down one for a paperclip maximizer.)

The first constraint decision theory places on your formula is that it must order all options your genie might have to choose between from best to worst (though you might be indifferent between some of them), and then, given any particular set of feasible options, it must choose the one that is least bad. In particular, if you prefer option A when options A and B are available, then you can’t prefer option B when options A, B and C are available.

Meditation: Alice is trying to decide how large a bonus each member of her team should get this year. She has just decided on giving Bob the same, already large, bonus as last year when she receives an e-mail from the head of a different division, asking her if she can recommend anyone for a new project he is setting up. Alice immediately realizes that Bob would love to be on that project, and would fit the bill exactly. But she needs Bob on the contract he’s currently working on; losing him would be a pretty bad blow for her team.

Alice decides there is no way that she can recommend Bob for the new project. But she still feels bad about it, and she decides to make up for it by giving Bob a larger bonus. On reflection, she finds that she genuinely feels that this is the right thing to do, simply because she could have recommended him but didn’t. Does that mean that Alice’s preferences are irrational? Or that something is wrong with decision theory?

Meditation: One kind of answer to the above and to many other criticisms of decision theory goes like this: Alice’s decision isn’t between giving Bob a larger bonus or not, it’s between (give Bob a larger bonus unconditionally), (give Bob the same bonus unconditionally), (only give Bob a larger bonus if I could have recommended him), and so on. But if that sort of thing is allowed, is there any way left in which decision theory constrains Alice’s behavior? If not, what good is it to Alice in figuring out what she should do?




My short answer is that Alice can care about anything she damn well likes. But there are a lot of things that she doesn’t care about, and decision theory has something to say about those.

In fact, deciding that some kinds of preferences should be outlawed as irrational can be dangerous: you might think that nobody in their right mind should ever care about the detailed planning algorithms their AI uses, as long as they work. But how certain are you that it’s wrong to care about whether the AI has planned out your whole life in advance, in detail? (Worse: Depending on how strictly you interpret it, this injunction might even rule out not wanting the AI to run conscious simulations of people.)

But nevertheless, I believe the “anything she damn well likes” needs to be qualified. Imagine that Alice and Carol both have an AI, and fortuitously, both AIs have been programmed with the same preferences and the same Bayesian prior (and they talk, so they also have the same posterior, because Bayesians cannot agree to disagree). But Alice’s AI has taken over the stock markets, while Carol’s AI has seized the world’s nuclear arsenals (and is protecting them well). So Alice’s AI not only doesn’t want to blow up Earth, it couldn’t do so even if it wanted to; it couldn’t even bribe Carol’s AI, because Carol’s AI really doesn’t want the Earth blown up either. And so, if it makes a difference to the AIs’ preference function whether they could blow up Earth if they wanted to, they have a conflict of interest.

The moral of this story is not simply that it would be sad if two AIs came into conflict even though they have the same preferences. The point is that we’re asking what it means to have a consistent direction in which you are trying to steer the future, and it doesn’t look like our AIs are on the same bearing. Surely, a direction for steering the world should only depend on features of the world, not on additional information about which agent is at the rudder.

You can want to not have your life planned out by an AI. But I think you should have to state your wish as a property of the world: you want all AIs to refrain from doing so, not just “whatever AI happens to be executing this”. And Alice can want Bob to get a larger bonus if the company could have assigned him to the new project and decided not to, but she must figure out whether this is the correct way to translate her moral intuitions into preferences over properties of the world.


You may care about any feature of the world, but you don’t in fact care about most of them. For example, there are many ways the atoms in the sun could be arranged that all add up to the same thing as far as you are concerned, and you don’t have terminal preferences about which of these will be the actual one tomorrow. And though you might care about some properties of the algorithms your AI is running, mostly they really do not matter.

Let’s define a function that takes a complete description of the world (past, present and future) and returns a data structure containing all information about the world that matters to your terminal values, and only that information. (Our imaginary perfect Bayesian doesn’t know exactly which way the world will turn out, but it can work with “possible worlds”, complete descriptions of ways the world may turn out.) We’ll call this data structure an “outcome”, and we require you to be indifferent between any two courses of action that will always produce the same outcome. Of course, any course of action is something that your AI would be executing in the actual world, and you are certainly allowed to care about the difference; but then the two courses of action do not lead to the same “outcome”!¹

With this definition, I think it is pretty reasonable to say that in order to have a consistent direction in which you want to steer the world, you must be able to order these outcomes from best to worst, and always want to pick the least bad you can get.
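To make the idea of an “outcome” concrete, here is a minimal Python sketch. The world description, the feature names (`lives_saved`, `life_planned_by_ai`, `sun_atoms`) and the `outcome` function are all hypothetical illustrations, not anything specified in the post; a real world description would be astronomically larger. The only point is that `outcome` throws away everything your terminal values do not depend on, so two worlds that differ only in discarded features map to the same outcome.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical stand-in for a complete description of the world.
World = Dict[str, Any]


@dataclass(frozen=True)
class Outcome:
    """The features of a world that matter to (hypothetical) terminal values."""
    lives_saved: int
    life_planned_by_ai: bool


def outcome(world: World) -> Outcome:
    """Project a world description onto the features you care about.

    Every other key (e.g. the arrangement of atoms in the sun) is ignored.
    """
    return Outcome(
        lives_saved=world["lives_saved"],
        life_planned_by_ai=world["life_planned_by_ai"],
    )


w1 = {"lives_saved": 10, "life_planned_by_ai": False, "sun_atoms": "arrangement 1"}
w2 = {"lives_saved": 10, "life_planned_by_ai": False, "sun_atoms": "arrangement 2"}
w3 = {"lives_saved": 10, "life_planned_by_ai": True, "sun_atoms": "arrangement 1"}

print(outcome(w1) == outcome(w2))  # True: only irrelevant features differ
print(outcome(w1) == outcome(w3))  # False: AI planning is part of the outcome here
```

Caring about how the AI plans is allowed, but then it has to appear as a field of the outcome, as the hypothetical `life_planned_by_ai` does here.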


Preference relations

That won’t be sufficient, though. Our genie doesn’t know what outcome each action will produce; it only has probabilistic information about that, and that’s a complication we very much do not want to idealize away (because we’re trying to figure out the right way to deal with it). And so our decision theory amends the earlier requirement: You must not only be indifferent between actions that always produce the same outcome, but also between all actions that yield the same probability distribution over outcomes.

This is not at all a mild assumption, though it’s usually built so deeply into the definitions that it’s not even called an “axiom”. But we’ve assumed that all features of the world you care about are already encoded in the outcomes, so it does seem to me that the only reason left why you might prefer one action over another is that it gives you a better trade-off in terms of what outcomes it makes more or less likely; and I’ve assumed that you’re already a Bayesian, so you agree that how likely it makes an outcome is correctly represented by the probability of that outcome, given the action. So it certainly seems that the probability distribution over outcomes should give you all the information about an action that you could possibly care about. And that you should be able to order these probability distributions from best to worst, and all that.

Formally, we represent a direction for steering the world as a set \(\mathcal{O}\) of possible outcomes and a binary relation \(\preceq\) on the probability distributions over \(\mathcal{O}\) (with \(p \preceq q\) interpreted as “\(q\) is at least as good as \(p\)”) which is a total preorder; that is, for all \(p\), \(q\) and \(r\):

  • If \(p \preceq q\) and \(q \preceq r\), then \(p \preceq r\) (that is, \(\preceq\) is transitive); and

  • We have either \(p \preceq q\) or \(q \preceq p\) or both (that is, \(\preceq\) is total).

In this post, I’ll assume that \(\mathcal{O}\) is finite. We write \(p \sim q\) (for “I’m indifferent between \(p\) and \(q\)”) when both \(p \preceq q\) and \(q \preceq p\), and we write \(p \prec q\) (“\(q\) is strictly better than \(p\)”) when \(p \preceq q\) but not \(q \preceq p\). Our genie will compute the set of all actions it could possibly take, and the probability distribution over possible outcomes that (according to the genie’s Bayesian posterior) each of these actions leads to, and then it will choose an action whose distribution is maximal with respect to \(\preceq\). I’ll also assume that the set of possible actions will always be finite, so there is always at least one optimal action.
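These definitions can be sketched in Python for a finite set of candidate distributions. The representation is an assumption made for illustration: distributions are dictionaries from outcome labels to probabilities, a relation is a function `rel(p, q)` meaning \(p \preceq q\), and the example relation `by_good` (ranking only by the probability of a made-up outcome `"good"`) is hypothetical. `is_total_preorder` brute-forces the two bullet conditions, and `genie_choose` returns an action whose distribution is maximal, which exists because the set of actions is finite.

```python
from itertools import product
from typing import Callable, Dict, List

Dist = Dict[str, float]                  # outcome label -> probability
Relation = Callable[[Dist, Dist], bool]  # rel(p, q) means "q is at least as good as p"


def is_total_preorder(rel: Relation, dists: List[Dist]) -> bool:
    """Brute-force check of totality and transitivity on a finite list."""
    for p, q in product(dists, repeat=2):
        if not (rel(p, q) or rel(q, p)):
            return False  # neither p ⪯ q nor q ⪯ p: not total
    for p, q, r in product(dists, repeat=3):
        if rel(p, q) and rel(q, r) and not rel(p, r):
            return False  # p ⪯ q and q ⪯ r but not p ⪯ r: not transitive
    return True


def genie_choose(actions: Dict[str, Dist], rel: Relation) -> str:
    """Return an action whose distribution is maximal under rel.

    Assumes rel is a total preorder and actions is finite and non-empty.
    """
    best = None
    for name, dist in actions.items():
        # By totality, "not dist ⪯ best" means dist is strictly better.
        if best is None or not rel(dist, actions[best]):
            best = name
    return best


def by_good(p: Dist, q: Dist) -> bool:
    """Hypothetical relation: only the probability of "good" matters."""
    return p.get("good", 0.0) <= q.get("good", 0.0)


actions = {
    "plan_a": {"good": 0.3, "bad": 0.7},
    "plan_b": {"good": 0.6, "bad": 0.4},
    "plan_c": {"good": 0.6, "bad": 0.4},  # same distribution as plan_b
}
print(is_total_preorder(by_good, list(actions.values())))  # True
print(genie_choose(actions, by_good))  # plan_b
```

Since `plan_b` and `plan_c` lead to the same distribution, you must be indifferent between them; the genie simply keeps the first maximal action it finds.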

Meditation: Omega is in the neighbourhood and invites you to participate in one of its little games. Next Saturday, it plans to flip a fair coin; would you please indicate on the attached form whether you would like to bet that this coin will fall heads, or tails? If you correctly bet heads, you will win $10,000; if you correctly bet tails, you’ll win $100. If you bet wrongly, you will still receive $1 for your participation.

We’ll assume that you prefer a 50% chance of $10,000 and a 50% chance of $1 to a 50% chance of $100 and a 50% chance of $1. Thus, our theory would say that you should bet heads. But there is a twist: Given recent galactopolitical events, you estimate a 3% chance that after posting its letter, Omega has been called away on urgent business. In this case, the game will be cancelled and you won’t get any money, though as a consolation, Omega will probably send you some book from its rare SF collection when it returns (market value: approximately $55–$70). Our theory so far tells you nothing about how you should bet in this case, but does Rationality have anything to say about it?



The Axiom of Independence

So here’s how I think about that problem: If you already knew that Omega is still in the neighbourhood (but not which way the coin is going to fall), you would prefer to bet heads, and if you knew it has been called away, you wouldn’t care. (And what you bet has no influence on whether Omega has been called away.) So heads is either better or exactly the same; clearly, you should bet heads.

This type of reasoning is the content of the von Neumann-Morgenstern Axiom of Independence. Apparently, that’s the most controversial of the theory’s axioms.

You’re already a Bayesian, so you already accept that if you perform an experiment to determine whether someone is a witch, and the experiment can come out two ways, then if one of these outcomes is evidence that the person is a witch, the other outcome must be evidence that they are not. New information is allowed to make a hypothesis more likely, but not predictably so; if all ways the experiment could come out make the hypothesis more likely, then you should already be finding it more likely than you do. The same thing is true even if only one result would make the hypothesis more likely, but the other would leave your probability estimate exactly unchanged.

The Axiom of Independence is equivalent to saying that if you’re evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no less attractive, then you should already be finding it more attractive than you do. This does seem rather solid to me.


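So what does this axiom say formally? (Feel free to skip this section if you don’t care.)

Suppose that your genie is considering two possible actions \(a_1\) and \(a_2\) (bet heads or tails), and an event \(E\) (Omega is called away). Each action gives rise to a probability distribution over possible outcomes: e.g., \(P(o \mid a_1)\) is the probability of outcome \(o\) if your genie chooses \(a_1\). But your genie can also compute a probability distribution conditional on \(E\), \(P(o \mid a_1, E)\). Suppose that conditional on \(E\), it doesn’t matter which action you pick: \(P(o \mid a_1, E) = P(o \mid a_2, E)\) for all \(o \in \mathcal{O}\). And finally, suppose that the probability of \(E\) doesn’t depend on which action you pick: \(P(E \mid a_1) = P(E \mid a_2) = P(E)\), with \(P(E) < 1\). The Axiom of Independence says that in this situation, you should prefer the distribution \(P(\cdot \mid a_1)\) to the distribution \(P(\cdot \mid a_2)\), and therefore prefer \(a_1\) to \(a_2\), if and only if you prefer the distribution \(P(\cdot \mid a_1, \neg E)\) to the distribution \(P(\cdot \mid a_2, \neg E)\).

Let’s write \(p\) for the distribution \(P(\cdot \mid a_1, \neg E)\), \(q\) for the distribution \(P(\cdot \mid a_2, \neg E)\), and \(r\) for the distribution \(P(\cdot \mid a_1, E) = P(\cdot \mid a_2, E)\). (Formally, we think of these as vectors in \(\mathbb{R}^{\mathcal{O}}\): e.g., \(p_o = P(o \mid a_1, \neg E)\).) Writing \(\alpha := P(\neg E)\), for all \(o \in \mathcal{O}\) we have

\[ P(o \mid a_1) = P(\neg E)\, P(o \mid a_1, \neg E) + P(E)\, P(o \mid a_1, E) = \alpha\, p_o + (1 - \alpha)\, r_o, \]

so \(P(\cdot \mid a_1) = \alpha p + (1 - \alpha) r\), and similarly \(P(\cdot \mid a_2) = \alpha q + (1 - \alpha) r\). Thus, we can state the Axiom of Independence as follows:

  • \(p \preceq q \iff \alpha p + (1 - \alpha) r \preceq \alpha q + (1 - \alpha) r\).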
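Here is a small Python sketch of the mixture \(\alpha p + (1 - \alpha) r\) applied to the Omega example, plus a spot check of the Independence condition for a few values of \(\alpha\) (a spot check, not a proof). The outcome labels and the dollar-valued utilities that define the example relation (including valuing the book at 60) are assumptions for illustration only; the relation is an expected-value comparison, picked because such relations do satisfy Independence, not because the post prescribes it.

```python
import math
from typing import Callable, Dict, Iterable

Dist = Dict[str, float]


def mix(alpha: float, p: Dist, r: Dist) -> Dist:
    """Return the distribution alpha * p + (1 - alpha) * r."""
    outcomes = set(p) | set(r)
    return {o: alpha * p.get(o, 0.0) + (1 - alpha) * r.get(o, 0.0) for o in outcomes}


def check_independence(rel: Callable[[Dist, Dist], bool], p: Dist, q: Dist, r: Dist,
                       alphas: Iterable[float]) -> bool:
    """Check p ⪯ q  <=>  mix(alpha, p, r) ⪯ mix(alpha, q, r) for the given alphas."""
    return all(rel(p, q) == rel(mix(a, p, r), mix(a, q, r)) for a in alphas)


# Omega example: E = "Omega has been called away", assumed independent of the bet.
p_called_away = 0.03
alpha = 1 - p_called_away            # alpha = P(not E)
p = {"$10,000": 0.5, "$1": 0.5}      # bet heads, given not E
q = {"$100": 0.5, "$1": 0.5}         # bet tails, given not E
r = {"book": 1.0}                    # given E, the bet makes no difference

heads = mix(alpha, p, r)             # P(. | bet heads)
tails = mix(alpha, q, r)             # P(. | bet tails)
assert math.isclose(sum(heads.values()), 1.0)  # still a probability distribution

# Hypothetical utilities, used only to define an example relation.
u = {"$10,000": 10_000.0, "$100": 100.0, "$1": 1.0, "book": 60.0}


def eu_rel(x: Dist, y: Dist) -> bool:
    """x ⪯ y iff the expected value of u under x is at most that under y."""
    return sum(pr * u[o] for o, pr in x.items()) <= sum(pr * u[o] for o, pr in y.items())


print(eu_rel(tails, heads))  # True: heads is at least as good
print(check_independence(eu_rel, p, q, r, [0.01, 0.5, alpha, 1.0]))  # True
```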

We’ll assume that you can’t ever rule out the possibility that your AI might face this type of situation for any given \(p\), \(q\), \(r\), and \(\alpha\), so we require that this condition hold for all probability distributions \(p\), \(q\) and \(r\), and for all \(\alpha\) with \(0 < \alpha \le 1\).


Here’s a common criticism of Independence. Suppose a parent has two children, and one old car that they can give to one of these children. Can’t they be indifferent between giving the car to their older child or their younger child, but strictly prefer throwing a coin? But let \(p\) mean that the younger child gets the gift, \(q\) that the older child gets it, and take \(r = q\) and \(\alpha = \tfrac12\); then by Independence, if \(p \sim q\), then \(\tfrac12 p + \tfrac12 q \sim \tfrac12 q + \tfrac12 q = q\), so it would seem that the parent cannot strictly prefer the coin throw.

In fairness, the people who find this criticism persuasive may not be Bayesians. But if you think this is a good criticism: Do you think that the parent must be indifferent between throwing a coin and asking the children’s crazy old kindergarten teacher which of them was better-behaved, as long as they assign 50% probability to either answer? Because if not, shouldn’t you already have protested when we decided that decisions must only depend on the probabilities of different outcomes?

My own resolution is that this is another case of terminal values intruding where they don’t belong. All that is relevant to the parent’s terminal values must already be described in the outcome; the parent is allowed to prefer “I threw a coin and my younger child got the car” to “I decided that my younger child would get the car” or “I asked the kindergarten teacher and they thought my younger child was better-behaved”, but if so, then these must already be different outcomes. The thing to remember is that it isn’t a property of the world that either child had a 50% probability of getting the car, and you can’t steer the future in the direction of having this mythical property. It is a property of the world that the parent assigned a 50% probability to each child getting the car, and that is a direction you can steer in; though the example with the kindergarten teacher shows that this is probably not quite the direction you actually wanted.

The preference relation is only supposed to be about trade-offs between probability distributions; if you’re tempted to say that you want to steer the world towards one probability distribution or another, rather than one outcome or another, something has gone terribly wrong.


The Axiom of Continuity

And… that’s it. These are all the axioms that I’ll ask you to accept in this post.

There is, however, one more axiom in the von Neumann-Morgenstern theory, the Axiom of Continuity. I do not think this axiom is a necessary requirement on any coherent plan for steering the world; I think the best argument for it is that it doesn’t make a practical difference whether you adopt it, so you might as well. But there is also a good argument to be made that if we’re talking about anything short of steering the entire future of humanity, your preferences do in fact obey this axiom, and it makes things easier technically if we adopt it, so I’ll do that at least for now.

Let’s look at an example: If you prefer $50 in your pocket to $40, the axiom says that there must be some small \(\epsilon > 0\) such that you prefer a probability of \(1 - \epsilon\) of $50 and a probability of \(\epsilon\) of dying today to a certainty of $40. Some critics seem to see this as the ultimate reductio ad absurdum for the VNM theory; they seem to think that no sane human would accept that deal.

Eliezer was surely not the first to observe that this preference is exhibited each time someone drives an extra mile to save $10.

Continuity says that if you strictly prefer \(q\) to \(p\), then there is no \(r\) so terrible that you wouldn’t be willing to incur a small probability of it in order to (probably) get \(q\) rather than \(p\), and no \(r\) so wonderful that you’d be willing to (probably) get \(p\) instead of \(q\) if this gives you some arbitrarily small probability of getting \(r\). Formally, for all \(p\), \(q\) and \(r\),

  • If \(p \prec q\), then there is an \(\epsilon > 0\) such that \(p \prec (1 - \epsilon)\, q + \epsilon\, r\) and \((1 - \epsilon)\, p + \epsilon\, r \prec q\).
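For a relation given by expected utility, the \(\epsilon\) promised by Continuity can be found by simple halving. This sketch is only an illustration: the utilities for $50, $40 and death are invented numbers (death is assigned -10,000,000), and `continuity_epsilon` is a hypothetical helper that assumes \(p \prec q\) in the sense of strictly smaller expected utility.

```python
from typing import Dict

Dist = Dict[str, float]


def mix(alpha: float, p: Dist, r: Dist) -> Dist:
    """Return alpha * p + (1 - alpha) * r."""
    outcomes = set(p) | set(r)
    return {o: alpha * p.get(o, 0.0) + (1 - alpha) * r.get(o, 0.0) for o in outcomes}


def expected_utility(p: Dist, u: Dict[str, float]) -> float:
    return sum(prob * u[o] for o, prob in p.items())


def continuity_epsilon(p: Dist, q: Dist, r: Dist, u: Dict[str, float],
                       max_halvings: int = 200) -> float:
    """Find eps > 0 with p < (1-eps) q + eps r and (1-eps) p + eps r < q.

    Strict preference here means strictly smaller/larger expected utility.
    """
    eu_p, eu_q = expected_utility(p, u), expected_utility(q, u)
    if not eu_p < eu_q:
        raise ValueError("requires q to be strictly better than p")
    eps = 0.5
    for _ in range(max_halvings):
        q_risky = mix(1 - eps, q, r)  # (1 - eps) q + eps r
        p_risky = mix(1 - eps, p, r)  # (1 - eps) p + eps r
        if eu_p < expected_utility(q_risky, u) and expected_utility(p_risky, u) < eu_q:
            return eps
        eps /= 2
    raise RuntimeError("no suitable eps found; the utilities may be too extreme")


u = {"$50": 50.0, "$40": 40.0, "death": -10_000_000.0}  # invented numbers
p = {"$40": 1.0}
q = {"$50": 1.0}
r = {"death": 1.0}
eps = continuity_epsilon(p, q, r, u)
print(eps)  # 2**-20, roughly 9.5e-07
```

With these made-up numbers you would accept roughly a one-in-a-million chance of death to get $50 instead of $40; a more negative utility for death would only make `eps` smaller, never zero.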

I think if we’re talking about everyday life, we can pretty much rule out that there are things so terrible that for arbitrarily small \(\epsilon\), you’d be willing to die with probability \(1 - \epsilon\) to avoid a probability of \(\epsilon\) of the terrible thing. And if you feel that it’s not worth the expense to call a doctor every time you sneeze, you’re willing to incur a slightly higher probability of death in order to save some mere money. And it seems unlikely that there is no \(\epsilon > 0\) at which you’d prefer a certainty of $1 to a chance of \(\epsilon\) of $100. And if you have some preference that is so slight that you wouldn’t be willing to accept any chance of losing $1 in order to indulge it, it can’t be a very strong preference. So I think for most practical purposes, we might as well accept Continuity.


The VNM theorem

If your preferences are described by a transitive and complete relation \(\preceq\) on the probability distributions over some set \(\mathcal{O}\) of “outcomes”, and this relation satisfies Independence and Continuity, then you have a utility function, and your genie will be maximizing expected utility.

Here’s what that means. A utility function is a function \(u \colon \mathcal{O} \to \mathbb{R}\) which assigns a numerical “utility” to every outcome. Given a probability distribution \(p\) over \(\mathcal{O}\), we can compute the expected value of \(u\) under \(p\), \(\mathbb{E}_p[u] := \sum_{o \in \mathcal{O}} p_o\, u(o)\); this is called the expected utility. We can prove that there is some utility function \(u\) such that for all \(p\) and \(q\), we have \(p \prec q\) if and only if the expected utility under \(q\) is greater than the expected utility under \(p\).

In other words: \(\preceq\) is completely described by \(u\); if you know \(u\), you know \(\preceq\). Instead of programming your genie with a function that takes two probability distributions and says which one is better, you might as well program it with a function that takes one outcome and returns its utility. Any coherent direction for steering the world which happens to satisfy Continuity can be reduced to a function that takes outcomes and assigns them numerical ratings.
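A minimal sketch of what “program it with a function that takes one outcome and returns its utility” could look like: the genie ranks actions by expected utility instead of consulting a pairwise relation. The action names, outcome labels and utility values are hypothetical (the distributions are the Omega example with a 3% chance of Omega being called away); on ties, Python’s `max` returns the first maximal action.

```python
from typing import Dict

Dist = Dict[str, float]


def expected_utility(p: Dist, u: Dict[str, float]) -> float:
    """Sum of probability times utility over outcomes."""
    return sum(prob * u[o] for o, prob in p.items())


def genie_choose(actions: Dict[str, Dist], u: Dict[str, float]) -> str:
    """Pick an action with maximal expected utility (first one on ties)."""
    return max(actions, key=lambda a: expected_utility(actions[a], u))


u = {"$10,000": 10_000.0, "$100": 100.0, "$1": 1.0, "book": 60.0}  # hypothetical
actions = {
    "bet heads": {"$10,000": 0.485, "$1": 0.485, "book": 0.03},
    "bet tails": {"$100": 0.485, "$1": 0.485, "book": 0.03},
}
print(genie_choose(actions, u))  # bet heads
```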

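In fact, it turns out that the \(u\) for a given \(\preceq\) is “almost” unique: Given two utility functions \(u\) and \(v\) that describe the same \(\preceq\), there are numbers \(a > 0\) and \(b\) such that for all \(o \in \mathcal{O}\), \(v(o) = a\, u(o) + b\); this is called an “affine transformation”. On the other hand, it’s not hard to see that for any such \(a\) and \(b\),

\[ \mathbb{E}_p[a u + b] = \sum_{o \in \mathcal{O}} p_o \bigl(a\, u(o) + b\bigr) = a\, \mathbb{E}_p[u] + b, \]

so \(\mathbb{E}_p[a u + b] < \mathbb{E}_q[a u + b]\) if and only if \(\mathbb{E}_p[u] < \mathbb{E}_q[u]\), and two utility functions represent the same preference relation if and only if they are related in this way.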
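The invariance argument can be checked with exact arithmetic. This sketch uses Python’s `fractions.Fraction` to avoid rounding, a made-up three-outcome utility function \(u\), and the transformation \(v = 3u - 7\) (so \(a = 3\), \(b = -7\)). It confirms that \(u\) and \(v\) order a few example distributions identically, while \(w = -u\), which would need \(a < 0\), does not.

```python
from fractions import Fraction as F
from itertools import product
from typing import Dict, List

Dist = Dict[str, F]
Utility = Dict[str, F]


def expected_utility(p: Dist, u: Utility) -> F:
    return sum(prob * u[o] for o, prob in p.items())


def same_preferences(u: Utility, v: Utility, dists: List[Dist]) -> bool:
    """True if u and v rank every pair of the given distributions the same way."""
    return all(
        (expected_utility(p, u) < expected_utility(q, u))
        == (expected_utility(p, v) < expected_utility(q, v))
        for p, q in product(dists, repeat=2)
    )


u = {"A": F(0), "B": F(1), "C": F(5)}     # hypothetical utilities
v = {o: 3 * x - 7 for o, x in u.items()}  # affine: a = 3, b = -7
w = {o: -x for o, x in u.items()}         # a = -1 is not allowed

dists = [
    {"A": F(1)},
    {"B": F(1)},
    {"A": F(1, 2), "C": F(1, 2)},
    {"A": F(3, 4), "C": F(1, 4)},
]
print(same_preferences(u, v, dists))  # True
print(same_preferences(u, w, dists))  # False
```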


You shouldn’t read too much into this conception of utility. For example, it doesn’t make sense to see a fundamental distinction between outcomes with “positive” and with “negative” von Neumann-Morgenstern utility, because adding the right \(b\) can make any negative utility positive and any positive utility negative, without changing the underlying preference relation. The numbers that have real meaning are ratios between differences between utilities, \(\frac{u(o_1) - u(o_2)}{u(o_3) - u(o_4)}\), because these don’t change under affine transformations (the \(b\)’s cancel when you take the differences, and the \(a\)’s cancel when you take the ratio). Academian’s post has more about misunderstandings of VNM utility.
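A quick exact-arithmetic illustration of which numbers survive an affine transformation, using hypothetical utilities and the same kind of transformation \(v = 3u - 7\): the sign of a single utility can flip, but the ratio of utility differences stays put. `diff_ratio` is a helper introduced only for this example.

```python
from fractions import Fraction as F


def diff_ratio(u, o1, o2, o3, o4):
    """(u(o1) - u(o2)) / (u(o3) - u(o4)); assumes u(o3) != u(o4)."""
    return (u[o1] - u[o2]) / (u[o3] - u[o4])


u = {"A": F(0), "B": F(1), "C": F(5)}     # hypothetical utilities
v = {o: 3 * x - 7 for o, x in u.items()}  # same preferences: a = 3, b = -7

print(u["B"] > 0, v["B"] > 0)  # True False: the sign is not meaningful
print(diff_ratio(u, "C", "B", "B", "A") == diff_ratio(v, "C", "B", "B", "A"))  # True
```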

In my view, what VNM utilities represent is not necessarily how good each outcome is; what they represent is what trade-offs between probability distributions you are willing to accept. Now, if you strongly felt that the difference between \(o_1\) and \(o_2\) was about the same as the difference between \(o_3\) and \(o_4\), then you should have a very good reason before you make your \(\frac{u(o_1) - u(o_2)}{u(o_3) - u(o_4)}\) a huge number. But on the other hand, I think it’s ultimately your responsibility to decide what trade-offs you are willing to make; I don’t think you can get away with “stating how much you value different outcomes” and outsourcing the rest of the job to decision theory, without ever considering what these valuations should mean in terms of probabilistic trade-offs.


Doing without Continuity

What happens if your preferences do not satisfy Continuity? Say, you want to save human lives, but you’re not willing to incur any probability, no matter how small, of infinitely many people getting tortured infinitely long for this?

I do not see a good argument that this couldn’t add up to a coherent direction for steering the world. I do, however, see an argument that in this case you care so little about finite numbers of human lives that in practice, you can probably neglect this concern entirely. (As a result, I doubt that your reflective equilibrium would want to adopt such preferences. But I don’t think they’re incoherent.)

I’ll assume that your morality can still distinguish only a finite number of outcomes, and you can choose only between a finite number of decisions. It’s not obvious that these assumptions are justified if we want to take into account the possibility that the true laws of physics might turn out to allow for infinite computations, but even in this case you and any AI you build will probably still be finite (though it might build a successor that isn’t), so I do in fact think there is a good chance that results derived under this assumption have relevance in the real world.

In this case, it turns out that you still have a utility function, in a certain sense. (Proofs for non-standard results can be found in the math appendix to this post. I did the work myself, but I don’t expect these results to be new.) This utility function describes only the concern most important to you: in our example, only the probability of infinite torture makes a difference to expected utility; any change in the probability of saving a finite number of lives leaves expected utility unchanged.

Let’s define a relation \(\prec\!\!\prec\), where \(p \prec\!\!\prec q\) is read “\(q\) is much better than \(p\)”, which says that there is nothing you wouldn’t give up a little probability of in order to get \(q\) instead of \(p\); in our example: \(q\) doesn’t merely save lives compared to \(p\), it makes infinite torture less likely. Formally, we define \(p \prec\!\!\prec q\) to mean that \(p' \prec q'\) for all \(p'\) and \(q'\) “close enough” to \(p\) and \(q\) respectively; more precisely: \(p \prec\!\!\prec q\) if there is an \(\epsilon > 0\) such that \(p' \prec q'\) for all \(p'\) and \(q'\) with

\[ \|p' - p\| < \epsilon \quad\text{and}\quad \|q' - q\| < \epsilon. \]

(Or equivalently: if there are open sets \(U\) and \(V\) around \(p\) and \(q\), respectively, such that \(p' \prec q'\) for all \(p' \in U\) and \(q' \in V\).)

It turns out that if \(\preceq\) is a preference relation satisfying Independence, then \(\prec\!\!\prec\) is the strict part of a preference relation satisfying Independence and Continuity, and there is a utility function \(u\) such that \(p \prec\!\!\prec q\) iff the expected utility under \(q\) is larger than the expected utility under \(p\). Obviously, \(p \prec\!\!\prec q\) implies \(p \prec q\), so whenever two options have different expected utilities, you prefer the one with the larger expected utility. Your genie is still an expected utility maximizer.
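Here is a sketch of the running example as lexicographic preferences, under assumptions made for illustration: an outcome is a pair (infinite torture happens?, lives saved), distributions use exact `Fraction`s, \(\preceq\) first compares the probability of torture (lower is better) and only then expected lives saved, and the utility function is \(u(o) = -1\) if torture happens and \(0\) otherwise. Because a small perturbation can change expected lives but cannot reverse a strict difference in torture probability, comparing expected utility reproduces \(\prec\!\!\prec\) in this example. The code shows that Continuity fails: no \(\epsilon\) makes a small chance of torture acceptable in exchange for more lives.

```python
from fractions import Fraction as F
from typing import Dict, Tuple

Outcome = Tuple[bool, int]  # (infinite torture happens, lives saved)
Dist = Dict[Outcome, F]


def mix(alpha: F, p: Dist, r: Dist) -> Dist:
    """Return alpha * p + (1 - alpha) * r."""
    outcomes = set(p) | set(r)
    return {o: alpha * p.get(o, F(0)) + (1 - alpha) * r.get(o, F(0)) for o in outcomes}


def p_torture(d: Dist) -> F:
    return sum((pr for (torture, _), pr in d.items() if torture), F(0))


def expected_lives(d: Dist) -> F:
    return sum((pr * lives for (_, lives), pr in d.items()), F(0))


def weakly_prefers(p: Dist, q: Dist) -> bool:
    """p ⪯ q: compare torture probability first (lower is better), then lives."""
    return (-p_torture(p), expected_lives(p)) <= (-p_torture(q), expected_lives(q))


def expected_utility(d: Dist) -> F:
    """u(o) = -1 if torture happens, else 0; so this equals -P(torture)."""
    return -p_torture(d)


def much_better(p: Dist, q: Dist) -> bool:
    """p ≺≺ q for this example: q has strictly higher expected utility."""
    return expected_utility(p) < expected_utility(q)


p = {(False, 10): F(1)}  # certainly save 10 lives
q = {(False, 20): F(1)}  # certainly save 20 lives
r = {(True, 0): F(1)}    # infinite torture

print(weakly_prefers(p, q) and not weakly_prefers(q, p))  # True: p ≺ q
print(much_better(p, q))  # False: same expected utility
print(much_better(r, p))  # True

# Continuity would need some eps with p ≺ (1 - eps) q + eps r; none works.
for k in range(1, 7):
    eps = F(1, 10 ** k)
    risky_q = mix(1 - eps, q, r)
    assert not weakly_prefers(p, risky_q)  # risky_q is strictly worse than p
```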

Furthermore, unless \(p \sim q\) for all \(p\) and \(q\), \(u\) isn’t constant; that is, there are some outcomes \(o_1\) and \(o_2\) with \(u(o_1) \neq u(o_2)\). (If this weren’t the case, the result above obviously wouldn’t tell us very much about \(\preceq\)!) Being indifferent between all possible actions doesn’t make for a particularly interesting direction for steering the world, if it can be called one at all, so from now on let’s assume that you are not.


It can happen that there are two distributions \(p\) and \(q\) with the same expected utility, but \(p \prec q\). (\(q\) saves more lives, but the probability of eternal torture is the same.) Thus, if your genie happens to face a choice between two actions that lead to the same expected utility, it must do more work to figure out which of the actions it should take. But there is some reason to expect that such situations should be rare.

If there are \(n\) possible outcomes, then the set \(\Delta\) of probability distributions over \(\mathcal{O}\) is \((n-1)\)-dimensional (because the probabilities must add up to 1, so if you know \(n-1\) of them, you can figure out the last one). For example, if there are three outcomes, \(\Delta\) is a triangle, and if there are four outcomes, it’s a tetrahedron. On the other hand, it turns out that for any \(c \in \mathbb{R}\), the set of all \(p \in \Delta\) for which the expected utility equals \(c\) has dimension \(n-2\) or smaller: if \(n = 3\), it’s a line (or a point or the empty set); if \(n = 4\), it’s a plane (or a line or a point or the empty set).
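The dimension count can be spelled out in a few lines. Here \(\Delta\) is the probability simplex over \(n\) outcomes \(o_1, \dots, o_n\), and \(L_c\) (a name introduced just for this derivation) is the set of distributions with expected utility \(c\), assuming as above that \(u\) is not constant:

\[ \Delta = \Bigl\{ p \in \mathbb{R}^n : p_i \ge 0,\ \textstyle\sum_{i=1}^n p_i = 1 \Bigr\}, \qquad \dim \Delta = n - 1, \]

\[ L_c = \Bigl\{ p \in \Delta : \textstyle\sum_{i=1}^n p_i\, u(o_i) = c \Bigr\} \subseteq \Bigl\{ p \in \mathbb{R}^n : \textstyle\sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\, u(o_i) = c \Bigr\}. \]

The coefficient vectors \((1, \dots, 1)\) and \((u(o_1), \dots, u(o_n))\) of the two linear constraints on the right are linearly independent exactly when \(u\) is not constant, so the right-hand set is either empty or an affine subspace of dimension \(n - 2\); hence \(\dim L_c \le n - 2\).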

Thus, in or­der to have the same ex­pected util­ity, and must lie on the same hy­per­plane — not just on a plane very close by, but on ex­actly the same plane. That’s not just a small tar­get to hit, that’s an in­finitely small tar­get. If you use, say, a Solomonoff prior, then it seems very un­likely that two of your finitely many op­tions just hap­pen to lead to prob­a­bil­ity dis­tri­bu­tions which yield the same ex­pected util­ity.

But we are bounded ra­tio­nal­ists, not perfect Bayesi­ans with un­com­putable Solomonoff pri­ors. We as­sign heads and tails ex­actly the same prob­a­bil­ity, not be­cause there is no in­for­ma­tion that would make one or the other more likely (we could try to ar­rive at a best guess about which side is a lit­tle heav­ier than the other?), but be­cause the prob­lem is so com­pli­cated that we sim­ply give up on it. What if it turns out that be­cause of this, all the difficult de­ci­sions we need to make turn out to be be­tween ac­tions that hap­pen to have the same ex­pected util­ity?

If you do your im­perfect calcu­la­tion and find that two of your op­tions seem to yield ex­actly the same prob­a­bil­ity of eter­nal hell for in­finitely many peo­ple, you could then try to figure out which of them is more likely to save a finite num­ber of lives. But it seems to me that this is not the best ap­prox­i­ma­tion of an ideal Bayesian with your stated prefer­ences. Shouldn’t you spend those com­pu­ta­tional re­sources on do­ing a bet­ter calcu­la­tion of which op­tion is more likely to lead to eter­nal hell?

For you might arrive at a new estimate under which the probabilities of hell are at least slightly different. Even if you suspect that the new calculation will again come out with the probabilities exactly equal, you don’t know that. And therefore, can you truly in good conscience argue that doing the new calculation does not improve the odds of avoiding hell —

at least a teeny tiny incredibly super-small for all ordinary intents and purposes completely irrelevant bit?

Even if it should be the case that to a perfect Bayesian, the expected utilities under a Solomonoff prior were exactly the same, you don’t know that, so how can you possibly justify stopping the calculation and saving a mere finite number of lives?
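The bounded-rationalist situation can be caricatured in a few lines of Python. In this sketch, the two "true" probabilities of hell are hypothetical numbers that differ by $10^{-9}$, and the hypothetical `estimate` function stands in for an imperfect calculation whose precision grows with the compute you spend on it. At low precision the estimates tie exactly; spending more compute separates them.

```python
from fractions import Fraction as F

# Hypothetical "true" probabilities of eternal hell under two options.
# The agent never sees these directly, only estimates of them.
TRUE_P_HELL = {"A": F(1, 3), "B": F(1, 3) + F(1, 10**9)}

def estimate(option, digits):
    """Stand-in for a bounded calculation: the true value rounded to `digits` decimals."""
    scale = 10**digits
    return F(round(TRUE_P_HELL[option] * scale), scale)

def refine_until_separated(max_digits):
    """Increase precision until the two estimates differ, or the budget runs out."""
    for digits in range(1, max_digits + 1):
        a, b = estimate("A", digits), estimate("B", digits)
        if a != b:
            # A lower estimated probability of hell is better.
            return ("A" if a < b else "B"), digits
    return None, max_digits  # still tied: the estimates give no verdict

print(refine_until_separated(5))   # (None, 5)
print(refine_until_separated(12))  # ('A', 9)
```

Nothing guarantees that refinement will ever break a tie in a real problem; the sketch only shows why an exact tie at one precision is not evidence of an exact tie in the underlying quantities.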


So there you have it. In order to have a coherent direction in which you want to steer the world, you must have a set of outcomes and a preference relation over the probability distributions over these outcomes, and this relation must satisfy Independence — or so it seems to me, anyway. And if you do, then you have a utility function, and a perfect Bayesian maximizing your preferences will always maximize expected utility.

It could happen that two options have exactly the same expected utility, and in this case the utility function doesn’t tell you which of these is better, under your preferences; but as a bounded rationalist, you can never know this, so if you have any computational resources left that you could spend on figuring out what your true preferences have to say, you should spend them on a better calculation of the expected utilities instead.

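Given this, we might as well just talk about the coarser preference relation described by your utility function (under which one distribution is preferred to another exactly when it has higher expected utility), which satisfies Continuity as well as Independence, instead of your full preference relation; and you might as well program your genie with your utility function, which only reflects that coarser relation, instead of with your true preferences.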
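As a minimal sketch of this relationship, assume the lexicographic preferences from the hell example above: first minimize the probability of eternal hell, and only among options that tie exactly on that, maximize expected finite lives saved. (Lexicographic preferences like these are the standard example of a relation that satisfies Independence but not Continuity.) The option names and numbers are hypothetical. The utility function here is $u = -\Pr(\text{hell})$; it ranks every non-tied pair exactly as the full preferences do, and is silent only on exact ties, such as A versus C.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical options: (probability of eternal hell, expected finite lives saved).
OPTIONS = {
    "A": (F(1, 1000), 10),
    "B": (F(1, 1000) + F(1, 10**15), 10**6),
    "C": (F(1, 1000), 50),
    "D": (F(2, 1000), 10**9),
}

def lexicographic_key(option):
    """Full preferences: less hell first, then more lives. Larger key = better."""
    p_hell, lives = OPTIONS[option]
    return (-p_hell, lives)

def expected_utility(option):
    """The utility function only tracks the first component."""
    p_hell, _ = OPTIONS[option]
    return -p_hell

def agree_except_on_ties():
    """Check that both orderings rank every non-tied pair the same way."""
    for x, y in combinations(OPTIONS, 2):
        if expected_utility(x) == expected_utility(y):
            continue  # u is silent here; only the full preferences decide
        by_utility = expected_utility(x) > expected_utility(y)
        by_lexicographic = lexicographic_key(x) > lexicographic_key(y)
        if by_utility != by_lexicographic:
            return False
    return True

print(agree_except_on_ties())               # True
print(max(OPTIONS, key=lexicographic_key))  # C
print(max(OPTIONS, key=expected_utility))   # A (tied with C; max() returns the first maximal key)
```

Note that D saves a billion lives and still loses to every other option, because its probability of hell is higher; no finite number of lives can compensate, which is exactly the failure of Continuity.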

(Note: I am not literally saying that you should not try to understand the whole topic better than this if you are actually going to program a Friendly AI. This is still meant as a metaphor. I am, however, saying that expected utility theory, even with boring old real numbers as utilities, is not to be discarded lightly.)


Next post: Dealing with time

So far, we’ve always pretended that you only face one choice, at one point in time. But not only is there a way to apply our theory to repeated interactions with the environment — there are two!

One way is to say that at each point in time, you should apply decision theory to the set of actions you can perform at that point. Now, the actual outcome depends, of course, not only on what you do now, but also on what you do later; but you know that you’ll still use decision theory later, so you can foresee what you will do in any possible future situation, and take it into account when computing what action you should choose now.
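This first approach is essentially backward induction. Here is a minimal sketch on a hypothetical two-step decision tree (the tree, its probabilities, and its utilities are made up for illustration): at a chance node you take the probability-weighted average of the values of its branches, and at a choice node you foresee that your future self will pick the branch with the highest value, so each earlier choice takes that foreseen behavior into account.

```python
# A node is one of:
#   ("outcome", utility)
#   ("choice", {action_name: child_node, ...})
#   ("chance", [(probability, child_node), ...])

TREE = ("choice", {
    "safe": ("outcome", 5.0),
    "gamble": ("chance", [
        (0.5, ("choice", {
            "stop": ("outcome", 4.0),
            "double": ("chance", [(0.5, ("outcome", 16.0)), (0.5, ("outcome", 0.0))]),
        })),
        (0.5, ("outcome", 4.0)),
    ]),
})

def value(node):
    """Expected utility of a node, assuming every later choice is made the same way."""
    kind, body = node
    if kind == "outcome":
        return body
    if kind == "chance":
        return sum(p * value(child) for p, child in body)
    # Choice node: foresee that you will pick the best action here.
    return max(value(child) for child in body.values())

def best_action(node):
    """The action to take now at a choice node, given the foreseen future choices."""
    _, actions = node
    return max(actions, key=lambda a: value(actions[a]))

print(value(TREE), best_action(TREE))  # 6.0 gamble
```

The recursion answers "what will I do later?" by applying the same rule to the later node, which is the sense in which you know you'll still use decision theory later.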

The second way is to make a choice only once, not between the actions you can take at that point in time, but between complete plans — giant lookup tables — which specify how you will behave in any situation you might possibly face. Thus, you simply do your expected utility calculation once, and then stick with the plan you have decided on.
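The second approach can be sketched by literally enumerating lookup tables. In this minimal Python sketch, the decision tree is hypothetical, each choice node carries a made-up situation name so that a plan can be a dictionary from situation names to actions, and the best plan is found by computing the expected utility of every complete plan once.

```python
from itertools import product

# A node is one of:
#   ("outcome", utility)
#   ("choice", name, {action_name: child_node, ...})   # name identifies the situation
#   ("chance", [(probability, child_node), ...])

TREE = ("choice", "start", {
    "safe": ("outcome", 5.0),
    "gamble": ("chance", [
        (0.5, ("choice", "after_win", {
            "stop": ("outcome", 4.0),
            "double": ("chance", [(0.5, ("outcome", 16.0)), (0.5, ("outcome", 0.0))]),
        })),
        (0.5, ("outcome", 4.0)),
    ]),
})

def choice_nodes(node, found=None):
    """Collect {situation_name: available_actions} for every choice node in the tree."""
    if found is None:
        found = {}
    kind = node[0]
    if kind == "choice":
        _, name, actions = node
        found[name] = sorted(actions)
        for child in actions.values():
            choice_nodes(child, found)
    elif kind == "chance":
        for _, child in node[1]:
            choice_nodes(child, found)
    return found

def plan_value(node, plan):
    """Expected utility of following a complete plan (a lookup table) from this node."""
    kind = node[0]
    if kind == "outcome":
        return node[1]
    if kind == "chance":
        return sum(p * plan_value(child, plan) for p, child in node[1])
    _, name, actions = node
    return plan_value(actions[plan[name]], plan)

def best_plan(tree):
    """Enumerate every complete plan and keep the one with the highest expected utility."""
    situations = choice_nodes(tree)
    names = list(situations)
    plans = [dict(zip(names, combo)) for combo in product(*(situations[n] for n in names))]
    return max(plans, key=lambda plan: plan_value(tree, plan))

plan = best_plan(TREE)
print(plan, plan_value(TREE, plan))  # {'start': 'gamble', 'after_win': 'double'} 6.0
```

A plan specifies an action even for situations it never reaches (here, `after_win` is still filled in when the plan says `safe`), and the number of plans grows exponentially with the number of situations, since every combination of actions is a separate lookup table.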

Meditation: Which of these is the right thing to do, if you have a perfect Bayesian genie and you want to steer the future in some particular direction? (Does it even make a difference which one you use?)

» To the mathematical appendix


1 The accounts of decision theory I’ve read use the term “outcome”, or “consequence”, but leave it mostly undefined; in a lottery, it’s the prize you get at the end, but clearly nobody is saying decision theory should only apply to lotteries. I’m not changing its role in the mathematics, and I think my explanation of it is what the term always wanted to mean; I expect that other people have explained it in similar ways, though I’m not sure how similar precisely.