Why you must maximize expected utility

This post explains the von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I’m writing this post in preparation for a sequence on updateless anthropics, but I’m hoping that it will also be independently useful.

The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don’t know what that means, I’ll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?

A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump — how, at time 11:59, you will want to pay a penny to get option B instead of option A, and then at 12:01, you will want to pay a penny to switch back. Either that, or the game will have ended and the option won’t have made a difference.
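To see the sting of this concretely, here is a toy sketch (my own illustration, not from Eliezer’s post; the flipping preference and the penny fee are invented for the example) of how a preference that reverses with the clock bleeds money:

```python
# Toy money pump: an agent whose preference between options A and B
# flips with the time of day pays a penny at every reversal.

def prefers_b(minute: int) -> bool:
    """Illustrative time-dependent preference: B looks better on odd minutes."""
    return minute % 2 == 1

def run_pump(minutes: int) -> int:
    pennies_paid = 0
    holding_b = False  # the agent starts out holding option A
    for minute in range(minutes):
        if prefers_b(minute) != holding_b:
            holding_b = not holding_b  # trade to the now-preferred option...
            pennies_paid += 1          # ...and pay a penny for the swap
    return pennies_paid

print(run_pump(10))  # -> 9: a penny at nearly every tick, with no end in sight
```

The point of the toy model is only that the agent ends up where it started, minus the pennies; nothing about the world needed to change for the money to be extracted.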

When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently always gave better results. But couldn’t you avoid the problem by violating the axiom only in situations where it doesn’t give anyone an opportunity to money-pump you? I’m not saying that would be elegant, but is there a reason it would be irrational?

It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don’t know any source that comes close to explaining the reason, the way I see it; hence, this post.

I’ll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually imply that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.

*

Epistemic rationality is about figuring out what’s true; instrumental rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent direction in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won’t ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem doubtful that which plan is better at 12:01 could genuinely depend on my opportunity to choose at 11:59.

So how do we formalize the notion of a coherent direction in which you can steer the future?

*

Setting the stage

Decision theory asks what you would do if faced with choices between different sets of options, and then places restrictions on how you can act in one situation, depending on how you would act in others. This is another thing that has always bothered me: If we are talking about choices between different lotteries with small prizes, it makes some sense that we could invite you to the lab and run ten sessions with different choices, and you should probably act consistently across them. But if we’re interested in the big questions, like how to save the world, then you’re not going to face a series of independent, analogous scenarios. So what is the content of asking what you would do if you faced a set of choices different from the one you actually face?

The real point is that you have bounded computational resources, and you can’t actually visualize the exact set of choices you might face in the future. A perfect Bayesian rationalist could just figure out what they would do in any conceivable situation and write it down in a giant lookup table, which means that they only face a single one-time choice between different possible tables. But you can’t do that, and so you need to figure out general principles to follow. A perfect Bayesian is like a Carnot engine — it’s what a theoretically perfect engine would look like, so even though you can at best approximate it, it still has something to teach you about how to build a real engine.

But decision theory is about what a perfect Bayesian would do, and it’s annoying to have our practical concerns intrude into our ideal picture like that. So let’s give our story some local color and say that you aren’t a perfect Bayesian, but you have a genie — that is, a powerful optimization process — that is, an AI, which is. (That, too, is physically impossible: AIs, like humans, can only approximate perfect Bayesianism. But we are still idealizing.) Your genie is able to comprehend the set of possible giant lookup tables it must choose between; you must write down a formula, to be evaluated by the genie, that chooses the best table from this set, given the available information. (An unmodified human won’t actually be able to write down an exact formula describing their preferences, but we might be able to write down one for a paperclip maximizer.)

The first constraint decision theory places on your formula is that it must order all options your genie might have to choose between from best to worst (though you might be indifferent between some of them), and then, given any particular set of feasible options, it must choose the one that is least bad. In particular, if you prefer option A when options A and B are available, then you can’t prefer option B when options A, B and C are available.
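As a minimal sketch of this constraint (the options and the ranking are invented for illustration), the formula must amount to a fixed ordering plus “pick the best feasible option”, which makes the A-versus-B reversal impossible:

```python
# A fixed ranking over all options, plus "choose the best feasible one".
# With this structure, preferring A from {A, B} but B from {A, B, C}
# cannot happen: adding C never flips the comparison between A and B.

RANKING = ["A", "B", "C"]  # illustrative ordering, best to worst

def choose(feasible: set) -> str:
    return min(feasible, key=RANKING.index)

assert choose({"A", "B"}) == "A"
assert choose({"A", "B", "C"}) == "A"  # C's availability changes nothing
```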

Meditation: Alice is trying to decide how large a bonus each member of her team should get this year. She has just decided on giving Bob the same, already large, bonus as last year when she receives an e-mail from the head of a different division, asking her if she can recommend anyone for a new project he is setting up. Alice immediately realizes that Bob would love to be on that project, and would fit the bill exactly. But she needs Bob on the contract he’s currently working on; losing him would be a pretty bad blow for her team.

Alice decides there is no way that she can recommend Bob for the new project. But she still feels bad about it, and she decides to make up for it by giving Bob a larger bonus. On reflection, she finds that she genuinely feels that this is the right thing to do, simply because she could have recommended him but didn’t. Does that mean that Alice’s preferences are irrational? Or that something is wrong with decision theory?

Meditation: One kind of answer to the above and to many other criticisms of decision theory goes like this: Alice’s decision isn’t between giving Bob a larger bonus or not, it’s between (give Bob a larger bonus unconditionally), (give Bob the same bonus unconditionally), (only give Bob a larger bonus if I could have recommended him), and so on. But if that sort of thing is allowed, is there any way left in which decision theory constrains Alice’s behavior? If not, what good is it to Alice in figuring out what she should do?

...
...
...

*

Outcomes

My short answer is that Alice can care about anything she damn well likes. But there are a lot of things that she doesn’t care about, and decision theory has something to say about those.

In fact, deciding that some kinds of preferences should be outlawed as irrational can be dangerous: you might think that nobody in their right mind should ever care about the detailed planning algorithms their AI uses, as long as they work. But how certain are you that it’s wrong to care about whether the AI has planned out your whole life in advance, in detail? (Worse: Depending on how strictly you interpret it, this injunction might even rule out not wanting the AI to run conscious simulations of people.)

But nevertheless, I believe the “anything she damn well likes” needs to be qualified. Imagine that Alice and Carol both have an AI, and fortuitously, both AIs have been programmed with the same preferences and the same Bayesian prior (and they talk, so they also have the same posterior, because Bayesians cannot agree to disagree). But Alice’s AI has taken over the stock markets, while Carol’s AI has seized the world’s nuclear arsenals (and is protecting them well). So Alice’s AI not only doesn’t want to blow up Earth, it couldn’t do so even if it wanted to; it couldn’t even bribe Carol’s AI, because Carol’s AI really doesn’t want the Earth blown up either. And so, if it makes a difference to the AIs’ preference function whether they could blow up Earth if they wanted to, they have a conflict of interest.

The moral of this story is not simply that it would be sad if two AIs came into conflict even though they have the same preferences. The point is that we’re asking what it means to have a consistent direction in which you are trying to steer the future, and it doesn’t look like our AIs are on the same bearing. Surely, a direction for steering the world should only depend on features of the world, not on additional information about which agent is at the rudder.

You can want to not have your life planned out by an AI. But I think you should have to state your wish as a property of the world: you want all AIs to refrain from doing so, not just “whatever AI happens to be executing this”. And Alice can want Bob to get a larger bonus if the company could have assigned him to the new project and decided not to, but she must figure out whether this is the correct way to translate her moral intuitions into preferences over properties of the world.

*

You may care about any feature of the world, but you don’t in fact care about most of them. For example, there are many ways the atoms in the sun could be arranged that all add up to the same thing as far as you are concerned, and you don’t have terminal preferences about which of these will be the actual one tomorrow. And though you might care about some properties of the algorithms your AI is running, mostly they really do not matter.

Let’s define a function that takes a complete description of the world — past, present and future — and returns a data structure containing all information about the world that matters to your terminal values, and only that information. (Our imaginary perfect Bayesian doesn’t know exactly which way the world will turn out, but it can work with “possible worlds”, complete descriptions of ways the world may turn out.) We’ll call this data structure an “outcome”, and we require you to be indifferent between any two courses of action that will always produce the same outcome. Of course, any course of action is something that your AI would be executing in the actual world, and you are certainly allowed to care about the difference — but then the two courses of action do not lead to the same “outcome”!¹
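Here is a minimal sketch of that function (every field name here is an invented illustration, not part of the theory):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class World:
    """A complete description of a way the world may turn out (illustrative)."""
    lives_saved: int
    ai_planned_your_life: bool
    sun_atom_arrangement: int  # a detail your terminal values ignore

@dataclass(frozen=True)
class Outcome:
    """Only the information that matters to your terminal values."""
    lives_saved: int
    ai_planned_your_life: bool

def outcome(world: World) -> Outcome:
    """Forgets every feature of the world you don't terminally care about."""
    return Outcome(world.lives_saved, world.ai_planned_your_life)

# Worlds differing only in value-irrelevant detail map to the same outcome;
# you must be indifferent between actions that always produce such worlds.
assert outcome(World(100, False, 1)) == outcome(World(100, False, 2))
```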

With this definition, I think it is pretty reasonable to say that in order to have a consistent direction in which you want to steer the world, you must be able to order these outcomes from best to worst, and always want to pick the least bad you can get.

*

Preference relations

That won’t be sufficient, though. Our genie doesn’t know what outcome each action will produce; it only has probabilistic information about that, and that’s a complication we very much do not want to idealize away (because we’re trying to figure out the right way to deal with it). And so our decision theory amends the earlier requirement: You must not only be indifferent between actions that always produce the same outcome, but also between any two actions that yield the same probability distribution over outcomes.

This is not at all a mild assumption, though it’s usually built so deeply into the definitions that it’s not even called an “axiom”. But we’ve assumed that all features of the world you care about are already encoded in the outcomes, so it does seem to me that the only reason left why you might prefer one action over another is that it gives you a better trade-off in terms of what outcomes it makes more or less likely; and I’ve assumed that you’re already a Bayesian, so you agree that how likely it makes an outcome is correctly represented by the probability of that outcome, given the action. So it certainly seems that the probability distribution over outcomes should give you all the information about an action that you could possibly care about. And that you should be able to order these probability distributions from best to worst, and all that.

Formally, we represent a direction for steering the world as a set $\mathcal{O}$ of possible outcomes and a binary relation $\succcurlyeq$ on the probability distributions over $\mathcal{O}$ (with $x\succcurlyeq y$ interpreted as “$x$ is at least as good as $y$”) which is a total preorder; that is, for all $x$, $y$ and $z$:

• If $x\succcurlyeq y$ and $y\succcurlyeq z$, then $x\succcurlyeq z$ (that is, $\succcurlyeq$ is transitive); and

• We have either $x\succcurlyeq y$ or $y\succcurlyeq x$ or both (that is, $\succcurlyeq$ is total).

In this post, I’ll assume that $\mathcal{O}$ is finite. We write $x\sim y$ (for “I’m indifferent between $x$ and $y$”) when both $x\succcurlyeq y$ and $y\succcurlyeq x$, and we write $x\succ y$ (“$x$ is strictly better than $y$”) when $x\succcurlyeq y$ but not $y\succcurlyeq x$. Our genie will compute the set of all actions it could possibly take, and the probability distribution over possible outcomes that (according to the genie’s Bayesian posterior) each of these actions leads to, and then it will choose to act in a way that maximizes $\succcurlyeq$. I’ll also assume that the set of possible actions will always be finite, so there is always at least one optimal action.
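As a sketch of the genie’s final step (the outcome set, the distributions and the tie-breaking rule are all invented for illustration), note that nothing here requires a utility function yet, only the relation $\succcurlyeq$:

```python
# Outcomes in order: ("good", "ok", "bad"); a distribution is a probability
# triple. The illustrative preference: compare P[good], break ties by P[bad]
# (lower is better). Any total preorder would serve equally well.

def at_least_as_good(x, y) -> bool:
    return (x[0], -x[2]) >= (y[0], -y[2])

def best_action(actions: dict) -> str:
    """actions maps each action to its posterior distribution over outcomes."""
    best = None
    for action, dist in actions.items():
        if best is None or at_least_as_good(dist, actions[best]):
            best = action
    return best

actions = {"plan1": (0.5, 0.3, 0.2), "plan2": (0.5, 0.1, 0.4)}
print(best_action(actions))  # -> "plan1": same P[good], lower P[bad]
```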

Meditation: Omega is in the neighbourhood and invites you to participate in one of its little games. Next Saturday, it plans to flip a fair coin; would you please indicate on the attached form whether you would like to bet that this coin will fall heads, or tails? If you correctly bet heads, you will win $10,000; if you correctly bet tails, you’ll win $100. If you bet wrongly, you will still receive $1 for your participation.

We’ll assume that you prefer a 50% chance of $10,000 and a 50% chance of $1 to a 50% chance of $100 and a 50% chance of $1. Thus, our theory would say that you should bet heads. But there is a twist: Given recent galactopolitical events, you estimate a 3% chance that after posting its letter, Omega has been called away on urgent business. In this case, the game will be cancelled and you won’t get any money, though as a consolation, Omega will probably send you some book from its rare SF collection when it returns (market value: approximately $55–$70). Our theory so far tells you nothing about how you should bet in this case, but does Rationality have anything to say about it?

...

...

...

*

The Axiom of Independence

So here’s how I think about that problem: If you already knew that Omega is still in the neighbourhood (but not which way the coin is going to fall), you would prefer to bet heads, and if you knew it has been called away, you wouldn’t care. (And what you bet has no influence on whether Omega has been called away.) So heads is either better or exactly the same; clearly, you should bet heads.

This type of reasoning is the content of the von Neumann-Morgenstern Axiom of Independence. Apparently, that’s the most controversial of the theory’s axioms. You’re already a Bayesian, so you already accept that if you perform an experiment to determine whether someone is a witch, and the experiment can come out two ways, then if one of these outcomes is evidence that the person is a witch, the other outcome must be evidence that they are not. New information is allowed to make a hypothesis more likely, but not predictably so; if all ways the experiment could come out make the hypothesis more likely, then you should already be finding it more likely than you do. The same thing is true even if only one result would make the hypothesis more likely, but the other would leave your probability estimate exactly unchanged. The Axiom of Independence is equivalent to saying that if you’re evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no less attractive, then you should already be finding it more attractive than you do. This does seem rather solid to me.

*

So what does this axiom say formally? (Feel free to skip this section if you don’t care.) Suppose that your genie is considering two possible actions $a_1$ and $a_2$ (bet heads or tails), and an event $E$ (Omega is called away). Each action gives rise to a probability distribution over possible outcomes: e.g., $\mathbb{P}[i\mid a_1]$ is the probability of outcome $i\in\mathcal{O}$ if your genie chooses $a_1$. But your genie can also compute a probability distribution conditional on $E$, $\mathbb{P}[i\mid E,a_1]$.
Suppose that conditional on $E$, it doesn’t matter which action you pick: $\mathbb{P}[i\mid E,a_1] \,=\, \mathbb{P}[i\mid E,a_2]$ for all $i\in\mathcal{O}$. And finally, suppose that the probability of $E$ doesn’t depend on which action you pick: $\mathbb{P}[E\mid a_1]\,=\,\mathbb{P}[E\mid a_2]\,=:\, p$, with $0 < p < 1$. The Axiom of Independence says that in this situation, you should prefer the distribution $\mathbb{P}[\;\cdot\mid a_1]$ to the distribution $\mathbb{P}[\;\cdot\mid a_2]$, and therefore prefer $a_1$ to $a_2$, if and only if you prefer the distribution $\mathbb{P}[\;\cdot\mid\neg E,a_1]$ to the distribution $\mathbb{P}[\;\cdot\mid\neg E,a_2]$.

Let’s write $x$ for the distribution $\mathbb{P}[\;\cdot\mid\neg E,a_1]$, $y$ for the distribution $\mathbb{P}[\;\cdot\mid\neg E,a_2]$, and $z$ for the distribution $\mathbb{P}[\;\cdot\mid E,a_1] \,=\, \mathbb{P}[\;\cdot\mid E,a_2]$. (Formally, we think of these as vectors in $\mathbb{R}^{|\mathcal{O}|}$: e.g., $z_i \,=\, \mathbb{P}[i\mid E,a_1]$.) For all $i\in\mathcal{O}$, we have

$\mathbb{P}[i\mid a_1] \;=\; \mathbb{P}[\neg E\mid a_1]\cdot\mathbb{P}[i\mid\neg E,a_1] \;+\; \mathbb{P}[E\mid a_1]\cdot\mathbb{P}[i\mid E,a_1],$

so $\mathbb{P}[\;\cdot\mid a_1] = (1-p)x + pz$, and similarly $\mathbb{P}[\;\cdot\mid a_2] \,=\, (1-p)y + pz$. Thus, we can state the Axiom of Independence as follows:

• $(1-p)x + pz \,\succcurlyeq\, (1-p)y + pz \;\iff\; x\succcurlyeq y$.

We’ll assume that you can’t ever rule out the possibility that your AI might face this type of situation for any given $x$, $y$, $z$, and $p$, so we require that this condition hold for all probability distributions $x$, $y$ and $z$, and for all $p$ with $0 < p < 1$.

*

Here’s a common criticism of Independence. Suppose a parent has two children, and one old car that they can give to one of these children. Can’t they be indifferent between giving the car to their older child or their younger child, but strictly prefer throwing a coin? But let $x = z$ mean that the younger child gets the gift, and $y$ that the older child gets it, and $p = 1/2$; then by Independence, if $x\sim y$, then

$x \;=\; \tfrac12 x + \tfrac12 z \;\sim\; \tfrac12 y + \tfrac12 z,$

so it would seem that the parent cannot strictly prefer the coin throw.

In fairness, the people who find this criticism persuasive may not be Bayesians. But if you think this is a good criticism: Do you think that the parent must be indifferent between throwing a coin and asking the children’s crazy old kindergarten teacher which of them was better-behaved, as long as they assign 50% probability to either answer? Because if not, shouldn’t you already have protested when we decided that decisions must only depend on the probabilities of different outcomes?

My own resolution is that this is another case of terminal values intruding where they don’t belong. All that is relevant to the parent’s terminal values must already be described in the outcome; the parent is allowed to prefer “I threw a coin and my younger child got the car” to “I decided that my younger child would get the car” or “I asked the kindergarten teacher and they thought my younger child was better-behaved”, but if so, then these must already be different outcomes.
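To connect the formal statement back to the Omega game, here is a numeric check (a sketch; the encoding of the outcomes as probability vectors is mine, the 3% figure is from the story): the two bets are mixtures sharing the same $pz$ component, so Independence reduces comparing them to comparing $x$ with $y$.

```python
# Outcome order: win $10,000; win $100; win $1; game cancelled (consolation book).
p = 0.03                      # P[E]: Omega has been called away
x = (0.5, 0.0, 0.5, 0.0)      # P[. | not E, bet heads]
y = (0.0, 0.5, 0.5, 0.0)      # P[. | not E, bet tails]
z = (0.0, 0.0, 0.0, 1.0)      # P[. | E], identical for either bet

def mix(a, b):
    return tuple((1 - p) * ai + p * bi for ai, bi in zip(a, b))

print(mix(x, z))  # (0.485, 0.0, 0.485, 0.03) -- the "bet heads" distribution
print(mix(y, z))  # (0.0, 0.485, 0.485, 0.03) -- the "bet tails" distribution
# The p*z part is the same in both mixtures, so by Independence you prefer
# "bet heads" if and only if you prefer x to y -- which, by assumption, you do.
```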
The thing to remember is that it isn’t a property of the world that either child had a 50% probability of getting the car, and you can’t steer the future in the direction of having this mythical property. It is a property of the world that the parent assigned a 50% probability to each child getting the car, and that is a direction you can steer in — though the example with the kindergarten teacher shows that this is probably not quite the direction you actually wanted. The preference relation is only supposed to be about trade-offs between probability distributions; if you’re tempted to say that you want to steer the world towards one probability distribution or another, rather than one outcome or another, something has gone terribly wrong.

*

The Axiom of Continuity

And… that’s it. These are all the axioms that I’ll ask you to accept in this post. There is, however, one more axiom in the von Neumann-Morgenstern theory, the Axiom of Continuity. I do not think this axiom is a necessary requirement on any coherent plan for steering the world; I think the best argument for it is that it doesn’t make a practical difference whether you adopt it, so you might as well. But there is also a good argument to be made that if we’re talking about anything short of steering the entire future of humanity, your preferences do in fact obey this axiom, and it makes things easier technically if we adopt it, so I’ll do that at least for now.

Let’s look at an example: If you prefer $50 in your pocket to $40, the axiom says that there must be some small $\epsilon > 0$ such that you prefer a probability of $1-\epsilon$ of $50 and a probability of $\epsilon$ of dying today to a certainty of $40. Some critics seem to see this as the ultimate reductio ad absurdum for the VNM theory; they seem to think that no sane human would accept that deal. Eliezer was surely not the first to observe that this preference is exhibited each time someone drives an extra mile to save $10.
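As a worked example of the kind of $\epsilon$ involved (the utility numbers below are purely illustrative assumptions, nothing the theory dictates): normalize $u(\$40) = 0$ and $u(\$50) = 1$, and suppose dying today gets $u(\text{death}) = -10^6$. Indifference between the gamble and the certain \$40 then occurs at

$(1-\epsilon)\cdot u(\$50) + \epsilon\cdot u(\text{death}) = u(\$40) \;\Longrightarrow\; (1-\epsilon) - 10^6\,\epsilon = 0 \;\Longrightarrow\; \epsilon = \frac{1}{1 + 10^6} \approx 10^{-6},$

so the axiom only demands that some such $\epsilon$ exist; it is allowed to be astronomically small.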

Continuity says that if you strictly prefer $x$ to $y$, then there is no $z$ so terrible that you wouldn’t be willing to incur a small probability of it in order to (probably) get $x$ rather than $y$, and no $z$ so wonderful that you’d be willing to (probably) get $y$ instead of $x$ if this gives you some arbitrarily small probability of getting $z$. Formally, for all $x$, $y$ and $z$:

• If $x\succ y$, then there is an $\epsilon > 0$ such that $(1-\epsilon)x + \epsilon z \;\succ\; y$ and $x \;\succ\; (1-\epsilon)y + \epsilon z$.

I think if we’re talking about everyday life, we can pretty much rule out that there are things so terrible that for arbitrarily small $\epsilon$, you’d be willing to die with probability $1-\epsilon$ to avoid a probability of $\epsilon$ of the terrible thing. And if you feel that it’s not worth the expense to call a doctor every time you sneeze, you’re willing to incur a slightly higher probability of death in order to save some mere money. And it seems unlikely that there is no $\epsilon$ at which you’d prefer a certainty of $1 to a chance $\epsilon$ of $100. And if you have some preference that is so slight that you wouldn’t be willing to accept any chance of losing $1 in order to indulge it, it can’t be a very strong preference. So I think for most practical purposes, we might as well accept Continuity.

*

The VNM theorem

If your preferences are described by a transitive and complete relation $\succcurlyeq$ on the probability distributions over some set $\mathcal{O}$ of “outcomes”, and this relation satisfies Independence and Continuity, then you have a utility function, and your genie will be maximizing expected utility. Here’s what that means.

A utility function is a function $u : \mathcal{O}\to\mathbb{R}$ which assigns a numerical “utility” to every outcome. Given a probability distribution $x$ over $\mathcal{O}$, we can compute the expected value of $u$ under $x$, $\textstyle\sum_{i\in\mathcal{O}} u(i)\,x_i$; this is called the expected utility. We can prove that there is some utility function such that for all $x$ and $y$, we have $x\succ y$ if and only if the expected utility under $x$ is greater than the expected utility under $y$. In other words: $\succ$ is completely described by $u$; if you know $u$, you know $\succ$. Instead of programming your genie with a function that takes two outcomes and says which one is better, you might as well program it with a function that takes one outcome and returns its utility. Any coherent direction for steering the world which happens to satisfy Continuity can be reduced to a function that takes outcomes and assigns them numerical ratings.

In fact, it turns out that the $u$ for a given $\succ$ is “almost” unique: Given two utility functions $u$ and $v$ that describe the same $\succ$, there are numbers $a>0$ and $b\in\mathbb{R}$ such that for all $i\in\mathcal{O}$, $v(i) = au(i) + b$; this is called an “affine transformation”. On the other hand, it’s not hard to see that for any such $a$ and $b$,

$\sum_{i\in\mathcal{O}} u(i)\,x_i > \sum_{i\in\mathcal{O}} u(i)\,y_i \;\iff\; \sum_{i\in\mathcal{O}} \big(au(i) + b\big)\,x_i > \sum_{i\in\mathcal{O}} \big(au(i) + b\big)\,y_i,$

so two utility functions represent the same preference relation if and only if they are related in this way.

*

You shouldn’t read too much into this conception of utility. For example, it doesn’t make sense to see a fundamental distinction between outcomes with “positive” and with “negative” von Neumann-Morgenstern utility — because adding the right $b$ can make any negative utility positive and any positive utility negative, without changing the underlying preference relation.
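A minimal sketch of both claims (the outcomes and utility numbers are invented for illustration): expected utility is just the $u$-weighted average, and an affine transformation $v = au + b$ with $a > 0$ never changes which distribution wins.

```python
O = ["i", "j", "k"]
u = {"i": 0.0, "j": 1.0, "k": 5.0}   # illustrative utilities

def expected_utility(util, x):
    return sum(util[o] * p for o, p in zip(O, x))

x = (0.2, 0.5, 0.3)
y = (0.1, 0.8, 0.1)

a, b = 7.0, -42.0                    # any a > 0 and any b will do
v = {o: a * u[o] + b for o in O}     # an affine transformation of u

# Since probabilities sum to 1, E_v = a * E_u + b, and a > 0 preserves order:
assert (
    (expected_utility(u, x) > expected_utility(u, y))
    == (expected_utility(v, x) > expected_utility(v, y))
)
```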
The numbers that have real meaning are ratios between differences between utilities, $\textstyle\frac{u(i) - u(j)}{u(k) - u(\ell)}$, because these don’t change under affine transformations (the $b$’s cancel when you take the difference, and the $a$’s cancel when you take the ratio). Academian’s post has more about misunderstandings of VNM utility.

In my view, what VNM utilities represent is not necessarily how good each outcome is; what they represent is what trade-offs between probability distributions you are willing to accept. Now, if you strongly felt that the difference between $i$ and $j$ was about the same as the difference between $k$ and $\ell$, then you should have a very good reason before you make your $\textstyle\frac{u(i) - u(j)}{u(k) - u(\ell)}$ a huge number. But on the other hand, I think it’s ultimately your responsibility to decide what trade-offs you are willing to make; I don’t think you can get away with “stating how much you value different outcomes” and outsourcing the rest of the job to decision theory, without ever considering what these valuations should mean in terms of probabilistic trade-offs.

*

Doing without Continuity

What happens if your preferences do not satisfy Continuity? Say, you want to save human lives, but you’re not willing to incur any probability, no matter how small, of infinitely many people getting tortured infinitely long for this? I do not see a good argument that this couldn’t add up to a coherent direction for steering the world. I do, however, see an argument that in this case you care so little about finite numbers of human lives that in practice, you can probably neglect this concern entirely. (As a result, I doubt that your reflective equilibrium would want to adopt such preferences. But I don’t think they’re incoherent.)

I’ll assume that your morality can still distinguish only a finite number of outcomes, and you can choose only between a finite number of decisions. It’s not obvious that these assumptions are justified if we want to take into account the possibility that the true laws of physics might turn out to allow for infinite computations, but even in this case you and any AI you build will probably still be finite (though it might build a successor that isn’t), so I do in fact think there is a good chance that results derived under this assumption have relevance in the real world.

In this case, it turns out that you still have a utility function, in a certain sense. (Proofs for non-standard results can be found in the math appendix to this post. I did the work myself, but I don’t expect these results to be new.) This utility function describes only the concern most important to you: in our example, only the probability of infinite torture makes a difference to expected utility; any change in the probability of saving a finite number of lives leaves expected utility unchanged.

Let’s define a relation $x\succ_* y$, read “$x$ is much better than $y$”, which says that there is nothing you wouldn’t give up a little probability of in order to get $x$ instead of $y$ — in our example: $x$ doesn’t merely save lives compared to $y$, it makes infinite torture less likely.
Formally, we define $x\succ_* y$ to mean that $x'\succ y'$ for all $x'$ and $y'$ “close enough” to $x$ and $y$ respectively; more precisely: $x\succ_* y$ if there is an $\epsilon>0$ such that $x'\succ y'$ for all $x'$ and $y'$ with

$\sum_{i\in\mathcal{O}}|x'_i - x_i| \;<\; \epsilon \qquad\text{and}\qquad \sum_{i\in\mathcal{O}} |y'_i - y_i| \;<\; \epsilon.$

(Or equivalently: if there are open sets $U$ and $V$ around $x$ and $y$, respectively, such that $x'\succ y'$ for all $x'\in U$ and $y'\in V$.)

It turns out that if $\succ$ is a preference relation satisfying Independence, then $\succ_*$ is a preference relation satisfying Independence and Continuity, and there is a utility function $u$ such that $x\succ_* y$ iff the expected utility under $x$ is larger than the expected utility under $y$. Obviously, $x\succ_* y$ implies $x\succ y$, so whenever two options have different expected utilities, you prefer the one with the larger expected utility. Your genie is still an expected utility maximizer.

Furthermore, unless $x\sim y$ for all $x$ and $y$, $u$ isn’t constant — that is, there are some $x$ and $y$ with $x\succ_* y$. (If this weren’t the case, the result above obviously wouldn’t tell us very much about $\succ$!) Being indifferent between all possible actions doesn’t make for a particularly interesting direction for steering the world, if it can be called one at all, so from now on let’s assume that you are not.

*

It can happen that there are two distributions $x$ and $y$ with the same expected utility, but $x\succ y$. ($x$ saves more lives, but the probability of eternal torture is the same.) Thus, if your genie happens to face a choice between two actions that lead to the same expected utility, it must do more work to figure out which of the actions it should take. But there is some reason to expect that such situations should be rare.

If there are $N$ possible outcomes, then the set $\Delta_N$ of probability distributions over $\mathcal{O}$ is $(N-1)$-dimensional (because the probabilities must add up to 1, so if you know $N-1$ of them, you can figure out the last one). For example, if there are three outcomes, $\Delta_N$ is a triangle, and if there are four outcomes, it’s a tetrahedron. On the other hand, it turns out that for any $r\in\mathbb{R}$, the set of all $x\in\Delta_N$ for which the expected utility equals $r$ has dimension $N-2$ or smaller: if $N=3$, it’s a line (or a point or the empty set); if $N=4$, it’s a plane (or a line or a point or the empty set). Thus, in order to have the same expected utility, $x$ and $y$ must lie on the same hyperplane — not just on a plane very close by, but on exactly the same plane. That’s not just a small target to hit, that’s an infinitely small target. If you use, say, a Solomonoff prior, then it seems very unlikely that two of your finitely many options just happen to lead to probability distributions which yield the same expected utility.

But we are bounded rationalists, not perfect Bayesians with uncomputable Solomonoff priors. We assign heads and tails exactly the same probability, not because there is no information that would make one or the other more likely (we could try to arrive at a best guess about which side is a little heavier than the other?), but because the problem is so complicated that we simply give up on it.
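Here is a sketch of the lives-versus-torture example from this section (summarizing a distribution by two numbers is my own simplification): the full preference $\succ$ is lexicographic and violates Continuity, while the utility function recovered for $\succ_*$ sees only the torture probability, so two distributions can tie in expected utility and still be strictly ranked by $\succ$.

```python
# Summarize a distribution by (P[infinite torture], expected lives saved).

def strictly_better(x, y) -> bool:
    """Lexicographic preference: torture probability first, then lives saved."""
    return (x[0], -x[1]) < (y[0], -y[1])

def u_expected(x) -> float:
    """Expected utility for the induced relation >_*: only torture counts."""
    return -x[0]

x = (0.01, 1000.0)   # same torture probability, many more lives saved
y = (0.01, 10.0)

assert u_expected(x) == u_expected(y)  # tied in expected utility...
assert strictly_better(x, y)           # ...yet x is strictly preferred
```

A bounded reasoner that assigns two options exactly equal probabilities out of ignorance produces exactly this kind of tie.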
What if, because of this, all the difficult decisions we need to make turn out to be between actions that happen to have the same expected utility? If you do your imperfect calculation and find that two of your options seem to yield exactly the same probability of eternal hell for infinitely many people, you could then try to figure out which of them is more likely to save a finite number of lives. But it seems to me that this is not the best approximation of an ideal Bayesian with your stated preferences. Shouldn’t you spend those computational resources on doing a better calculation of which option is more likely to lead to eternal hell? For you might arrive at a new estimate under which the probabilities of hell are at least slightly different. Even if you suspect that the new calculation will again come out with the probabilities exactly equal, you don’t know that. And therefore, can you truly in good conscience argue that doing the new calculation does not improve the odds of avoiding hell — at least a teeny tiny, incredibly super-small, for all ordinary intents and purposes completely irrelevant bit? Even if it should be the case that to a perfect Bayesian, the expected utilities under a Solomonoff prior were exactly the same, you don’t know that, so how can you possibly justify stopping the calculation and saving a mere finite number of lives?

*

So there you have it. In order to have a coherent direction in which you want to steer the world, you must have a set of outcomes and a preference relation over the probability distributions over these outcomes, and this relation must satisfy Independence — or so it seems to me, anyway. And if you do, then you have a utility function, and a perfect Bayesian maximizing your preferences will always maximize expected utility. It could happen that two options have exactly the same expected utility, and in this case the utility function doesn’t tell you which of these is better, under your preferences; but as a bounded rationalist, you can never know this, so if you have any computational resources left that you could spend on figuring out what your true preferences have to say, you should spend them on a better calculation of the expected utilities instead. Given this, we might as well just talk about $\succ_*$, which satisfies Continuity as well as Independence, instead of $\succ$; and you might as well program your genie with your utility function, which only reflects $\succ_*$, instead of with your true preferences.

(Note: I am not literally saying that you should not try to understand the whole topic better than this if you are actually going to program a Friendly AI. This is still meant as a metaphor. I am, however, saying that expected utility theory, even with boring old real numbers as utilities, is not to be discarded lightly.)

*

Next post: Dealing with time

So far, we’ve always pretended that you only face one choice, at one point in time. But not only is there a way to apply our theory to repeated interactions with the environment — there are two!

One way is to say that at each point in time, you should apply decision theory to the set of actions you can perform at that point.
Now, the actual outcome depends of course not only on what you do now, but also on what you do later; but you know that you’ll still use decision theory later, so you can foresee what you will do in any possible future situation, and take it into account when computing what action you should choose now.

The second way is to make a choice only once, not between the actions you can take at that point in time, but between complete plans — giant lookup tables — which specify how you will behave in any situation you might possibly face. Thus, you simply do your expected utility calculation once, and then stick with the plan you have decided on.

Meditation: Which of these is the right thing to do, if you have a perfect Bayesian genie and you want to steer the future in some particular direction? (Does it even make a difference which one you use?)

» To the mathematical appendix

Notes

¹ The accounts of decision theory I’ve read use the term “outcome”, or “consequence”, but leave it mostly undefined; in a lottery, it’s the prize you get at the end, but clearly nobody is saying decision theory should only apply to lotteries. I’m not changing its role in the mathematics, and I think my explanation of it is what the term always wanted to mean; I expect that other people have explained it in similar ways, though I’m not sure how similar precisely.

*

Comments

• This type of argument strikes me as analogous to using Arrow’s theorem to argue that we must implement a dictatorship.

• You know, given Arrow’s result, and given the observation commonly made around here that there are lots of little agents running around in our head, it is not so surprising that human beings exhibit “incoherent behavior.” It’s a consequence of our mind architecture. I am not sure I am prepared to start culling my internal Congress just so I can have a coherent utility function that makes the survivors happy.

• But the post is an argument for using cardinal utility (VNM utility)! And Arrow’s “impossibility” theorem only applies when trying to aggregate ordinal utilities across voters. It is well-known that voting systems which aggregate cardinal utility, such as Range Voting, can escape the impossibility theorem. So Arrow is actually another reason for having a VNM utility function: it allows collectively rational decisions, as well as individually rational decisions.

• Analogous in what way?

• As the Theorem treats them, voters are already utility-maximizing agents who have a clear preference set which they act on in rational ways. The question: how to aggregate these? It turns out that if you want certain superficially reasonable things out of a voting process from such agents — nothing gets chosen at random, it doesn’t matter how you cut up choices or whatever, &c. — you’re in for disappointment. There isn’t actually a way to have a group that is itself rationally agentic in the precise way the Theorem postulates. One bullet you could bite is having a dictator. Then none of the inconsistencies arise from having all these extra preference sets lying around, because there’s only one and it’s perfectly coherent. This is very easily comparable to reducing all of your own preferences into a single coherent utility function.
• Both involve taking a mathematical result about the only way to do something in a way that satisfies certain intuitively appealing properties, and using it to argue that we therefore should do it that way.

• A dictatorship isn’t the only resolution to Arrow’s theorem. Anyway, this sounds like a rather weak argument against the position.

• Not really, because the argument isn’t that you should do anything differently at all. It says that there’s some utility function that represents your preferences, some expected-utility-maximizing genie that makes the same choices as you, but it doesn’t tell you to have different preferences, or make different decisions under any circumstances. In fact, I don’t really know why this post is called “Why you must maximize expected utility” instead of “Why you already maximize expected utility.” It seems that even if I have some algorithm that is on the surface not maximizing expected utility, such as being risk-averse in some way dealing with money, then I’m really just maximizing the expected value of a non-obvious utility function.

• No. Most humans do not maximize expected utility with respect to any utility function whatsoever, because they have preferences which violate the hypotheses of the VNM theorem. For example, framing effects show that humans do not even consistently have the same preferences regarding fixed probability distributions over outcomes (but that their preferences change depending on whether the outcomes are described in terms of gains or losses).

Edit: in other words, the VNM theorem shows that “you must maximize expected utility” is equivalent to “your preferences should satisfy the hypotheses of the VNM theorem” (and not all of these hypotheses are encapsulated in the VNM axioms), and this is a statement with nontrivial content.

• “No. Most humans do not maximize the expected utility of any utility function whatsoever because they have preferences which violate the hypotheses of the VNM theorem.”

Axioms? (“Hypotheses” doesn’t seem to quite fit. One could have a hypothesis that humans had preferences that are in accord with the VNM axioms and falsify said theorem, but the VNM theorem doesn’t make the hypothesis itself.)

• In the nomenclature that I think is relatively standard among mathematicians, if a theorem states “if P1, P2, … then Q”, then P1, P2, … are the hypotheses of the theorem and Q is the conclusion. One of the hypotheses of the VNM theorem, which isn’t strictly speaking one of the von Neumann-Morgenstern axioms, is that you assign consistent preferences at all (that is, that the decision of whether you prefer A to B depends only on what A and B are). I’m not using “consistent” here in the same sense as the Wikipedia article does when talking about transitivity; I mean consistent over time. (Edit: Eliezer uses “incoherent”; maybe that’s a better word.)

• Again, among mathematicians, I think “hypotheses” is more common. Exhibit A; Exhibit B. I would guess that “premises” is more common among philosophers...?

• I usually say “assumptions”, but I’m neither a mathematician nor a philosopher. I do say “hypotheses” if for some reason I’m wearing mathematician attire.
• “It seems that even if I have some algorithm that is on the surface not maximizing expected utility, such as being risk-averse in some way dealing with money, then I’m really just maximizing the expected value of a non-obvious utility function.”

Not all decision algorithms are utility-maximising algorithms. If this were not so, the axioms of the VNM theorem would not be necessary. But they are necessary: the conclusion requires the axioms, and when axioms are dropped, decision algorithms violating the conclusion exist. For example, suppose that given a choice between A and B it chooses A; between B and C it chooses B; between C and A it chooses C. No utility function describes this decision algorithm. Suppose that given a choice between A and B it never makes a choice. No utility function describes this decision algorithm.

Another way that a decision algorithm can fail to have an associated utility function is by lying outside the ontology of the VNM theorem. The VNM theorem treats only of decisions over probability distributions of outcomes. Decisions can be made over many other things. And what is an “outcome”? Can it be anything less than the complete state of the agent’s entire positive light-cone? If not, it is practically impossible to calculate with; but if it can be smaller, what counts as an outcome and what does not?

Here is another decision algorithm. It is the one implemented by a room thermostat. It has two possible actions: turn the heating on, or turn the heating off. It has two sensors: one for the actual temperature and one for the set-point temperature. Its decisions are given by this algorithm: if the temperature falls 0.5 degrees below the set point, turn the heating on; if it rises 0.5 degrees above the set point, turn the heating off. Exercise: what relationship holds between this system, the VNM theorem, and utility functions?

• “Meditation: … Which of these is the right thing to do, if you have a perfect Bayesian genie and you want to steer the future in some particular direction? (Does it even make a difference which one you use?)”

“Apply decision theory to the set of actions you can perform at that point” is underspecified — are you computing counterfactuals the way CDT does, or EDT, TDT, etc?
This question sounds like a fuzzier way of asking which decision theory to use, but maybe I’ve missed the point.

• I really like this trend of adding meditations to posts, asking people to figure something out not just on their own but here and out loud.

• Does it matter if your utility function is constant with respect to time, provided that the most preferred outcome changes rarely?

• There is no distinction between these. How do you construct this hypothetical lookup table? By applying decision theory to every possible future history. In other words, by applying option 1 to calculate out everything in advance. But why bother? Applying option 1 as events unfold will produce results identical to applying it to all possible futures now, and avoids the small problem of requiring vastly more computational resources than the universe is capable of holding, running extraordinarily faster than anything is capable of happening, and operating for gigantically longer than the universe will exist, before you can do anything.

• Calculating the locally optimal action without any reference to plans can sometimes get you different results — see the absentminded driver problem.

• I’m not convinced that the absentminded driver problem has such implications. Its straightforward (to me) resolution is that the optimal p is 2/3 by the obvious analysis, and that the driver cannot use alpha as a probability, for reasons set out here. But I’d rather not get into a discussion of self-referential decision theory, since it doesn’t currently exist.

• It is essential to both of these paradoxes that they deal with social situations. Rephrase them so that the agent is interacting with nature, and the paradoxes disappear. For example, suppose that the parent is instead collecting shells on the beach. He has room in his bag for one more shell, and finds two on the ground that he has no preference between. Clearly, there’s no reason he would rather flip a coin to decide between them than just pick one of them up, say, the one on the left.

What this tells me is that you have to be careful using decision theory in social situations, because you have subtle, unspoken values that you can easily forget to take into account. It’s fairly obvious in the parent and kids example: she has no preference between them, but she also wants to prove that she has no preference between them, so she flips the coin. I’m not exactly sure what the social drives are in the first example, though.

Of course this is not different from your own solution, only more specific. As you said, the parent is allowed to prefer “I threw a coin and my younger child got the car” to “I decided that my younger child would get the car” … but if so, then these must already be different outcomes. The presence of a coin flip constitutes a separate outcome, because it matters to her terminal values that her children know that she’s not playing favorites.

• Is the real-world imperative “you must maximize expected utility”, given by the VNM theorem, stronger or weaker than the imperative “everyone must have the same beliefs” given by Aumann’s agreement theorem? If only there was some way of comparing these things! One possible metric is how much money I’m losing by not following this or that imperative. Can anyone give an estimate?
• People can’t order outcomes from best to worst. People exhibit circular preferences. I, myself, exhibit circular preferences. This is a problem for a utility-function-based theory of what I want.

• Interesting. Example of circular preferences?

• There’s a whole literature on preference intransitivity, but really, it’s not that hard to catch yourself doing it. Just pay attention to your pairwise comparisons when you’re choosing among three or more options, and don’t let your mind cover up its dirty little secret.

• Can you give an example of circular preferences that aren’t contextual and therefore only superficially circular (like Benja’s Alice and coin-flipping examples are contextual and only superficially irrational), and that you endorse, rather than regarding as bugs that should be resolved somehow? I’m pretty sure that any time I feel like I have intransitive preferences, it’s because of things like framing effects or loss aversion that I would rather not be subject to.

• That does happen to me from time to time, but when it does (and I notice that) I just think “hey, I’ve found a bug in my mindware” and try to fix that. (Usually it’s a result of some ugh field.)

• This would mean, of course, that humans can be money-pumped. In other words, if this is really true, there is a lot of money out there “on the table” for anyone to grab by simply money-pumping arbitrary humans. But in real life, if you went and tried to money-pump people, you would not get very far. But I accept a weaker form of what you are saying: that in the normal course of events, when people are not consciously thinking about it, we can exhibit circular reasoning. But in a situation where we actually are sitting down and thinking and calculating about it, we are capable of “resolving” those apparently circular preferences.

• No, not “of course”. It only implies that if they’re rational actors, which of course they are not. They are deal-averse, and if they see you trying to pump them around in a circle they will take their ball and go home. You can still profit by doing one step of the money pump, and people do. Lots of research goes into exploiting people’s circular preferences on things like supermarket displays.

• I think you are taking my point as something stronger than what I said. As you pointed out, with humans you can often money-pump them once, but not more than that. So it cannot truly be said that that preference is fully circular. It is something weaker, and perhaps you could call it a semi-circular preference. My point was that the thing that humans exhibit is not a “circular preference” in the fullest technical sense of the term.

• It seems essential to the idea of “a coherent direction for steering the world” or “preferences” that the ordering between choices does not depend on what choices are actually available. But in standard cooperative multi-agent decision procedures, the ordering does depend on the set of choices available. How to make sense of this? Does it mean that a group of more than one agent can’t be said to have a coherent direction for steering the world? What is it that they do have, then? And if a human should be viewed as a group of sub-agents representing different values and/or moral theories, does it mean a human also doesn’t have such a coherent direction?
• “Does it mean that a group of more than one agent can’t be said to have a coherent direction for steering the world?”

That’s indeed my current intuition. Suppose that there is a paperclip maximizer and a staples maximizer, and the paperclip maximizer has sole control over all that happens in the universe, and the two have a common prior which assigns near-certainty to this being the case. Then I expect the universe to be filled with paperclips. But if Staples has control, I expect the universe to be tiled with staples. On the other hand (stealing your example, but let’s make it about a physical coinflip, to hopefully make it noncontroversial): If both priors assign 50% probability to “Clippy has control and the universe can support 10^10 paperclips or 10^20 staples” and 50% probability to “Staples has control and the universe can support 10^10 staples or 10^20 paperclips”, and it turns out that in fact the first of these is true, then I expect Clippy to tile the universe with staples.

I disagree with Stuart’s post arguing that this means that Nash’s bargaining solution (NBS) can’t be correct, because it is dynamically inconsistent, as it gives a different solution after Clippy updates on the information that it has sole control. I think this is simply a counterfactual mugging: Clippy’s payoff in the possible world where Staples has control depends on Clippy’s cooperation in the world where Clippy has control. The usual solution to counterfactual muggings is to simply optimize expected utility relative to your prior, so the obvious thing to do would be to apply NBS to your prior distribution, giving you dynamic consistency.

That said, I’m not saying that I’m sure NBS is in fact the right solution. My current intuition is that there should be some way to formalize the “bargaining power” of each agent, and when holding the bargaining powers fixed, a group of agents should be steering the world in a coherent direction. This suggests that the right formalization of “bargaining power” would give a nonnegative scaling factor to each member of the group, and the group will act to maximize the sum of the agents’ expected utilities weighed by their respective scaling factors. (As in Stuart’s post, the scaling factors will of course not be invariant under affine transformations applied to the agents’ utility functions — if you multiply an agent’s utility function by x, you will need to divide their scaling factor by x in order to compensate.) Of course, at this point this is merely an intuition, and I do not have a worked-out proposal nor a careful justification.

“And if a human should be viewed as a group of sub-agents representing different values and/or moral theories, does it mean a human also doesn’t have such a coherent direction?”

I have to say that this approach does not make much sense to me in the first place, and I’m tempted to take your question as a modus tollens argument against that approach.
• And if a human should be viewed as a group of sub-agents representing different values and/or moral theories, does it mean a human also doesn't have such a coherent direction?

I have to say that this approach does not make much sense to me in the first place, and I'm tempted to take your question as a modus tollens argument against that approach. Maybe it would be useful to have a more detailed discussion about this, but in short, I think aspiring rationalist humans should see it as their responsibility to actually choose one direction in which they want to steer the world, rather than specifying conflicting goals and then asking for some formula that will decide for them how to trade these goals against each other. If you choose to trade off different goals by weighting them with different factors, fine; but if you try to find some "laws of rationality" that will tell you the one correct way to trade off these goals, without ever needing to make a decision about this yourself, I think you're trying to pass off a responsibility that is properly yours.

• I think aspiring rationalist humans should see it as their responsibility to actually choose one direction in which they want to steer the world [...] I think you're trying to pass off a responsibility that is properly yours.

Why so much emphasis on "responsibility"? In my mind, I have a responsibility to fulfill any promises I make to others, and... that's about it. As for figuring out what my preferences are, or should be, I'm going to try any promising approaches I can find and see if one of them works out. Thinking of myself as a bunch of sub-agents and using ideas from bargaining theory is one such approach. Trying to solve normative ethics using the methods of moral philosophers may be another. When you say "see it as their responsibility to actually choose one direction in which they want to steer the world", what does that mean, in terms of an approach I can explore? ETA: I wrote a post that may help explain what I meant here.

• This suggests that the right formalization of "bargaining power" would give a nonnegative scaling factor to each member of the group, and the group will act to maximize the sum of the agents' expected utilities, weighted by their respective scaling factors. [...] Of course, at this point this is merely an intuition, and I have neither a worked-out proposal nor a careful justification.

There is a justification for that intuition. Some have objected to the axiom that the aggregation must also be VNM-rational, but Nisan has proved a similar theorem that does not rely on the VNM-rationality of the collective as an axiom.

• I do not understand the first part of the post. As far as I can tell, you are responding to concerns that have been raised elsewhere (possibly in your head while discussing the issue with yourself), but it is unclear to me what exactly these concerns are, so I'm lost. Specifically, I do not understand the following:

Meditation: Alice is trying to decide how large a bonus each member of her team should get this year. She has just decided on giving Bob the same, already large, bonus as last year when she receives an e-mail from the head of a different division, asking her if she can recommend anyone for a new project he is setting up.
Alice immediately realizes that Bob would love to be on that project and would fit the bill exactly. But she needs Bob on the contract he's currently working on; losing him would be a pretty bad blow for her team. Alice decides there is no way she can recommend Bob for the new project. But she still feels bad about it, and she decides to make up for it by giving Bob a larger bonus. On reflection, she finds that she genuinely feels this is the right thing to do, simply because she could have recommended him but didn't. Does that mean that Alice's preferences are irrational? Or that something is wrong with decision theory?

This example has distracting details which I suspect are hiding, at least from me, the point you're actually trying to make (which I can't figure out). In practice, it seems to me that what Alice is concerned with are the social (signaling) implications of Bob gaining knowledge of both the bonus and of the possibility of recommendation.

My short answer is that Alice can care about anything she damn well likes. But there are a lot of things that she doesn't care about, and decision theory has something to say about those. In fact, deciding that some kinds of preferences should be outlawed as irrational can be dangerous: you might think that nobody in their right mind should ever care about the detailed planning algorithms their AI uses, as long as they work. But how certain are you that it's wrong to care about whether the AI has planned out your whole life in advance, in detail? (Worse: depending on how strictly you interpret it, this injunction might even rule out not wanting the AI to run conscious simulations of people.)

Maybe I'm just being slow right now, but I can't figure out what this has to do with the discussion preceding it.

The point is that we're asking what it means to have a consistent direction in which you are trying to steer the future, and it doesn't look like our AIs are on the same bearing.

I either don't understand or disagree with this. In the situation you describe, it sounds to me like the two AIs will make different decisions in practice for game-theoretic reasons, but I don't see why one would suspect that they are trying to steer the future in different directions.

• In practice, it seems to me that what Alice is concerned with are the social (signaling) implications of Bob gaining knowledge of both the bonus and of the possibility of recommendation.

I am assuming that Alice, on reflection, decides that she wants to give Bob the higher bonus even if nobody else ever learned that she had the opportunity to recommend him for the project, the way I would not want to steal food from a starving person even if nobody ever found out about it. The concern I'm replying to is that decision theory assumes your preferences can be described by a binary "is preferred to" relation, but humans might choose option A if the available options are A and B, and option B if the available options are A, B, and C; so how do you model that as a binary relation?
I actually don't recall seeing this raised in the context of VNM utility theory, but I believe I've seen it in discussions of Arrow's impossibility theorem, where the Independence of Irrelevant Alternatives axiom (confusingly, not the analog of VNM's Independence axiom) says that adding option C must not change the decision from A to B. I'm not particularly bothered for decision theory if you can run an experiment in which humans exhibit such behavior, because some human behavior is patently self-defeating, and I don't think we should require decision theory to explain all our biases as "rational". But I want a decision theory that won't exclude the preferences we would actually want to adopt on reflection, so I either want it to support Alice's preferences or I want to understand why Alice's preferences are in fact irrational.

• In fact, deciding that some kinds of preferences should be outlawed as irrational can be dangerous: [...]

Maybe I'm just being slow right now, but I can't figure out what this has to do with the discussion preceding it.

It's like this: Caring about the set of options you were able to choose between seems like a bad idea to me; I'm skeptical that preferences like Alice's are what I would want to adopt, on reflection. I might be tempted to simply say they're obviously irrational, no problem if decision theory doesn't cater to them. But caring about the algorithm your AI runs also seems like a bad idea, and by similar intuitions I might have been willing to accept a decision theory that would outlaw such preferences—which, as it turns out, would not be good.

• The point is that we're asking what it means to have a consistent direction in which you are trying to steer the future, and it doesn't look like our AIs are on the same bearing.

I either don't understand or disagree with this. In the situation you describe it sounds to me like the two AIs will make different decisions in practice for game-theoretic reasons, but I don't see why one would suspect that they are trying to steer the future in different directions.

Let's suppose that both AIs have the following preferences: Most importantly, they don't want Earth blown up. However, if they are able to blow up Earth no later than one month from now, they would like to maximize the expected number of paperclips in the universe; if they aren't able to, they want to maximize the expected number of staples. Now, if in two months a freak accident wipes out Alice's AI, the world ends up tiled with paperclips; if it wipes out Carol's AI, the world ends up tiled with staples. (Unless they made a deal that if either was wiped out, the other would carry on its work, as any paperclip maximizer might do with any staples maximizer—though they might not have a reason to, since they're not risk-averse.) This does not sound like steering the future in the same direction, to me.

Could you expand on the game-theoretic reasons? My intuition is that from a game-theoretic perspective, "steering the future in the same direction" should mean we're talking about a partnership game, i.e., that both agents get the same payoff for any strategy profile, and I do not see why this would lead to reasons to "make different decisions in practice".
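The "partnership game" condition can be written down directly; here is an illustrative sketch with made-up payoffs, contrasting a single shared payoff function with two agents that run the same resource-grabbing algorithm but each score only their own share:

    moves = ("grab", "yield")
    profiles = [(a, b) for a in moves for b in moves]

    def partnership(a, b):
        # One shared payoff for every profile: one direction to steer in.
        shared = {("grab", "grab"): 0, ("grab", "yield"): 2,
                  ("yield", "grab"): 2, ("yield", "yield"): 1}[(a, b)]
        return shared, shared

    def grab_score(me, other):
        return {("grab", "grab"): 0, ("grab", "yield"): 3,
                ("yield", "grab"): 1, ("yield", "yield"): 2}[(me, other)]

    def rivals(a, b):
        # Identical algorithms, but each agent scores only its own share:
        # the payoff pairs are swaps of each other, not equal pointwise.
        return grab_score(a, b), grab_score(b, a)

    for game in (partnership, rivals):
        print([(p, game(*p)) for p in profiles])

In the first printout the two payoffs coincide on every profile, so the agents rank all outcomes identically; in the second they agree only on the diagonal, which is one way "running the same algorithm" and "steering in the same direction" can come apart.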
• The concern I'm replying to is that decision theory assumes your preferences can be described by a binary "is preferred to" relation, but humans might choose option A if the available options are A and B, and option B if the available options are A, B, and C; so how do you model that as a binary relation?

Oh. I still do not think the example you gave illustrates this concern. One interpretation of the situation is that Alice gains new knowledge in the scenario: the existence of a new project suited to Bob's talents increases Alice's assessment of Bob's value. More generally, it's reasonable for an agent's preferences to change as its knowledge changes. In response to this objection, I think you only need to assume that deciding between A and B and C is equivalent to deciding between A and (B and C), and also equivalent to deciding between (A and B) and C, together with the assumption that your agent is capable of consistently assigning preferences to "composite choices" like (A and B).

• Caring about the set of options you were able to choose between seems like a bad idea to me; I'm skeptical that preferences like Alice's are what I would want to adopt, on reflection. I might be tempted to simply say they're obviously irrational, no problem if decision theory doesn't cater to them. But caring about the algorithm your AI runs also seems like a bad idea, and by similar intuitions I might have been willing to accept a decision theory that would outlaw such preferences—which, as it turns out, would not be good.

Are you claiming that these two situations are analogous, or only that they are two examples of caring about whether decision theory should allow certain kinds of preferences? That's one of the things I was confused about (because I can't see the analogy, but your writing suggests that one exists). Also, where does your intuition that it is a bad idea to care about the algorithm your AI runs come from? It seems like an obviously good idea to me to care about the algorithm your AI runs.

• Could you expand on the game-theoretic reasons? My intuition is that from a game-theoretic perspective, "steering the future in the same direction" should mean we're talking about a partnership game, i.e., that both agents get the same payoff for any strategy profile, and I do not see why this would lead to reasons to "make different decisions in practice".

I guess that depends on what "same" means. If you instantiate two AIs that are running identical algorithms, but both AIs are explicitly trying to monopolize all of the resources on the planet, then they're playing a zero-sum game, but there's a reasonable sense in which they are trying to steer the future in the "same" direction (namely, that they are running identical algorithms). If this isn't a reasonable notion of sameness because the algorithm involves a reference to thisAgent, and the referent of this pointer changes depending on who's instantiating the algorithm, then the preferences you've described are also not the same preferences, because they also refer to thisAgent. If the preferences are modified to say "if an agent running thisAlgorithm has access to foo," then as far as I can tell the two AIs you describe should behave as if they are the same agent.

• Thanks for the feedback!
It’s pos­si­ble that I’m just mis­read­ing your words to match my pic­ture of the world, but it sounds to me as if we’re not dis­agree­ing too much, but I failed to get my point across in the post. Speci­fi­cally: If this isn’t a rea­son­able no­tion of same­ness be­cause the al­gorithm in­volves refer­ence to thisA­gent and the refer­ent of this poin­ter changes de­pend­ing on who’s in­stan­ti­at­ing the al­gorithm, then the prefer­ences you’ve de­scribed are also not the same prefer­ences be­cause they also re­fer to thisA­gent. If the prefer­ences are mod­ified to say “if an agent run­ning thisAl­gorithm has ac­cess to foo,” then as far as I can tell the two AIs you de­scribe should be­have as if they are the same agent. I am say­ing that I think that a “di­rec­tion for steer­ing the fu­ture” should not de­pend on a global thisA­gent vari­able. To make the ear­lier ex­am­ple even more blatant, I don’t think it’s use­ful to call “If thisA­gent = Alice’s AI, max­i­mize pa­per­clips; if thisA­gent = Carol’s AI, max­i­mize sta­ples” a co­her­ent di­rec­tion, I’d call it a func­tion that re­turns a co­her­ent di­rec­tion. Whether or not the con­cept I’m try­ing to define is the best mean­ing for “same di­rec­tion” is of course only a defi­ni­tional de­bate and not that in­ter­est­ing, but I think it’s a use­ful con­cept. I agree that the most ob­vi­ous for­mal­iza­tion of Alice’s prefer­ences would de­pend on thisA­gent. So I’m say­ing that there ac­tu­ally is a non­triv­ial re­stric­tion on her prefer­ences: If she wants to keep some­thing like her in­for­mal for­mu­la­tion, she will need to de­cide what they are sup­posed to mean in terms that do not re­fer to thisA­gent. They may sim­ply re­fer to “Alice”, but then the AI is in­fluenced only by what Alice was able to do, not by what the AI was able to do, and Alice will have to de­cide whether that is what she wants. Oh. I still do not think the ex­am­ple you gave illus­trates this con­cern. One in­ter­pre­ta­tion of the situ­a­tion is that Alice gains new knowl­edge in the sce­nario. The ex­is­tence of a new pro­ject suited to Bob’s tal­ents in­creases Alice’s as­sess­ment of Bob’s value. More gen­er­ally, it’s rea­son­able for an agent’s prefer­ences to change as its knowl­edge changes. But how could you come up with a pair of situ­a­tions such that in situ­a­tion (i), the agent can choose op­tions A and B, while in situ­a­tion (ii), the agent can choose be­tween A, B and C, and yet the agent has ex­actly the same in­for­ma­tion in situ­a­tions (i) and (ii)? So un­der your rules, how could any ex­am­ple illus­trate the con­cern? I do agree that it’s rea­son­able for Alice to choose a differ­ent op­tion be­cause the knowl­edge she has is differ­ent—that’s my re­s­olu­tion to the prob­lem. In re­sponse to this ob­jec­tion, I think you only need to as­sume that de­cid­ing be­tween A and B and C is equiv­a­lent to de­cid­ing be­tween A and (B and C) and also equiv­a­lent to de­cid­ing be­tween (A and B) and C, to­gether with the as­sump­tion that your agent is ca­pa­ble of con­sis­tently as­sign­ing prefer­ences to “com­pos­ite choices” like (A and B). Sorry, I do not un­der­stand—what do you mean by your com­pos­ite choices? What does it mean to choose (A and B) when A and B are mu­tu­ally ex­clu­sive op­tions? Are you claiming that these two situ­a­tions are analo­gous or only claiming that they are two ex­am­ples of car­ing about whether de­ci­sion the­ory should al­low cer­tain kinds of prefer­ences? 
That’s one of the things I was con­fused about (be­cause I can’t see the anal­ogy but your writ­ing sug­gests that one ex­ists). I’m claiming they are both ex­am­ples of prefer­ences you might think you could out­law as ir­ra­tional, so you might think it’s ok to use a de­ci­sion the­ory that doesn’t al­low for such prefer­ences. In one of the two cases, it’s clearly not ok, which sug­gests we shouldn’t be too quick to de­cide it’s ok in the other case. Also, where does your in­tu­ition that it is a bad idea to care about the al­gorithm your AI runs come from? It seems like an ob­vi­ously good idea to care about the al­gorithm your AI runs to me. Could it be that it’s not clear enough that I’m talk­ing about ter­mi­nal val­ues, not in­stru­men­tal val­ues? Maybe it’s not right to say that it seems like a bad idea, more like it would seem at first that peo­ple just don’t have ter­mi­nal prefer­ences about the al­gorithm run (or at least not strong ones—you might de­rive en­joy­ment from an el­e­gant al­gorithm, but that wouldn’t out­weigh your de­sire to save lives, so your in­stru­men­tal prefer­ence for a well-work­ing al­gorithm would always dom­i­nate your ter­mi­nal prefer­ence for en­joy­ing an el­e­gant al­gorithm, if the two came into con­flict). So at first it might seem rea­son­able to de­sign a de­ci­sion the­ory where you are not al­lowed to care about the al­gorithm your AI is run­ning—I find it at least con­ceiv­able that when try­ing to prove the­o­rems about self-mod­ify­ing AI, mak­ing such an as­sump­tion might sim­plify things, so this does seem like a con­ceiv­able failure mode to me. • I agree that the most ob­vi­ous for­mal­iza­tion of Alice’s prefer­ences would de­pend on thisA­gent. So I’m say­ing that there ac­tu­ally is a non­triv­ial re­stric­tion on her prefer­ences: If she wants to keep some­thing like her in­for­mal for­mu­la­tion, she will need to de­cide what they are sup­posed to mean in terms that do not re­fer to thisA­gent. Got it. I think. But how could you come up with a pair of situ­a­tions such that in situ­a­tion (i), the agent can choose op­tions A and B, while in situ­a­tion (ii), the agent can choose be­tween A, B and C, and yet the agent has ex­actly the same in­for­ma­tion in situ­a­tions (i) and (ii)? In situ­a­tion (i), Alice can choose be­tween choco­late and vanilla ice cream. In situ­a­tion (ii), Alice can choose be­tween choco­late, vanilla, and straw­berry ice cream. Hav­ing ac­cess to these op­tions doesn’t change Alice’s knowl­edge about her prefer­ences for ice cream fla­vors (un­der the as­sump­tion that ac­cess to fla­vors on a given day doesn’t re­flect some kind of global short­age of a fla­vor). In gen­eral it might help to have Alice’s choices ran­domly de­ter­mined, so that Alice’s knowl­edge of her choices doesn’t give her in­for­ma­tion about any­thing else. Sorry, I do not un­der­stand—what do you mean by your com­pos­ite choices? What does it mean to choose (A and B) when A and B are mu­tu­ally ex­clu­sive op­tions? 
Sorry, I should probably have used "or" instead of "and." If A and B are the primitive choices "chocolate ice cream" and "vanilla ice cream," then the composite choice (A or B) is "the opportunity to choose between chocolate and vanilla ice cream." The point is that once you allow a decision theory to assign preferences among composite choices, composition of choices is associative, so preferences among an arbitrary number of primitive choices are determined by preferences among pairs of primitive choices.

• Maybe it's not right to say that it seems like a bad idea; it's more that it would seem, at first, that people just don't have terminal preferences about the algorithm run [...] So at first it might seem reasonable to design a decision theory where you are not allowed to care about the algorithm your AI is running—I find it at least conceivable that when trying to prove theorems about self-modifying AI, making such an assumption might simplify things, so this does seem like a conceivable failure mode to me.

Okay, but it still seems reasonable to have instrumental preferences about the algorithms that AIs run, and I don't see why decision theory is not allowed to talk about instrumental preferences. (Admittedly I don't know very much about decision theory.)

• The "not wanting the AI to run conscious simulations of people" link under the "Outcomes" heading does not work.

• Fixed, thanks!

• What happens if your preferences do not satisfy Continuity? Say, you want to save human lives, but you're not willing to incur any probability, no matter how small, of infinitely many people getting tortured infinitely long for this?

Then you basically have a two-step optimization: "first find the set of actions that minimize the risk of infinitely many people getting tortured infinitely long, and then, within that set, find the actions that save a maximal number of human lives" (a minimal sketch of this rule follows below). The trouble is that people like to express their preferences with rules like that, but those preferences are not ones they reflectively endorse. For example, would you rather have it be certain that all intelligent life in the universe is destroyed forever, or have a one-out-of-R chance that infinitely many people get tortured for an infinitely long period and an (R−1)-out-of-R chance that humanity continues along happily? If R is sufficiently large (say, a power tower x^x^...^x with x ^s in it, where x is the number of atoms in the universe), then it seems that the first option is obviously worse. A way to think about this is that infinities destroy averages, and VNM relies on scoring actions by their average utility. If utilities are bounded, then Continuity holds, and average utility always gives the results you expect, if you measured utility correctly.
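Here is the minimal sketch of that two-step rule, assuming purely illustrative plans scored by the pair (risk of the torture outcome, expected lives saved):

    # Lexicographic choice: minimize the torture risk first, and only
    # then maximize lives saved. All plans and numbers are made up.
    plans = {
        "plan_a": (0.0, 100),
        "plan_b": (0.0, 250),
        "plan_c": (1e-30, 10**9),   # tiny risk, enormous payoff
    }

    def two_step_choice(plans):
        least_risk = min(risk for risk, lives in plans.values())
        finalists = {p: v for p, v in plans.items() if v[0] == least_risk}
        return max(finalists, key=lambda p: finalists[p][1])

    print(two_step_choice(plans))   # "plan_b": plan_c is never considered

No matter how small plan_c's risk or how large its payoff, it is discarded at step one; that discontinuity is exactly what the Continuity axiom rules out.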
• New information is allowed to make a hypothesis more likely, but not predictably so; if all ways the experiment could come out make the hypothesis more likely, then you should already be finding it more likely than you do.

The same thing is true even if only one result would make the hypothesis more likely, while the other would leave your probability estimate exactly unchanged. One result might change my probability estimate by less than my current imprecision/uncertainty/rounding error in stating that estimate. If the coin comes up H, H, H, H, H, H, T, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, ... then I can assign a low probability to its being fair, but a still lower probability to its having two heads.

• Thank you for this excellent post. I read this primarily because I would like to use formal theories to aid my own decision-making about big, abstract decisions like what to do with my life and what charities to donate to, where the numbers are much more available than the emotional responses. In a way this didn't help at all: it only says anything about a situation where you start with a preference ordering. But in another way it helped immensely, of course, since I need to understand these fundamental concepts. It was really valuable to me that you were so careful about what these "utilities" really mean.

• Maximizing expected utility can, paradoxically, be shown to minimize actual utility. Consider a game in which you place an initial bet of $1 on a six-sided die coming up anything but 1 (i.e., 2–6), which pays even money if you win and costs you your bet if you lose. The twist, however, is that upon winning (i.e., you now have $2 in front of you) you must either bet the entire sum formed by your bet and its winnings, or leave the game permanently. Theoretically, since the odds are in your favor, you should always keep going. Always. But wait: this means you will eventually lose it all. Even if you say "just one more and I'll stop", it will be mathematically optimal to keep repeating this behavior. This "optimal" strategy does worse than any arbitrary random strategy possible.

• You aren't analyzing this game correctly. At the beginning of the game, you're deciding between possible strategies for playing it, and you should be evaluating the expected value of each of those strategies. The strategy where you keep going until you lose has expected value −1. There is also a sequence of strategies, indexed by a positive integer n, where you quit at the latest after the nth bet; their expected values grow geometrically, without bound. In other words, there isn't an optimal strategy for this game, because there are infinitely many strategies and their expected values get arbitrarily high. In addition, the sequence of strategies I described tends to the first strategy in the limit as n tends to infinity, in some sense, but their expected values don't respect this limit, which is what leads to the apparent paradox you noted. In more mathematical language, what you're seeing here is a failure of the ability to exchange limits and integrals (where the integrals are expected values). Less mathematically, you can't evaluate the expected value of a sequence of infinitely many decisions by adding up the expected value of each individual decision. In practice, you will never be able to make infinitely many decisions, so this doesn't really matter. This issue is closely related to the puzzle where the Devil gives you money and takes it away infinitely many times; I don't remember what it's called.
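A quick check of those expected values, under the assumptions above (a $1 initial stake, net winnings as the score):

    import random

    def ev_quit_after(n):
        """Expected net winnings of betting exactly n times, then quitting."""
        # You end with 2**n only by winning all n rounds (prob (5/6)**n);
        # otherwise your $1 is gone.
        return (5 / 6) ** n * 2 ** n - 1

    for n in (1, 5, 10, 20):
        print(n, round(ev_quit_after(n), 2))   # grows like (5/3)**n - 1

    def never_quit():
        while True:
            if random.randint(1, 6) == 1:      # the die comes up 1
                return -1                      # the whole stake is lost
            # otherwise the stake doubles and must be bet again

    print(sum(never_quit() for _ in range(10_000)) / 10_000)   # -1.0

Every finite n is beaten by n+1, yet the pointwise limit of these ever-better strategies is the sure loss: the limit/integral exchange failure described above.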
• Indeed they don't, but the point is that while stopping at N+1 always dominates stopping at N, this thinking leads one to keep continuing and lose. As such, the only winning move is to do exactly NOT this, and to decide on some arbitrary prior point at which to stop (or to decide nondeterministically, such as by coin flip). Attempting to maximize expected utility is the one strategy that won't work. This game, the prisoner's dilemma, and Newcomblike problems are all cases where choosing in a way that does better (than the alternative) in all cases can still do worse overall.

• The point isn't that the strategy that is supposed to maximize expected utility is a bad idea. The point is that you're computing its expected utility incorrectly, because you're switching a limit and an integral that you can't switch. This is a completely different issue from the prisoner's dilemma; it is entirely an issue of infinities, and has nothing to do with the practical issue of being a decision-maker with bounded resources making finitely many decisions.

• It isn't a matter of switching a limit and an integral, or of anything involving infinity, really. You could just consider the single bet you're currently facing: your options are to continue or to stop. To come out of the game with any money, one must at some point say "forget maximizing expected utility, I'm not risking losing what I've acquired". By stopping, you lose expected utility compared to continuing exactly one more time. My point is that it is not always the case that "you must maximize expected utility", for in some cases it may be wrong, or impossible, to do so.

• All you've shown is that maximizing expected utility infinitely many times does not maximize the expected utility you get at the end of the infinitely many decisions you've made. This is entirely a matter of switching a limit and an integral, and it is irrelevant to practical decision-making.

• 1 This argument only works if the bet is denominated in utils rather than in dollars. Otherwise, someone who gets diminishing marginal utility from dollars for very large sums—and that would include most people—will eventually decide to stop. (If I have utility = log(dollars) and initial assets of $1M, then I will stop after 25 wins, if I did the calculations right; see the sketch after point 5 below.)

1a It is not at all clear that a bet denominated in utils is even actually possible. Especially not one which, with high probability, ends up involving an astronomically large quantity of utility.

2 Even someone who doesn't generally get diminishing marginal utility from dollars—say, an altruist who will use all those dollars to save other people's lives, and who cares equally about all of them—will find marginal utility decreasing for large enough sums, because (a) eventually the cheap problems are solved and saving the next life starts costing more, and (b) if you give me 10^15 dollars and I try to spend it all (on myself or others), the resulting inflation will make the dollars worth less.

3 Given that "you will eventually lose it all", a strategy of continuing to bet does not in fact maximize expected utility.

4 The expected utility from a given choice at a given stage in the game depends on what you'd then do with the remainder of the game. For instance, if I know that my future strategy after winning this roll is going to be "keep betting forever", then I know that my expected utility if I keep playing is zero, so I'll choose not to do that.

5 So at most what we have (even if we assume we've dealt somehow with the issues of diminishing marginal utility, etc.) is a game where there's an infinite "increasing" sequence of strategies but no limiting strategy that's better than all of them. But that's no surprise. Here's another game with the same property: you name a positive integer N and Omega gives you $N. For any fixed N, it is best not to choose N, because larger numbers are better. "Therefore" you can't name any particular number, so you refuse to play and get nothing. If you don't find this paradoxical—and I confess that I don't—then I don't think you need to find the die-rolling game any worse. (Choosing N in this game <--> deciding to play for N turns in the die-rolling game.)

[EDITED to stop the LW software turning my numbered points into differently numbered and weirdly formatted points.]

[EDITED again to acknowledge that after writing all that, I read on and found that others had already said more or less the same things as me. D'oh. Anyway, since apparently Qiaochu_Yuan wasn't successful in convincing srn247, perhaps my slightly different presentation will be of some help.]
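A quick numerical check of the log-utility claim in point 1, assuming $1,000,000 of initial assets of which $1 is staked, so that after k straight wins the stake is 2^k and the rest sits safely outside the game:

    import math

    other_assets = 10**6 - 1            # wealth kept out of the game

    k = 0
    while True:
        stake = 2 ** k
        u_stop = math.log(other_assets + stake)
        u_bet = ((5 / 6) * math.log(other_assets + 2 * stake)
                 + (1 / 6) * math.log(other_assets))
        if u_bet <= u_stop:             # another bet no longer pays in utils
            break
        k += 1

    print(k)   # 25: the log-utility bettor stops after 25 wins

So the "stop after 25 wins" figure checks out under these assumptions.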
• This is the St. Petersburg paradox, discussed here from time to time.

• It isn't really very much like the St. Petersburg paradox. The St. Petersburg game runs for a random length of time; you don't choose whether to continue. The only choice you make is at the beginning of the game, where you decide how much to pay. Or is it equivalent in some subtle way?

• Is it just me, or is this essentially the same as the Lifespan Dilemma? At the very least, in both cases you find that you get high expected utilities by choosing very low probabilities of getting anything at all. If your preferences can always be modelled with a utility function, does that mean that no matter how you make decisions, there's some adaptation of this paradox that will lead you to accept a near-certainty of death?

• It is essentially that, and it does show that trying to maximize expected utility can lead to such negative outcomes. Unfortunately, there doesn't seem to be a simple alternative to maximizing expected utility that doesn't lead to being a money pump. The Kelly criterion is an excellent example of a decision-making strategy that doesn't maximize expected utility but still wins compared to it, so at least it's known that it can be done.

• I appreciate the hard work here, but all the math sidesteps the real problems, which are in the axioms—particularly the axiom of independence. See this sequence of comments on my post arguing that saying expectation maximization is correct is equivalent to saying that average utilitarianism is correct. People object to average utilitarianism because of certain "repugnant" scenarios, such as the utility monster (a single individual who enjoys torturing everyone else so much that it's right to let him or her do so). Some of these scenarios can be transformed into a repugnant scenario for expectation maximization over your own utility function, where instead of "one person" you have "one possible future you". Suppose the world has one billion people. Do you think it's better to give one billion and one utilons to one person than to give one utilon to everyone? If so, why would you believe it's better to take an action that results in you having one billion and one utilons one-one-billionth of the time, and nothing all the other times, than an action that reliably gives you one utilon? The way people think about the lottery suggests that most people prefer to distribute utilons equally among different people, but to lump them together and give them to a few winners in distributions among their possible future selves. This is a case where we reliably violate the Golden Rule, and call ourselves virtuous for doing so.

• Suppose the world has one billion people. Do you think it's better to give one billion and one utilons to one person than to give one utilon to everyone?

Yes. If you think this conclusion is repugnant, you have not comprehended the meaning of 1000000001 times as much utility. The only thing that a utility value even means is that you'd accept such a deal. You don't "give" people utilons, though. That implies scarcity, which implies some real resource to be distributed, which we correctly recognize as having diminishing returns for one person and less-diminishing returns for lots of people. The better way to think of it is that you extract utility from people. Would you rather get 1e9 utils from one person, or 1 util from each of 1e9 people? Who cares? 1e9 utils is 1e9 utils.

• If so, why would you believe it's better to take an action that results in you having one billion and one utilons one-one-billionth of the time, and nothing all the other times, than an action that reliably gives you one utilon?

Again, by construction, we take this deal. VNM should not have called it "utility"; it drags in too many connotations. VNM utility is a very personal thing that describes what decisions you would make.

• It is permissible to prefer the outcome that has a constant probability distribution to the outcome that has the higher definite integral across the probability distribution.

• What do you mean? Specifically, what is a "constant probability distribution"? If you mean that I can prefer $1M to a 1/1000 chance of $2B, then sure. Money is not utility. On the other hand, I can't prefer 1M utils to a 1/1000 chance of 2B utils.

• A constant probability distribution is a flat distribution, i.e., a flat line. And the outcomes can be ordered however one chooses; it is not necessary to provide additive numeric values. Are you saying that utils are defined such that if one outcome is preferred over another, it has more expected utils?

• Are you saying that utils are defined such that if one outcome is preferred over another, it has more expected utils?

Yes. That's exactly what I mean. And I'm afraid I still don't know what you are getting at with this constant-probability-distribution thing.

• I mean an outcome where there is a 1−epsilon chance of A. It is permissible to assign utils arbitrarily, such that flipping a coin to decide between A and B has more utils than selecting A and more utils than selecting B. In that case, the outcome is "flip a coin and allow the coin to decide", which has a different utility from the sum of half of A and half of B.

• It is permissible to assign utils arbitrarily, such that flipping a coin to decide between A and B has more utils than selecting A and more utils than selecting B.
In that case, the outcome is "flip a coin and allow the coin to decide", which has a different utility from the sum of half of A and half of B.

Perhaps, if you count "I flipped a coin and got A" > A. You can always define some utility function such that it is rational to shoot yourself in the foot, but at that point you are just doing a bunch of work to describe stupid behavior that you could just do anyway. You don't have to follow the VNM axioms, either. The point of VNM and the like is to constrain your behavior, and if you input sensible things, it does. You don't have to let it constrain your behavior, but if you don't, it is doing no work for you.

• Right. If you think "I flipped a coin to decide" is more valuable than half of the difference between the results of the coin flip (perhaps because those results are very close to equal, but you fear that systemic bias is a large negative, or perhaps because you demand that you be provably fair), then you flip a coin to decide. The utility function, however, is not something to be defined. It is something to be determined and discovered: I already want things, and while what I want is time-variant, it isn't arbitrarily alterable.

• Unless your utility assigns a positive utility to your utility function being altered, in which case you'd have to seek to optimize your meta-utility. Desire to change one's desires reflects an inconsistency, however, so one who desires to be consistent should desire not to desire to change one's desires. (My apologies if this sounds confusing.)

• One level deeper: one who is not consistent but desires to be consistent desires to change their desires to desires that they will not then desire to change. If you don't like not liking where you are, and you don't like where you are, move to somewhere where you will like where you are.

• Ah, so true. Ultimately, I think that's exactly the point this article tries to make: if you don't want to do A, but you don't want to be the kind of person who doesn't want to do A (or you don't want to be the kind of person who doesn't do A), do A. If that doesn't work, change who you are.

• If so, why would you believe it's better to take an action that results in you having one billion and one utilons one-one-billionth of the time, and nothing all the other times, than an action that reliably gives you one utilon?

One possible response is that the former action is preferable, but the intuition pump yields a different result because our intuitions are informed by actual small and large rewards (e.g., money), and in the real world getting $1 every day for eight years with certainty does not have the same utility as getting $2922 with probability 1/2922 each day for the next eight years. If real-world examples like money—which is almost always more valuable now than later, inflation aside, and which bears hidden and nonlinearly changing utilities like "security" and "versatility" and "social status" and "peace of mind" that we learn to reason about intuitively as though they could not be quantified in a single utility metric analogous to the currency measure itself—are the only intuitive grasp we have on "utilons," then we may make systematic errors in trying to cash out how our values would, if we better understood our biases, be reflectively cashed out.
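The dollar version of that intuition is easy to quantify; a small sketch, using the eight-year (roughly 2922-day) numbers from the comment above:

    days = 2922                        # eight years of daily draws
    p_never = (1 - 1 / days) ** days   # chance the lottery never pays out

    print(round(p_never, 3))           # 0.368, roughly 1/e

Both options pay $2922 in expectation, but the daily lottery leaves you with nothing at all about 37% of the time; with any concave (risk-averse) utility of money, the certain stream is strictly better, which is one way dollar intuitions transplant badly to utilons.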

• See this sequence of comments on my post arguing that saying expectation maximization is correct is equivalent to saying that average utilitarianism is correct.

That thesis seems obviously wrong: the term "utilitarianism" refers not to maximising, but to maximising something pretty specific—namely, the happiness of all people.

• von Neumann-Morgenstern decision theory only deals with instantaneous decision-making.