Expected utility without the independence axiom

John von Neumann and Oskar Morgenstern developed a system of four axioms that they claimed any rational decision maker must follow. The major consequence of these axioms is that when faced with a decision, you should always act solely to increase your expected utility. All four axioms have been attacked at various times and from various directions; but three of them are very solid. The fourth, independence, is the most controversial.

To understand the axioms, let A, B and C be lotteries: processes that result in different outcomes, positive or negative, each with a certain probability. For 0<p<1, the mixed lottery pA + (1-p)B means that you have a probability p of facing lottery A, and a probability (1-p) of facing lottery B. Writing A>B means that you prefer lottery A to lottery B, A<B is the reverse, and A=B means that you are indifferent between the two. The von Neumann-Morgenstern axioms are:
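To make the notation concrete, here is a minimal sketch (my own illustration, not part of the original argument) representing a lottery as a list of (probability, outcome) pairs, with the mixture pA + (1-p)B built by rescaling probabilities:

```python
import random

# A lottery is a list of (probability, outcome) pairs whose probabilities sum to 1.
A = [(1.0, 1.0)]               # a certain return of 1
B = [(0.5, 0.0), (0.5, 2.5)]   # a fair coin flip between 0 and 2.5

def mix(p, A, B):
    """The mixed lottery pA + (1-p)B: with probability p you face A, else B."""
    return [(p * q, x) for q, x in A] + [((1 - p) * q, x) for q, x in B]

def sample(lottery):
    """Draw one outcome from a lottery."""
    r = random.random()
    for q, x in lottery:
        r -= q
        if r <= 0:
            return x
    return lottery[-1][1]

M = mix(0.3, A, B)   # 30% chance of facing A, 70% chance of facing B
```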

  • (Completeness) For every A and B, either A<B, A>B or A=B.

  • (Transitivity) For every A, B and C, if A>B and B>C, then A>C.

  • (Continuity) For every A>B>C, there exists a probability p with B = pA + (1-p)C.

  • (Independence) For every A, B and C with A>B, and for every 0<t≤1, we have tA + (1-t)C > tB + (1-t)C.
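As a quick numerical sanity check on what Independence demands (my own illustration, with made-up lotteries): if preferences are given by expected value, mixing both sides with the same third lottery C never flips the ranking:

```python
# Lotteries as (probability, outcome) pairs; preference given by expected value.
def expected_value(lottery):
    return sum(q * x for q, x in lottery)

def mix(t, X, Y):
    """The mixed lottery tX + (1-t)Y."""
    return [(t * q, x) for q, x in X] + [((1 - t) * q, x) for q, x in Y]

A = [(1.0, 10.0)]                # a certain 10
B = [(0.5, 0.0), (0.5, 12.0)]    # mean 6, so A > B under expected value
C = [(1.0, -5.0)]                # an arbitrary third lottery
t = 0.4

# Independence: since A > B, we need tA + (1-t)C > tB + (1-t)C.
assert expected_value(A) > expected_value(B)
assert expected_value(mix(t, A, C)) > expected_value(mix(t, B, C))
```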

In this post, I'll try to prove that even without the Independence axiom, you should continue to use expected utility in most situations. This requires some mild extra conditions, of course. The problem is that although these conditions are considerably weaker than Independence, they are harder to phrase. So please bear with me here.

The whole insight in this post rests on the fact that a lottery with a 99.999% chance of giving you £1 is very close to being a lottery that gives you £1 with certainty. I want to express this fact by looking at the narrowness of the probability distribution, using the standard deviation. However, this narrowness is not an intrinsic property of the distribution, but of our utility function. Even in the example above, if I decide that receiving £1 gives me a utility of one, while receiving zero gives me a utility of minus ten billion, then I no longer have a narrow distribution, but a wide one. So, unlike the traditional set-up, we have to assume a utility function as given. Once this is chosen, we can talk about the mean and standard deviation of a lottery.
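A small numerical sketch of this point, using the utility numbers from the example above: the same £1 lottery is narrow under one utility function and enormously wide under another.

```python
import math

# The lottery from the text: 99.999% chance of £1, 0.001% chance of nothing.
lottery = [(0.99999, 'one pound'), (0.00001, 'nothing')]

def stats(lottery, utility):
    """Mean and standard deviation of a lottery *in utility terms*."""
    mean = sum(q * utility[x] for q, x in lottery)
    var = sum(q * (utility[x] - mean) ** 2 for q, x in lottery)
    return mean, math.sqrt(var)

narrow_u = {'one pound': 1.0, 'nothing': 0.0}     # utilities from the first reading
wide_u = {'one pound': 1.0, 'nothing': -1e10}     # "minus ten billion" for nothing

_, sd_narrow = stats(lottery, narrow_u)
_, sd_wide = stats(lottery, wide_u)
# sd_narrow is about 0.003, while sd_wide is larger by many orders of magnitude.
```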

Then if you define c(μ) as the lottery giving you a certain return of μ, you can use the following axiom instead of Independence:

  • (Standard deviation bound) For all ε>0, there exists a δ>0 such that for all μ>0, any lottery B with mean μ and standard deviation less than μδ has B > c((1-ε)μ).

This seems complicated, but all it says, in mathematical terms, is that if we have a probability distribution that is "narrow enough" around its mean μ, then we should value it as being very close to a certain return of μ. The narrowness is expressed in terms of the standard deviation: a lottery with zero SD is a guaranteed return of μ, and as the SD gets larger, the distribution gets wider, and the chances of getting values far away from μ increase. Risk, in other words, scales (approximately) with the SD.

We also need to make sure that we are not risk loving: if we are inveterate gamblers who gamble for its own sake, our behaviour may be a lot more complicated.

  • (Not risk loving) If A has mean μ>0, then A ≤ c(μ).

I.e. we don't love a worse rate of return just because of the risk. This axiom can, and maybe should, be weakened, but it's a good approximation for the moment: most people are not risk loving with huge risks.

Assume you are going to have to choose n different times whether to accept independent lotteries with a fixed mean β>0, and all with SD less than a fixed upper bound K. Then if you are not risk loving and n is large enough, you must accept an arbitrarily large proportion of the lotteries.

Proof: From now on, I'll use a different convention for adding and scaling lotteries. Treating them as random variables, A+B will mean the lottery consisting of A and B together, while xA will mean the same lottery as A, but with all returns (positive or negative) scaled by x.

Let X1, X2, … , Xn be these n independent lotteries, with means β and variances vj. Since the standard deviations are less than K, the variances must be less than K².

Let Y = X1 + X2 + … + Xn. The mean of Y is nβ. The variance of Y is the sum of the vj, which is less than nK². Hence the SD of Y is less than K√(n). Now pick an ε>0, and take the resulting δ>0 from the standard deviation bound axiom. For large enough n, nβδ must be larger than K√(n); hence, for large enough n, Y > c((1-ε)nβ). Now, if we were to refuse more than εn of the lotteries, we would be left with a distribution with mean ≤ (1-ε)nβ, which, since we are not risk loving, is no better than c((1-ε)nβ), which is worse than Y. Hence we must accept more than a proportion (1-ε) of the lotteries on offer.
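A Monte Carlo sketch of the aggregation step, with illustrative parameters of my own choosing (each lottery pays 0 or 2 with equal probability, so β = 1 and SD = 1 ≤ K). The relative spread SD(Y)/mean(Y) falls roughly like 1/√(n), which is what lets Y beat c((1-ε)nβ) for large n:

```python
import math
import random

random.seed(0)

def aggregate_spread(n, trials=2000):
    """Empirical SD/mean of Y = X1 + ... + Xn over many simulated runs."""
    totals = [sum(random.choice((0.0, 2.0)) for _ in range(n))
              for _ in range(trials)]
    mean = sum(totals) / trials
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / trials)
    return sd / mean

# As n grows, the aggregate Y bunches up around its mean n*beta:
spread_small = aggregate_spread(10)     # roughly 1/sqrt(10), about 0.32
spread_large = aggregate_spread(1000)   # roughly 1/sqrt(1000), about 0.03
```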

This only applies to lotteries that share the same mean, but we can generalise the result as follows:

Assume you are going to have to choose n different times whether to accept independent lotteries, all with means greater than a fixed β>0, and all with SD less than a fixed upper bound K. Then if you are not risk loving and n is large enough, you must accept lotteries whose means represent an arbitrarily large proportion of the total mean of all lotteries on offer.

Proof: The same proof works as before, with nβ now being a lower bound on the true mean μ of Y. Thus we get Y > c((1-ε)μ), and we must accept lotteries whose total mean is greater than (1-ε)μ.

Analysis: Since we rejected independence, we must now consider the lotteries taken as a whole, rather than just seeing them individually. When considered as a whole, "reasonable" lotteries are more tightly bunched around their total mean than they are individually. Hence the more lotteries we consider, the more we should treat them as if only their mean mattered. So if we are not risk loving, and expect to meet many lotteries with bounded SD in our lives, we should follow expected utility. Deprived of independence, expected utility sneaks in via aggregation.

Note: This restates the first half of my previous post, a post so confusingly written it should be staked through the heart and left to die at a crossroads at noon.

Edit: Rewrote a part to emphasise the fact that a utility function needs to be chosen in advance. Thanks to Peter de Blanc and Nick Hay for bringing this up.