Expected utility without the independence axiom

John von Neumann and Oskar Morgenstern developed a system of four axioms that they claimed any rational decision maker must follow. The major consequence of these axioms is that when faced with a decision, you should always act solely to increase your expected utility. All four axioms have been attacked at various times and from various directions; but three of them are very solid. The fourth—independence—is the most controversial.

To understand the axioms, let A, B and C be lotteries—processes that result in different outcomes, positive or negative, each with a certain probability. For 0<p<1, the mixed lottery pA + (1-p)B means that you face lottery A with probability p, and lottery B with probability (1-p). Writing A>B means that you prefer lottery A to lottery B; A<B is the reverse; and A=B means that you are indifferent between the two. Then the von Neumann-Morgenstern axioms are:

  • (Completeness) For every A and B, either A<B, A>B or A=B.

  • (Transitivity) For every A, B and C, if A>B and B>C, then A>C.

  • (Continuity) For every A>B>C, there exists a probability p with B=pA + (1-p)C.

  • (Independence) For every A, B and C with A>B, and for every 0<t≤1, tA + (1-t)C > tB + (1-t)C.
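
To make the setup concrete, here is a minimal sketch in Python, assuming lotteries are finite discrete distributions over returns; the names `Lottery` and `mix` are mine, purely for illustration.

```python
import random

class Lottery:
    """A finite lottery: a list of (probability, return) pairs."""
    def __init__(self, outcomes):
        assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
        self.outcomes = outcomes

    def mean(self):
        return sum(p * x for p, x in self.outcomes)

    def sample(self):
        # Draw one outcome according to the probabilities.
        r, acc = random.random(), 0.0
        for p, x in self.outcomes:
            acc += p
            if r < acc:
                return x
        return self.outcomes[-1][1]

def mix(p, a, b):
    """The mixed lottery pA + (1-p)B: a p chance of facing A, else B."""
    return Lottery([(p * q, x) for q, x in a.outcomes] +
                   [((1 - p) * q, x) for q, x in b.outcomes])

# Example: A pays 1 with certainty; B pays 10 with probability 0.1.
A = Lottery([(1.0, 1.0)])
B = Lottery([(0.1, 10.0), (0.9, 0.0)])
print(mix(0.5, A, B).mean())  # 1.0 -- both components have mean 1
```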

In this post, I’ll try to prove that even without the Independence axiom, you should continue to use expected utility in most situations. This requires some mild extra conditions, of course. The problem is that although these conditions are considerably weaker than Independence, they are harder to phrase. So please bear with me here.

The whole insight in this post rests on the fact that a lottery with a 99.999% chance of giving you £1 is very close to being a lottery that gives you £1 with certainty. I want to express this fact by looking at the narrowness of the probability distribution, using the standard deviation. However, this narrowness is not an intrinsic property of the lottery: it depends on our utility function. Even in the example above, if I decide that receiving £1 gives me a utility of one, while receiving zero gives me a utility of minus ten billion, then I no longer have a narrow distribution, but a wide one. So, unlike the traditional set-up, we have to assume a utility function as given. Once this is chosen, we can talk about the mean and standard deviation of a lottery.
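
As a quick numerical illustration of this point (a sketch; the helper `mean_sd` is mine):

```python
# The "narrowness" of the 99.999% lottery depends on the utility
# function we chose, not on the lottery itself.
def mean_sd(outcomes):
    """Mean and standard deviation of (probability, utility) pairs."""
    m = sum(p * u for p, u in outcomes)
    var = sum(p * (u - m) ** 2 for p, u in outcomes)
    return m, var ** 0.5

p = 0.99999
# Utility 1 for the pound, 0 for nothing: a very narrow distribution.
print(mean_sd([(p, 1.0), (1 - p, 0.0)]))    # mean ~1, SD ~0.003
# Utility 1 for the pound, minus ten billion for nothing: very wide.
print(mean_sd([(p, 1.0), (1 - p, -1e10)]))  # mean ~ -1e5, SD ~ 3e7
```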

Then if you define c(μ) as the lottery giving you a certain return of μ, you can use the following axiom instead of independence:

  • (Standard deviation bound) For all ε>0, there exists a δ>0 such that for all μ>0, any lottery B with mean μ and standard deviation less than μδ satisfies B>c((1-ε)μ).

This seems complicated, but all it says, in mathematical terms, is that if we have a probability distribution that is “narrow enough” around its mean μ, then we should value it as being very close to a certain return of μ. The narrowness is expressed in terms of its standard deviation—a lottery with zero SD is a guaranteed return of μ, and as the SD gets larger, the distribution gets wider, and the chances of getting values far away from μ increase. Risk, in other words, scales (approximately) with the SD.
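
One way to see why this is a mild demand is Chebyshev’s inequality (this is my motivation for the axiom, not part of the post’s assumptions): for a lottery B with mean μ and SD σ ≤ μδ,

```latex
P\big(B \le (1-\varepsilon)\mu\big)
  \;\le\; P\big(|B - \mu| \ge \varepsilon\mu\big)
  \;\le\; \frac{\sigma^2}{(\varepsilon\mu)^2}
  \;\le\; \left(\frac{\delta}{\varepsilon}\right)^{2}
```

so when δ is small relative to ε, the chance of B paying noticeably less than μ is tiny.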

We also need to make sure that we are not risk loving—if we are inveterate gamblers who gamble for its own sake, our behaviour may be a lot more complicated.

  • (Not risk loving) If A has mean μ>0, then A≤c(μ).

I.e. we never prefer a lottery to a certain return of its mean just because we enjoy the risk. This axiom can and maybe should be weakened, but it’s a good approximation for the moment—most people are not risk loving with huge risks.

Assume you are going to have to choose, n different times, whether to accept independent lotteries with fixed mean β>0, and all with SD less than a fixed upper bound K. Then if you are not risk loving and n is large enough, you must accept an arbitrarily large proportion of the lotteries.

Proof: From now on, I’ll use a different convention for adding and scaling lotteries. Treating them as random variables, A+B will mean the lottery consisting of A and B together, while xA will mean the same lottery as A, but with all returns (positive or negative) scaled by x.
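
In the terms of the earlier sketch (reusing the hypothetical `Lottery` class above), this convention would look like:

```python
def add(a, b):
    """A + B: run both lotteries together and sum the returns."""
    return Lottery([(p * q, x + y) for p, x in a.outcomes
                                   for q, y in b.outcomes])

def scale(x, a):
    """xA: the same lottery as A, with every return multiplied by x."""
    return Lottery([(p, x * r) for p, r in a.outcomes])
```

Note the difference from `mix` earlier: mixing chooses one lottery at random, while A+B means you face both.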

Let X1, X2, … , Xn be these n independent lotteries, each with mean β and variance vj. Then since the standard deviations are less than K, the variances must be less than K².

Let Y = X1 + X2 + … + Xn. The mean of Y is nβ. The variance of Y is the sum of the vj, which is less than nK². Hence the SD of Y is less than K√(n). Now pick an ε>0, and the resulting δ>0 from the standard deviation bound axiom. For large enough n, nβδ must be larger than K√(n); so the SD of Y is less than its mean times δ, and hence, for large enough n, Y > c((1-ε)nβ). Now, if we were to refuse more than εn of the lotteries, we would be left with a distribution with mean ≤ (1-ε)nβ, which, since we are not risk loving, is at most as good as c((1-ε)nβ), which is itself worse than Y. Hence we must accept at least a proportion (1-ε) of the lotteries on offer.
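
A quick numerical check of that key step (a sketch with made-up values of β, K and δ; the axiom only promises that some δ exists for each ε):

```python
import math

# Illustrative parameters: mean per lottery, SD bound, axiom's delta.
beta, K, delta = 1.0, 5.0, 0.01

for n in (10, 1_000, 100_000, 10_000_000):
    mean_Y = n * beta              # mean of the aggregate Y
    sd_Y_bound = K * math.sqrt(n)  # upper bound on SD of Y
    print(f"n={n:>8}: SD bound / mean = {sd_Y_bound / mean_Y:.5f}, "
          f"axiom applies: {mean_Y * delta > sd_Y_bound}")

# n*beta*delta exceeds K*sqrt(n) once n > (K / (beta*delta))^2 = 250,000,
# so the relative spread of Y shrinks to zero as n grows.
```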

This only applies to lotteries that share the same mean, but we can generalise the result as:

Assume you are going to have to choose, n different times, whether to accept independent lotteries all with means greater than a fixed β>0, and all with SD less than a fixed upper bound K. Then if you are not risk loving and n is large enough, you must accept lotteries whose means represent an arbitrarily large proportion of the total mean of all lotteries on offer.

Proof: The same proof works as before, with nβ now being a lower bound on the true mean μ of Y. Thus we get Y > c((1-ε)μ), and we must accept lotteries whose total mean is greater than (1-ε)μ.
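
Spelling out the inequality chain behind that (my notation; μ is the true mean of Y, and n is large enough that nβδ > K√(n)):

```latex
\mathrm{SD}(Y) \;<\; K\sqrt{n} \;<\; n\beta\,\delta \;\le\; \mu\,\delta
```

so the standard deviation bound axiom applies to Y with mean μ, giving Y > c((1-ε)μ).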

Analysis: Since we rejected independence, we must now consider the lotteries when taken as a whole, rather than just seeing them individually. When considered as a whole, “reasonable” lotteries are more tightly bunched around their total mean than they are individually. Hence the more lotteries we consider, the more we should treat them as if only their mean mattered. So if we are not risk loving, and expect to meet many lotteries with bounded SD in our lives, we should follow expected utility. Deprived of independence, expected utility sneaks in via aggregation.

Note: This restates the first half of my previous post—a post so confusingly written it should be staked through the heart and left to die on a crossroad at noon.

Edit: Rewrote a part to emphasise the fact that a utility function needs to be chosen in advance—thanks to Peter de Blanc and Nick Hay for bringing this up.