# Coherent decisions imply consistent utilities

(*Written for Arbital in 2017.*)

# Introduction to the introduction: Why expected utility?

So we’re talking about how to make good decisions, or the idea of ‘bounded rationality’, or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of ‘expected utility’ or ‘utility functions’.

And before we even ask what those are, we might first ask, *Why?*

There’s a mathematical formalism, ‘expected utility’, that some people invented to talk about making decisions. This formalism is very academically popular, and appears in all the textbooks.

But so what? Why is that *necessarily* the best way of making decisions under every kind of circumstance? Why would an Artificial Intelligence care what’s academically popular? Maybe there’s some better way of thinking about rational agency? Heck, why is this formalism popular in the first place?

We can ask the same kinds of questions about probability theory:

Okay, we have this mathematical formalism in which the chance that X happens, aka $\mathbb{P}(X)$, plus the chance that X doesn’t happen, aka $\mathbb{P}(\neg X)$, must be represented in a way that makes the two quantities sum to unity: $\mathbb{P}(X) + \mathbb{P}(\neg X) = 1$.

That formalism for probability has some neat mathematical properties. But so what? Why should the best way of reasoning about a messy, uncertain world have neat properties? Why shouldn’t an agent reason about ‘how likely is that’ using something completely unlike probabilities? How do you *know* a sufficiently advanced Artificial Intelligence would reason in probabilities? You haven’t seen an AI, so what do you think you know and how do you think you know it?

That entirely reasonable question is what this introduction tries to answer. There are, indeed, excellent reasons beyond academic habit and mathematical convenience for why we would by default invoke ‘expected utility’ and ‘probability theory’ to think about good human decisions, talk about rational agency, or reason about sufficiently advanced AIs.

The broad form of the answer seems easier to show than to tell, so we’ll just plunge straight in.

# Why not circular preferences?

*De gustibus non est disputandum,* goes the proverb; matters of taste cannot be disputed. If I like onions on my pizza and you like pineapple, it’s not that one of us is right and one of us is wrong. We just prefer different pizza toppings.

Well, but suppose I declare to you that I *simultaneously*:

Prefer onions to pineapple on my pizza.

Prefer pineapple to mushrooms on my pizza.

Prefer mushrooms to onions on my pizza.

If we use $>_P$ to denote my pizza preferences, with $X >_P Y$ denoting that I prefer X to Y, then I am declaring:

$$\text{onions} >_P \text{pineapple} >_P \text{mushrooms} >_P \text{onions}$$

That sounds strange, to be sure. But is there anything *wrong* with that? Can we disputandum it?

We used the math symbol $>$ which denotes an ordering. If we ask whether $>_P$ can be an ordering, it naughtily violates the standard transitivity axiom $x > y, y > z \implies x > z$.

Okay, so then maybe we shouldn’t have used the symbol $>_P$ or called it an ordering. Why is that necessarily bad?

We can try to imagine each pizza as having a numerical score denoting how much I like it. In that case, there’s no way we could assign consistent numbers $x, y, z$ to those three pizza toppings such that $x > y > z > x$.

So maybe I don’t assign numbers to my pizza. Why is that so awful?

Are there any grounds besides “we like a certain mathematical formalism and your choices don’t fit into our math,” on which to criticize my three simultaneous preferences?

(Feel free to try to answer this yourself before continuing...)


Suppose I tell you that I prefer pineapple to mushrooms on my pizza. Suppose you’re about to give me a slice of mushroom pizza; but by paying one penny ($0.01) I can instead get a slice of pineapple pizza (which is just as fresh from the oven). It seems realistic to say that most people with a pineapple pizza preference would probably pay the penny, if they happened to have a penny in their pocket.**¹**

After I pay the penny, though, and just before I’m about to get the pineapple pizza, you offer me a slice of onion pizza instead—no charge for the change! If I was telling the truth about preferring onion pizza to pineapple, I should certainly accept the substitution if it’s free.

And then to round out the day, you offer me a mushroom pizza instead of the onion pizza, and again, since I prefer mushrooms to onions, I accept the swap.

I end up with exactly the same slice of mushroom pizza I started with… and one penny poorer, because I previously paid $0.01 to swap mushrooms for pineapple.

This seems like a *qualitatively* bad behavior on my part. By virtue of my incoherent preferences which cannot be given a consistent ordering, I have shot myself in the foot, done something self-defeating. We haven’t said *how* I ought to sort out my inconsistent preferences. But no matter how it shakes out, it seems like there must be *some* better alternative—some better way I could reason that wouldn’t spend a penny to go in circles. That is, I could at least have kept my original pizza slice and not spent the penny.

In a phrase you’re going to keep hearing, I have executed a ‘dominated strategy’: there exists some other strategy that does strictly better.**²**
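The pizza swaps above can be replayed as a small simulation. This is only a sketch: the names `PREFERS`, `accepts_swap`, and `run_offers` are made up for illustration, and it assumes that a preferred topping is worth at least one penny to the agent.

```python
# Hypothetical names throughout; the preferences are the three circular ones above.
PREFERS = {("onion", "pineapple"), ("pineapple", "mushroom"), ("mushroom", "onion")}


def accepts_swap(held, offered, price_cents):
    """Accept a swap if the offered topping is preferred to the held one.

    Assumption: a preferred topping is worth at least one penny to the agent.
    """
    return (offered, held) in PREFERS and price_cents <= 1


def run_offers(start, offers):
    """Apply a sequence of (offered_topping, price_cents) offers; return the final state."""
    held, cents_paid = start, 0
    for offered, price in offers:
        if accepts_swap(held, offered, price):
            held, cents_paid = offered, cents_paid + price
    return held, cents_paid


final_slice, cents_lost = run_offers(
    "mushroom", [("pineapple", 1), ("onion", 0), ("mushroom", 0)]
)
print(final_slice, cents_lost)  # mushroom 1
```

The agent ends holding the slice it started with, one cent poorer; the strategy of refusing every offer ends with the same slice and the penny.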

Or as Steve Omohundro put it: If you prefer being in Berkeley to being in San Francisco; prefer being in San Jose to being in Berkeley; and prefer being in San Francisco to being in San Jose; then you’re going to waste a lot of time on taxi rides.

None of this reasoning has told us that a non-self-defeating agent must prefer Berkeley to San Francisco or vice versa. There are at least six possible consistent orderings over pizza toppings, like $\text{mushroom} >_P \text{pineapple} >_P \text{onion}$ etcetera, and *any* consistent ordering would avoid paying to go in circles.**³** We have not, in this argument, used pure logic to derive that pineapple pizza must taste better than mushroom pizza to an ideal rational agent. But we’ve seen that eliminating a certain kind of shoot-yourself-in-the-foot behavior, corresponds to imposing a certain *coherence* or *consistency* requirement on whatever preferences are there.
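As a quick check of the "six consistent orderings" count, the sketch below enumerates every strict ranking of the three toppings and confirms that none contains a preference cycle, while the circular preferences do. The helper names are illustrative only.

```python
from itertools import permutations

TOPPINGS = ("onion", "pineapple", "mushroom")


def has_cycle(prefers):
    """Return True if some chain of strict preferences leads back to its start."""
    def reachable(start):
        seen, frontier = set(), [start]
        while frontier:
            current = frontier.pop()
            for better, worse in prefers:
                if better == current and worse not in seen:
                    seen.add(worse)
                    frontier.append(worse)
        return seen

    return any(topping in reachable(topping) for topping in TOPPINGS)


def pairs_from_ranking(ranking):
    """All (better, worse) pairs implied by a best-to-worst ranking."""
    return {
        (ranking[i], ranking[j])
        for i in range(len(ranking))
        for j in range(i + 1, len(ranking))
    }


rankings = list(permutations(TOPPINGS))
circular = {("onion", "pineapple"), ("pineapple", "mushroom"), ("mushroom", "onion")}

print(len(rankings))                                            # 6
print(any(has_cycle(pairs_from_ranking(r)) for r in rankings))  # False
print(has_cycle(circular))                                      # True
```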

It turns out that this is just one instance of a large family of *coherence theorems* which all end up pointing at the same set of core properties. All roads lead to Rome, and all the roads say, “If you are not shooting yourself in the foot in sense X, we can view you as having coherence property Y.”

There are some caveats to this general idea.

For example: In complicated problems, perfect coherence is usually impossible to compute—it’s just too expensive to consider *all* the possibilities.

But there are also caveats to the caveats! For example, it may be that if there’s a powerful machine intelligence that is not *visibly to us humans* shooting itself in the foot in way X, then *from our perspective* it must look like the AI has coherence property Y. If there’s some sense in which the machine intelligence is going in circles, because *not* going in circles is too hard to compute, well, *we* won’t see that either with our tiny human brains. In which case it may make sense, from our perspective, to think about the machine intelligence *as if* it has some coherent preference ordering.

We are not going to go through all the coherence theorems in this introduction. They form a very large family; some of them are a *lot* more mathematically intimidating; and honestly I don’t know even 5% of the variants.

But we can hopefully walk through enough coherence theorems to at least start to see the reasoning behind, “Why expected utility?” And, because the two are a package deal, “Why probability?”

# Human lives, mere dollars, and coherent trades

An experiment in 2000—from a paper titled “The Psychology of the Unthinkable: Taboo Trade-Offs, Forbidden Base Rates, and Heretical Counterfactuals”—asked subjects to consider the dilemma of a hospital administrator named Robert:

Robert can save the life of Johnny, a five year old who needs a liver transplant, but the transplant procedure will cost the hospital $1,000,000 that could be spent in other ways, such as purchasing better equipment and enhancing salaries to recruit talented doctors to the hospital. Johnny is very ill and has been on the waiting list for a transplant but because of the shortage of local organ donors, obtaining a liver will be expensive. Robert could save Johnny’s life, or he could use the $1,000,000 for other hospital needs.

The main experimental result was that most subjects got angry at Robert for even considering the question.

After all, you can’t put a dollar value on a human life, right?

But better hospital equipment also saves lives, or at least one hopes so.**⁴** It’s not like the other potential use of the money saves zero lives.

Let’s say that Robert has a total budget of $100,000,000 and is faced with a long list of options such as these:

* $100,000 for a new dialysis machine, which will save 3 lives

* $1,000,000 for a liver for Johnny, which will save 1 life

* $10,000 to train the nurses on proper hygiene when inserting central lines, which will save an expected 100 lives

...

Now suppose—this is a supposition we’ll need for our theorem—that Robert *does not care at all about money,* not even a tiny bit. Robert *only* cares about maximizing the total number of lives saved. Furthermore, we suppose for now that Robert cares about every human life equally.

If Robert does save as many lives as possible, given his bounded money, then Robert must *behave like* somebody assigning some consistent dollar value to saving a human life.

We should be able to look down the long list of options that Robert took and didn’t take, and say, e.g., “Oh, Robert took all the options that saved more than 1 life per $500,000 and rejected all options that saved less than 1 life per $500,000; so Robert’s behavior is *consistent* with his spending $500,000 per life.”

Alternatively, if we can’t view Robert’s behavior as being coherent in this sense—if we cannot make up *any* dollar value of a human life, such that Robert’s choices are consistent with that dollar value—then it must be possible to move around the same amount of money, in a way that saves more lives.

We start from the qualitative criterion, “Robert must save as many lives as possible; it shouldn’t be possible to move around the same money to save more lives.” We end up with the quantitative coherence theorem, “It must be possible to view Robert as trading dollars for lives at a consistent price.”

We haven’t proven that dollars have some intrinsic worth that trades off against the intrinsic worth of a human life. By hypothesis, Robert doesn’t care about money at all. It’s just that every dollar has an *opportunity cost* in lives it could have saved if deployed differently; and this opportunity cost is the same for every dollar because money is fungible.

An important caveat to this theorem is that there may be, e.g., an option that saves a hundred thousand lives for $200,000,000. But Robert only has $100,000,000 to spend. In this case, Robert may fail to take that option even though it saves 1 life per $2,000. It was a good option, but Robert didn’t have enough money in the bank to afford it. This does mess up the elegance of being able to say, “Robert must have taken *all* the options saving at least 1 life per $500,000”, and instead we can only say this with respect to options that are in some sense small enough or granular enough.

Similarly, if an option costs $5,000,000 to save 15 lives, but Robert only has $4,000,000 left over after taking all his other best opportunities, Robert’s last selected option might be to save 8 lives for $4,000,000 instead. This again messes up the elegance of the reasoning, but Robert is still doing exactly what an agent *would* do if it consistently valued lives at 1 life per $500,000—it would buy all the best options *it could afford* that purchased at least that many lives per dollar. So that part of the theorem’s conclusion still holds.

Another caveat is that we haven’t proven that there’s some specific dollar value in Robert’s head, as a matter of psychology. We’ve only proven that Robert’s outward behavior can be *viewed as if* it prices lives at *some* consistent value, assuming Robert saves as many lives as possible.

It could be that Robert accepts every option that spends less than $500,000/life and rejects every option that spends over $600,000, and there aren’t any available options in the middle. Then Robert’s behavior can equally be *viewed as* consistent with a price of $510,000 or a price of $590,000. This helps show that we haven’t proven anything about Robert explicitly *thinking* of some number. Maybe Robert never lets himself think of a specific threshold value, because it would be taboo to assign a dollar value to human life; and instead Robert just fiddles the choices until he can’t see how to save any more lives.
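A rough way to test whether a set of choices can be *viewed as* pricing lives consistently is to compare the most expensive option taken with the cheapest option rejected, in dollars per life. The sketch below does that; the function name is invented here, the last two options in each list are made-up numbers, and it assumes every option is small relative to the budget (per the caveats above).

```python
def consistent_price_range(options):
    """Return the (low, high) range of dollars-per-life prices that fit the choices.

    `options` is a list of (cost_dollars, lives_saved, taken) tuples.
    Returns None when no single price explains the choices.
    Assumes every option is small relative to the total budget.
    """
    taken = [cost / lives for cost, lives, was_taken in options if was_taken]
    rejected = [cost / lives for cost, lives, was_taken in options if not was_taken]
    low = max(taken, default=0.0)
    high = min(rejected, default=float("inf"))
    return (low, high) if low <= high else None


coherent = [
    (100_000, 3, True),      # dialysis machine
    (10_000, 100, True),     # nurse hygiene training
    (1_000_000, 1, False),   # liver transplant
    (980_000, 2, True),      # hypothetical option at $490,000 per life
    (1_200_000, 2, False),   # hypothetical option at $600,000 per life
]
print(consistent_price_range(coherent))    # (490000.0, 600000.0)

# Taking a $700,000-per-life option while rejecting a $600,000-per-life one
# cannot be explained by any single price.
incoherent = coherent + [(700_000, 1, True)]
print(consistent_price_range(incoherent))  # None
```

Any price inside the returned range explains the choices equally well, which is the point of the $510,000 versus $590,000 remark: the behavior pins down a range, not a number in Robert’s head.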

We naturally have not proved by pure logic that Robert must want, in the first place, to save as many lives as possible. Even if Robert is a good person, this doesn’t follow. Maybe Robert values a 10-year-old’s life at 5 times the value of a 70-year-old’s life, so that Robert will sacrifice five grandparents to save one 10-year-old. A lot of people would see that as entirely consistent with valuing human life in general.

Let’s consider that last idea more thoroughly. If Robert considers a preteen equally valuable with 5 grandparents, so that Robert will shift $100,000 from saving 8 old people to saving 2 children, then we can no longer say that Robert wants to save as many ‘lives’ as possible. That last decision would decrease by 6 the total number of ‘lives’ saved. So we can no longer say that there’s a qualitative criterion, ‘Save as many lives as possible’, that produces the quantitative coherence requirement, ‘trade dollars for lives at a consistent rate’.

Does this mean that coherence might as well go out the window, so far as Robert’s behavior is concerned? Anything goes, now? Just spend money wherever?

“Hm,” you might think. “But… if Robert trades 8 old people for 2 children *here*… and then trades 1 child for 2 old people *there*...”

To reduce distraction, let’s make this problem be about apples and oranges instead. Suppose:

Alice starts with 8 apples and 1 orange.

Then Alice trades 8 apples for 2 oranges.

Then Alice trades away 1 orange for 2 apples.

Finally, Alice trades another orange for 3 apples.

Then in this example, Alice is using a strategy that’s *strictly dominated* across all categories of fruit. Alice ends up with 5 apples and one orange, but could’ve ended with 8 apples and one orange (by not making any trades at all). Regardless of the *relative* value of apples and oranges, Alice’s strategy is doing *qualitatively* worse than another possible strategy, if apples have any positive value to her at all.

So the fact that Alice can’t be viewed as having any coherent relative value for apples and oranges, corresponds to her ending up with qualitatively less of some category of fruit (without any corresponding gains elsewhere).
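Alice’s three trades can be replayed directly. This is a minimal sketch with invented helper names; it only tracks fruit counts and checks whether one bundle is at least as good in every category and better in at least one.

```python
def apply_trades(holdings, trades):
    """Apply (give, receive) trades, each a dict of fruit -> count, to a copy of holdings."""
    result = dict(holdings)
    for give, receive in trades:
        for fruit, count in give.items():
            result[fruit] = result.get(fruit, 0) - count
        for fruit, count in receive.items():
            result[fruit] = result.get(fruit, 0) + count
    return result


def strictly_dominates(a, b):
    """True if bundle a has at least as much of every fruit as b, and more of at least one."""
    fruits = set(a) | set(b)
    return all(a.get(f, 0) >= b.get(f, 0) for f in fruits) and any(
        a.get(f, 0) > b.get(f, 0) for f in fruits
    )


start = {"apple": 8, "orange": 1}
trades = [
    ({"apple": 8}, {"orange": 2}),
    ({"orange": 1}, {"apple": 2}),
    ({"orange": 1}, {"apple": 3}),
]
end = apply_trades(start, trades)
print(end)                             # {'apple': 5, 'orange': 1}
print(strictly_dominates(start, end))  # True
```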

This remains true if we introduce more kinds of fruit into the problem. Let’s say the set of fruits Alice can trade includes {apples, oranges, strawberries, plums}. If we can’t look at Alice’s trades and make up some relative quantitative values of fruit, such that Alice could be trading consistently with respect to those values, then Alice’s trading strategy must have been dominated by some other strategy that would have ended up with strictly more fruit across all categories.

In other words, we need to be able to look at Alice’s trades, and say something like:

“Maybe Alice values an orange at 2 apples, a strawberry at 0.1 apples, and a plum at 0.5 apples. That would explain why Alice was willing to trade 4 strawberries for a plum, but not willing to trade 40 strawberries for an orange and an apple.”

And if we *can’t* say this, then there must be some way to rearrange Alice’s trades and get *strictly more fruit across all categories* in the sense that, e.g., we end with the same number of plums and apples, but one more orange and two more strawberries. This is a bad thing if Alice *qualitatively* values fruit from each category—prefers having more fruit to less fruit, ceteris paribus, for each category of fruit.
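The quoted explanation of Alice’s trades can be checked mechanically. The sketch below uses the relative values from that example (an orange at 2 apples, a strawberry at 0.1, a plum at 0.5) and assumes, as a simplification, that Alice accepts a trade exactly when what she receives is worth more than what she gives up.

```python
VALUES_IN_APPLES = {"apple": 1.0, "orange": 2.0, "strawberry": 0.1, "plum": 0.5}


def bundle_value(bundle, values=VALUES_IN_APPLES):
    """Total value of a bundle (fruit -> count), measured in apples."""
    return sum(values[fruit] * count for fruit, count in bundle.items())


def would_accept(give, receive, values=VALUES_IN_APPLES):
    """Accept a trade when the received bundle is worth more than the one given up."""
    return bundle_value(receive, values) > bundle_value(give, values)


# 4 strawberries (0.4 apples) for a plum (0.5 apples): accepted.
print(would_accept({"strawberry": 4}, {"plum": 1}))                 # True
# 40 strawberries (4 apples) for an orange and an apple (3 apples): refused.
print(would_accept({"strawberry": 40}, {"orange": 1, "apple": 1}))  # False
```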

Now let’s shift our attention back to Robert the hospital administrator. *Either* we can view Robert as consistently assigning some *relative* value of life for 10-year-olds vs. 70-year-olds, *or* there must be a way to rearrange Robert’s expenditures to save either strictly more 10-year-olds or strictly more 70-year-olds. The same logic applies if we add 50-year-olds to the mix. We must be able to say something like, “Robert is consistently behaving as if a 50-year-old is worth a third of a ten-year-old”. If we *can’t* say that, Robert must be behaving in a way that pointlessly discards some saveable lives in some category.

Or perhaps Robert is behaving in a way which implies that 10-year-old girls are worth more than 10-year-old boys. But then the relative values of those subclasses of 10-year-olds need to be viewable as consistent; or else Robert must be qualitatively failing to save one more 10-year-old boy than could’ve been saved otherwise.

If you can denominate apples in oranges, and price oranges in plums, and trade off plums for strawberries, all at consistent rates… then you might as well take it one step further, and factor out an abstract unit for ease of notation.

Let’s call this unit *1 utilon,* and denote it €1. (As we’ll see later, the letters ‘EU’ are appropriate here.)

If we say that apples are worth €1, oranges are worth €2, and plums are worth €0.5, then this tells us the relative value of apples, oranges, and plums. Conversely, if we *can* assign consistent relative values to apples, oranges, and plums, then we can factor out an abstract unit at will—for example, by arbitrarily declaring apples to be worth €100 and then calculating everything else’s price in apples.

Have we proven by pure logic that all apples have the same utility? Of course not; you can prefer some particular apples to other particular apples. But when you’re done saying which things you qualitatively prefer to which other things, if you go around making tradeoffs in a way that can be *viewed as* not qualitatively leaving behind some things you said you wanted, we can *view you* as assigning coherent quantitative utilities to everything you want.

And that’s one coherence theorem—among others—that can be seen as motivating the concept of *utility* in decision theory.

Utility isn’t a solid thing, a separate thing. We could multiply all the utilities by two, and that would correspond to the same outward behaviors. It’s meaningless to ask how much utility you scored at the end of your life, because we could subtract a million or add a million to that quantity while leaving everything else conceptually the same.

You could pick anything you valued—say, the joy of watching a cat chase a laser pointer for 10 seconds—and denominate everything relative to that, without needing any concept of an extra abstract ‘utility’. So (just to be extremely clear about this point) we have not proven that there is a separate thing ‘utility’ that you should be pursuing instead of everything else you wanted in life.

The coherence theorem says nothing about which things to value more than others, or how much to value them relative to other things. It doesn’t say whether you should value your happiness more than someone else’s happiness, any more than the notion of a consistent preference ordering tells us whether $\text{onions} >_P \text{pineapple}$.

(The notion that we should assign equal value to all human lives, or equal value to all sentient lives, or equal value to all Quality-Adjusted Life Years, is *utilitarianism.* Which is, sorry about the confusion, a whole ’nother separate different philosophy.)

The conceptual gizmo that maps thingies to utilities—the whatchamacallit that takes in a fruit and spits out a utility—is called a ‘utility function’. Again, this isn’t a separate thing that’s written on a stone tablet. If we multiply a utility function by 9.2, that’s conceptually the same utility function because it’s consistent with the same set of behaviors.
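To see why rescaling changes nothing observable, here is a small sketch (names and the three example bundles are invented for illustration) in which the same choice comes out whether utilities are used as given, or multiplied by 9.2 with a million subtracted from every option’s total.

```python
UTILITY = {"apple": 1.0, "orange": 2.0, "plum": 0.5}


def bundle_utility(bundle, scale=1.0, shift=0.0):
    """Utility of a bundle (fruit -> count) under a positive rescaling and a constant shift."""
    base = sum(UTILITY[fruit] * count for fruit, count in bundle.items())
    return scale * base + shift


def best_option(options, scale=1.0, shift=0.0):
    """The option an agent maximizing this utility function would pick."""
    return max(options, key=lambda bundle: bundle_utility(bundle, scale, shift))


options = [{"apple": 3}, {"orange": 1, "plum": 1}, {"plum": 5}]
print(best_option(options))                               # {'apple': 3}
print(best_option(options, scale=9.2, shift=-1_000_000))  # {'apple': 3}
```

The scale factor has to be positive; multiplying by a negative number would reverse every preference.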

But in general: If we can sensibly view any agent as doing as well as qualitatively possible at *anything*, we must be able to view the agent’s behavior as consistent with there being some coherent relative quantities of wantedness for all the thingies it’s trying to optimize.

# Probabilities and expected utility

We’ve so far made no mention of *probability.* But the way that probabilities and utilities interact, is where we start to see the full structure of *expected utility* spotlighted by all the coherence theorems.

The basic notion in expected utility is that some choices present us with uncertain outcomes.

For example, I come to you and say: “Give me 1 apple, and I’ll flip a coin; if the coin lands heads, I’ll give you 1 orange; if the coin comes up tails, I’ll give you 3 plums.” Suppose you relatively value fruits as described earlier: 2 apples / orange and 0.5 apples / plum. Then *either* possible outcome gives you something that’s worth more to you than 1 apple. Turning down a so-called ‘gamble’ like that… why, it’d be a dominated strategy.

In general, the notion of ‘expected utility’ says that we assign certain quantities called *probabilities* to each possible outcome. In the example above, we might assign a ‘probability’ of $0.5$ to the coin landing heads (1 orange), and a ‘probability’ of $0.5$ to the coin landing tails (3 plums). Then the total value of the ‘gamble’ we get by trading away 1 apple is:

$$\mathbb{P}(\text{heads}) \cdot U(\text{1 orange}) + \mathbb{P}(\text{tails}) \cdot U(\text{3 plums}) = 0.5 \cdot €2 + 0.5 \cdot €1.5 = €1.75$$

Conversely, if we just keep our 1 apple instead of making the trade, this has an expected utility of $1 \cdot U(\text{1 apple}) = €1$. So indeed we ought to trade (as the previous reasoning suggested).
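The same arithmetic as a short sketch, using the fruit values from earlier (apple €1, orange €2, plum €0.5) and a fair coin. The representation of a gamble as a list of (probability, bundle) pairs is an illustrative choice, not something the argument depends on.

```python
UTILITY = {"apple": 1.0, "orange": 2.0, "plum": 0.5}


def expected_utility(gamble):
    """`gamble` is a list of (probability, bundle) pairs; a bundle maps fruit -> count."""
    return sum(
        probability * sum(UTILITY[fruit] * count for fruit, count in bundle.items())
        for probability, bundle in gamble
    )


coin_gamble = [(0.5, {"orange": 1}), (0.5, {"plum": 3})]
keep_apple = [(1.0, {"apple": 1})]

print(expected_utility(coin_gamble))  # 1.75
print(expected_utility(keep_apple))   # 1.0
```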

“But wait!” you cry. “Where did these probabilities come from? Why is the ‘probability’ of a fair coin landing heads $0.5$ and not, say, $-0.2$ or $3$? Who says we ought to multiply utilities by probabilities in the first place?”

If you’re used to approaching this problem from a Bayesian standpoint, then you may now be thinking of notions like prior probability and Occam’s Razor and universal priors...

But from the standpoint of coherence theorems, that’s putting the cart before the horse.

From the standpoint of coherence theorems, we don’t *start with* a notion of ‘probability’.

Instead we ought to prove something along the lines of: if you’re not using qualitatively dominated strategies, then you must *behave as if* you are multiplying utilities by certain quantitative thingies.

We might then furthermore show that, for non-dominated strategies, these utility-multiplying thingies must be between $0$ and $1$ rather than say $-0.3$ or $27$.

Having determined what coherence properties these utility-multiplying thingies need to have, we decide to call them ‘probabilities’. And *then*—once we know in the first place that we need ‘probabilities’ in order to not be using dominated strategies—we can start to worry about exactly what the numbers ought to be.

## Probabilities summing to 1

Here’s a taste of the kind of reasoning we might do:

Suppose that—having already accepted some previous proof that non-dominated strategies dealing with uncertain outcomes, must multiply utilities by quantitative thingies—you then say that you are going to assign a probability of $0.6$ to the coin coming up heads, and a probability of $0.7$ to the coin coming up tails.

If you’re already used to the standard notion of probability, you might object, “But those probabilities sum to $1.3$ when they ought to sum to $1$!”**⁵** But now we are in coherence-land; we don’t ask “Did we violate the standard axioms that all the textbooks use?” but “What rules must non-dominated strategies obey?” *De gustibus non est disputandum;* can we *disputandum* somebody saying that a coin has a 60% probability of coming up heads and a 70% probability of coming up tails? (Where these are the only 2 possible outcomes of an uncertain coinflip.)

Well—assuming you’ve already accepted that we need utility-multiplying thingies—I might then offer you a gamble. How about you give me one apple, and if the coin lands heads, I’ll give you 0.8 apples; while if the coin lands tails, I’ll give you 0.8 apples.

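According to you, the expected utility of this gamble is:

$$\mathbb{P}(\text{heads}) \cdot U(\text{0.8 apples}) + \mathbb{P}(\text{tails}) \cdot U(\text{0.8 apples}) = 0.6 \cdot €0.8 + 0.7 \cdot €0.8 = €1.04$$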

You’ve just decided to trade your apple for 0.8 apples, which sure sounds like one of ’em dominated strategies.

And that’s why *the thingies you multiply utilities by*—the thingies that you use to weight uncertain outcomes in your imagination, when you’re trying to decide how much you want one branch of an uncertain choice—must sum to 1, whether you call them ‘probabilities’ or not.

Well… actually we just argued**⁶** that probabilities for mutually exclusive outcomes should sum to *no more than 1.* What would be an example showing that, for non-dominated strategies, the probabilities for exhaustive outcomes should sum to no less than 1?

Why exhaustive outcomes should sum to at least 1:

Suppose that, in exchange for 1 apple, I credibly offer:

* To pay you 1.1 apples if a coin comes up heads.

* To pay you 1.1 apples if a coin comes up tails.

* To pay you 1.1 apples if anything else happens.

If the probabilities you assign to these three outcomes sum to say 0.9, you will refuse to trade 1 apple for 1.1 apples.

(This is strictly dominated by the strategy of agreeing to trade 1 apple for 1.1 apples.)
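Both directions of the argument can be checked with exact arithmetic. In this sketch the weights 0.6 and 0.7 come from the example above; the particular split 0.4 / 0.4 / 0.1 for the under-weighted case is an assumption made here, since the text only says the three weights sum to 0.9.

```python
from fractions import Fraction as F


def weighted_value(weights, payoffs):
    """Sum of weight * payoff over outcomes; the weights need not be probabilities."""
    return sum(weights[outcome] * payoffs[outcome] for outcome in payoffs)


# Weights summing to more than 1: the agent values a sure 0.8 apples above 1 apple.
over = {"heads": F(6, 10), "tails": F(7, 10)}
sure_loss = {"heads": F(8, 10), "tails": F(8, 10)}
print(weighted_value(over, sure_loss))   # 26/25, i.e. 1.04 > 1

# Weights summing to less than 1: the agent values a sure 1.1 apples below 1 apple.
under = {"heads": F(4, 10), "tails": F(4, 10), "other": F(1, 10)}
sure_gain = {outcome: F(11, 10) for outcome in under}
print(weighted_value(under, sure_gain))  # 99/100, i.e. 0.99 < 1
```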

## Dutch book arguments

Another way we could have presented essentially the same argument as above, is as follows:

Suppose you are a market-maker in a prediction market for some event $X$. When you say that your price for event $X$ is $x$, you mean that you will sell for $\$x$ a ticket which pays $\$1$ if $X$ happens (and pays out nothing otherwise). In fact, you will sell any number of such tickets!

Since you are a market-maker (that is, you are trying to encourage trading in $X$ for whatever reason), you are also willing to *buy* any number of tickets at the price $\$x$. That is, I can say to you (the market-maker) “I’d like to sign a contract where you give me $\$x$ now, and in return I must pay you $\$1$ iff $X$ happens;” and you’ll agree. (We can view this as you selling me a negative number of the original kind of ticket.)

Let $X$ and $Y$ denote two events such that *exactly one* of them must happen; say, $X$ is a coin landing heads and $Y$ is the coin not landing heads.

Now suppose that you, as a market-maker, are motivated to avoid combinations of bets that lead into *certain* losses for you—not just losses that are merely probable, but combinations of bets such that *every* possibility leads to a loss.

Then if exactly one of $X$ and $Y$ must happen, your prices $x$ and $y$ must sum to exactly $\$1$. Because:

If $x + y < \$1$, I buy both an $X$-ticket and a $Y$-ticket and get a guaranteed payout of $\$1$ minus costs of $x + y$. Since this is a guaranteed profit for me, it is a guaranteed loss for you.

If $x + y > \$1$, I sell you both tickets and will at the end pay you $\$1$ after you have already paid me $x + y$. Again, this is a guaranteed profit for me of $x + y - \$1 > \$0$.

This is more or less exactly the same argument as in the previous section, with trading apples. Except that: (a) the scenario is more crisp, so it is easier to generalize and scale up much more complicated similar arguments; and (b) it introduces a whole lot of assumptions that people new to expected utility would probably find rather questionable.
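The two cases of the market-maker argument can be written as one function. This is a sketch under the stated setup: exactly one of two events happens, each ticket pays 1 dollar, and the quoted prices (called `x` and `y` here) are both buy and sell prices. The function name and the choice to trade one ticket of each kind are illustrative.

```python
from fractions import Fraction as F


def bettor_profit(x, y, outcome):
    """Profit of a bettor trading against a market-maker who quotes x and y.

    Exactly one of the events "X", "Y" happens; each ticket pays 1.
    The bettor buys both tickets when x + y < 1, sells both when x + y > 1,
    and does nothing when the prices sum to exactly 1.
    """
    payout_x = 1 if outcome == "X" else 0
    payout_y = 1 if outcome == "Y" else 0
    if x + y < 1:
        quantity = 1     # buy one of each ticket
    elif x + y > 1:
        quantity = -1    # sell one of each ticket
    else:
        quantity = 0     # coherent prices: no sure-profit trade
    return quantity * ((payout_x - x) + (payout_y - y))


for prices in [(F(6, 10), F(7, 10)), (F(3, 10), F(5, 10)), (F(4, 10), F(6, 10))]:
    print([bettor_profit(*prices, outcome) for outcome in ("X", "Y")])
```

With prices 0.6 and 0.7 the bettor makes 0.3 whichever event happens; with 0.3 and 0.5 the bettor makes 0.2 either way; with 0.4 and 0.6 there is nothing to exploit.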

“What?” one might cry. “What sort of crazy bookie would buy and sell bets at exactly the same price? Why ought *anyone* to buy and sell bets at exactly the same price? Who says that I must value a gain of $1 exactly the opposite of a loss of $1? Why should the price that I put on a bet represent my degree of uncertainty about the environment? What does all of this argument about gambling have to do with real life?”

So again, the key idea is not that we are assuming anything about people valuing every real-world dollar the same; nor is it in real life a good idea to offer to buy or sell bets at the same prices.**⁷** Rather, Dutch book arguments can stand in as shorthand for some longer story in which we only assume that you prefer more apples to less apples.

The Dutch book argument above has to be seen as one more added piece in the company of all the *other* coherence theorems—for example, the coherence theorems suggesting that you ought to be quantitatively weighing events in your mind in the first place.

## Conditional probability

With more complicated Dutch book arguments, we can derive more complicated ideas such as ‘conditional probability’.

Let’s say that we’re pricing three kinds of gambles over two events $Q$ and $R$:

A ticket that costs $\$x$, and pays $\$1$ if $Q$ happens.

A ticket that doesn’t cost anything or pay anything if $Q$ doesn’t happen (the ticket price is refunded); and if $Q$ does happen, this ticket costs $\$y$, then pays $\$1$ if $R$ happens.

A ticket that costs $\$z$, and pays $\$1$ if $Q$ and $R$ both happen.

Intuitively, the idea of conditional probability is that the probability of $Q$ and $R$ both happening, should be equal to the probability of $Q$ happening, times the probability that $R$ happens assuming that $Q$ happens:

$$\mathbb{P}(Q \wedge R) = \mathbb{P}(Q) \cdot \mathbb{P}(R \mid Q)$$

To exhibit a Dutch book argument for this rule, we want to start from the assumption of a qualitatively non-dominated strategy, and derive the quantitative rule $z = x \cdot y$.

So let’s give an example that violates this equation and see if there’s a way to make a guaranteed profit. Let’s say somebody:

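Prices at $x = \$0.60$ the first ticket, aka $\mathbb{P}(Q)$.

Prices at $y = \$0.70$ the second ticket, aka $\mathbb{P}(R \mid Q)$.

Prices at $z = \$0.20$ the third ticket, aka $\mathbb{P}(Q \wedge R)$, which ought to be $\$0.42$ assuming the first two prices.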

The first two tickets are priced relatively high, compared to the third ticket which is priced relatively low, suggesting that we ought to sell the first two tickets and buy the third.

Okay, let’s ask what happens if we sell 10 of the first ticket, sell 10 of the second ticket, and buy 10 of the third ticket.

If $Q$ doesn’t happen, we get \$6, and pay \$2. Net +\$4.

If $Q$ happens and $R$ doesn’t happen, we get \$6, pay \$10, get \$7, and pay \$2. Net +\$1.

If $Q$ happens and $R$ happens, we get \$6, pay \$10, get \$7, pay \$10, pay \$2, and get \$10. Net: +\$1.

That is: we can get a guaranteed positive profit over all three possible outcomes.
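The three net figures above can be recomputed for any combination of trades. In this sketch the two events are labeled Q and R, the three ticket prices x, y, z, and the amounts bought a, b, c (negative meaning sold); these labels and the function name are chosen here for illustration. The prices 0.60, 0.70 and 0.20 are the ones implied by the dollar amounts in the example.

```python
from fractions import Fraction as F


def net_payoffs(a, b, c, x, y, z):
    """Buyer's net payoff in each outcome when buying a, b, c of the three tickets.

    Negative amounts mean selling. Returns the payoffs for the outcomes
    (Q fails, Q happens but R fails, Q and R both happen).
    """
    q_fails = -a * x - c * z                    # the second ticket is refunded
    q_only = a * (1 - x) - b * y - c * z        # first ticket pays, others lose
    both = a * (1 - x) + b * (1 - y) + c * (1 - z)
    return q_fails, q_only, both


# Sell 10 of the first ticket, sell 10 of the second, buy 10 of the third.
mispriced = net_payoffs(-10, -10, 10, F(60, 100), F(70, 100), F(20, 100))
print([str(p) for p in mispriced])  # ['4', '1', '1']

# With the third ticket priced at 0.42 = 0.60 * 0.70, the same trades can lose.
coherent = net_payoffs(-10, -10, 10, F(60, 100), F(70, 100), F(42, 100))
print([str(p) for p in coherent])   # ['9/5', '-6/5', '-6/5']
```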

More generally, let $A, B, C$ be the (potentially negative) amount of each ticket that is being bought (buying a negative amount is selling). Then the prices $x, y, z$ can be combined into a ‘Dutch book’ whenever the following three inequalities can be simultaneously true, with at least one inequality strict:

$$\begin{aligned}
-Ax - Cz &\geq 0 \\
A(1-x) - By - Cz &\geq 0 \\
A(1-x) + B(1-y) + C(1-z) &\geq 0
\end{aligned}$$

For $x, y, z \in (0, 1)$ this is impossible exactly iff $z = x \cdot y$. The proof via a bunch of algebra is left as an exercise to the reader.**⁸**
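One direction of that exercise can be sketched briefly. Write $A, B, C$ for the amounts bought of the three tickets and $x, y, z$ for their prices, and introduce weights $w_1, w_2, w_3$ on the three outcomes (notation invented here, not the author’s). Weighting the buyer’s payoff in each outcome and collecting terms gives:

$$\begin{aligned}
w_1 &= 1 - x, \qquad w_2 = x(1-y), \qquad w_3 = xy, \qquad w_1 + w_2 + w_3 = 1 \\
w_1\,(-Ax - Cz) &+ w_2\,\bigl(A(1-x) - By - Cz\bigr) + w_3\,\bigl(A(1-x) + B(1-y) + C(1-z)\bigr) = C\,(xy - z)
\end{aligned}$$

When $z = x \cdot y$ the weighted sum is zero for every choice of $A, B, C$. Since all three weights are strictly positive for $x, y \in (0, 1)$, the three payoffs cannot all be nonnegative with one of them strictly positive, so no Dutch book exists. In the example with prices $0.6, 0.7, 0.2$ and $C = 10$, the weighted sum is $10 \cdot (0.42 - 0.2) = 2.2$, matching $0.4 \cdot 4 + 0.18 \cdot 1 + 0.42 \cdot 1$. The converse (that some Dutch book exists whenever $z \neq x \cdot y$) is not shown here.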

## The Allais Paradox

By now, you’d probably like to see a glimpse of the sort of argument that shows in the first place that we need expected utility—that a non-dominated strategy for uncertain choice must behave as if multiplying utilities by some kinda utility-multiplying thingies (‘probabilities’).

As far as I understand it, the real argument you’re looking for is Abraham Wald’s complete class theorem, which I must confess I don’t know how to reduce to a simple demonstration.

But we can catch a glimpse of the general idea from a famous psychology experiment that became known as the Allais Paradox (in slightly adapted form).

Suppose you ask some experimental subjects which of these gambles they would rather play:

1A: A certainty of $1,000,000.

1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.

Most subjects say they’d prefer 1A to 1B.

Now ask a separate group of subjects which of these gambles they’d prefer:

2A: 50% chance of winning $1,000,000; 50% chance of winning $0.

2B: 45% chance of winning $5,000,000; 55% chance of winning $0.

In this case, most subjects say they’d prefer gamble 2B.

Note that the $ sign here denotes real dollars, not utilities! A gain of five million dollars isn’t, and shouldn’t be, worth exactly five times as much to you as a gain of one million dollars. We can use the € symbol to denote the expected utilities that are abstracted from how much you relatively value different outcomes; $ is just money.

So we certainly aren’t claiming that the first preference is paradoxical because 1B has an expected dollar value of $4.5 million and 1A has an expected dollar value of $1 million. That would be silly. We care about expected utilities, not expected dollar values, and those two concepts aren’t the same at all!

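Nonetheless, the combined preferences 1A > 1B and 2A < 2B are not compatible with any coherent utility function. We cannot simultaneously have:

$$U(\$1\text{M}) > 0.9 \cdot U(\$5\text{M}) + 0.1 \cdot U(\$0)$$

$$0.5 \cdot U(\$1\text{M}) + 0.5 \cdot U(\$0) < 0.45 \cdot U(\$5\text{M}) + 0.55 \cdot U(\$0)$$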

This was one of the earliest experiments seeming to demonstrate that actual human beings were not expected utility maximizers—a very tame idea nowadays, to be sure, but the *first definite* demonstration of that was a big deal at the time. Hence the term, “Allais Paradox”.

Now, by the general idea behind coherence theorems, since we can’t *view this behavior* as corresponding to expected utilities, we ought to be able to show that it corresponds to a dominated strategy somehow—derive some way in which this behavior corresponds to shooting off your own foot.

In this case, the relevant idea seems non-obvious enough that it doesn’t seem reasonable to demand that you think of it on your own; but if you like, you can pause and try to think of it anyway. Otherwise, just continue reading.

Again, the gambles are as follows:

1A: A certainty of $1,000,000.

1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.

2A: 50% chance of winning $1,000,000; 50% chance of winning $0.

2B: 45% chance of winning $5,000,000; 55% chance of winning $0.

Now observe that Scenario 2 corresponds to a 50% chance of playing Scenario 1, and otherwise getting $0.
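That correspondence can be checked mechanically. The following is a minimal sketch, not anything from the original experiment: the helper names `mix` and `expected_utility` and the sample utility numbers are assumptions made purely for illustration. It represents each gamble as a map from prize to probability and confirms that a 50% chance of playing Scenario 1 (otherwise $0) reproduces Scenario 2 exactly.

```python
from fractions import Fraction as F

# Each gamble maps a dollar prize to its probability.
GAMBLE_1A = {1_000_000: F(1)}
GAMBLE_1B = {5_000_000: F(9, 10), 0: F(1, 10)}
GAMBLE_2A = {1_000_000: F(1, 2), 0: F(1, 2)}
GAMBLE_2B = {5_000_000: F(45, 100), 0: F(55, 100)}
NOTHING = {0: F(1)}  # a certainty of $0


def mix(p, gamble, other):
    """Compound lottery: play `gamble` with probability p, else `other`."""
    combined = {}
    for weight, lottery in ((p, gamble), (1 - p, other)):
        for prize, prob in lottery.items():
            combined[prize] = combined.get(prize, F(0)) + weight * prob
    return combined


def expected_utility(gamble, utility):
    """Probability-weighted sum of the utilities of the prizes."""
    return sum(prob * utility[prize] for prize, prob in gamble.items())


half = F(1, 2)
print(mix(half, GAMBLE_1A, NOTHING) == GAMBLE_2A)  # True
print(mix(half, GAMBLE_1B, NOTHING) == GAMBLE_2B)  # True

# Hypothetical utilities (illustrative only): the first million matters a lot.
sample_utility = {0: 0, 1_000_000: 10, 5_000_000: 11}
gap_1 = expected_utility(GAMBLE_1A, sample_utility) - expected_utility(GAMBLE_1B, sample_utility)
gap_2 = expected_utility(GAMBLE_2A, sample_utility) - expected_utility(GAMBLE_2B, sample_utility)
print(gap_1, gap_2)  # 1/10 1/20
```

With these made-up utilities the agent prefers A in both scenarios, and the advantage of A in Scenario 2 is exactly half its advantage in Scenario 1, so the two preferences cannot point in opposite directions.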

This, in fact, is why the combination 1A > 1B; 2A < 2B is incompatible with expected utility. In terms of one set of axioms frequently used to describe expected utility, it violates the Independence Axiom: if a gamble $L$ is preferred to $M$ (that is, $L > M$), then we ought to be able to take a constant probability $p > 0$ and another gamble $N$ and have $p \cdot L + (1-p) \cdot N > p \cdot M + (1-p) \cdot N$.

To put it another way, if I flip a coin to decide whether or not to play some entirely different game $N$, but otherwise let you choose $L$ or $M$, you ought to make the same choice as if I just ask you whether you prefer $L$ or $M$. Your preference between $L$ and $M$ should be ‘independent’ of the possibility that, instead of doing anything whatsoever with $L$ or $M$, we will do something else instead.
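As a sketch of why any expected utility maximizer satisfies this axiom, write $L$ and $M$ for the two gambles being compared, $N$ for the unrelated gamble, and $p$ for the constant probability (these labels, and the shorthand $EU$ for expected utility, are assumed here for illustration). Because expected utility is linear in probabilities, the $N$ terms cancel:

$$
\begin{aligned}
EU(pL + (1-p)N) - EU(pM + (1-p)N) &= p\,EU(L) + (1-p)\,EU(N) - p\,EU(M) - (1-p)\,EU(N) \\
&= p\,\big[EU(L) - EU(M)\big].
\end{aligned}
$$

For $p > 0$ the difference on the left has the same sign as $EU(L) - EU(M)$. In the Allais setup, $L$ is 1A, $M$ is 1B, $N$ is a certainty of $0, and $p = 0.5$.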

And since this is an axiom of expected utility, any violation of that axiom ought to correspond to a dominated strategy somehow.

In the case of the Allais Paradox, we do the following:

First, I show you a switch that can be set to A or B, currently set to A.

In one minute, I tell you, I will flip a coin. If the coin comes up heads, you will get nothing. If the coin comes up tails, you will play the gamble from Scenario 1.

From your current perspective, that is, we are playing Scenario 2: since the switch is set to A, you have a 50% chance of getting nothing and a 50% chance of getting $1 million.

I ask you if you’d like to pay a penny to throw the switch from A to B. Since you prefer gamble 2B to 2A, and some quite large amounts of money are at stake, you agree to pay the penny. From your perspective, you now have a 55% chance of ending up with nothing and a 45% chance of getting $5M.

I then flip the coin, and luckily for you, it comes up tails.

From your perspective, you are now in Scenario 1B. Having observed the coin and updated on its state, you now think you have a 90% chance of getting $5 million and a 10% chance of getting nothing. By hypothesis, you would prefer a certainty of $1 million.

So I offer you a chance to pay another penny to flip the switch back from B to A. And with so much money at stake, you agree.

I have taken your two cents on the subject.

That is: You paid a penny to flip a switch and then paid another penny to switch it back, and this is dominated by the strategy of just leaving the switch set to A.
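The switch game can be played out in a few lines of code. This is a minimal sketch under stated assumptions: the function names and the `stage` labels are hypothetical, the agent's preferences are hard-coded rather than derived from anything, and only the branch where the coin comes up tails is modeled.

```python
def allais_agent_prefers_b(stage):
    """Hypothetical agent with the standard Allais preferences.

    Before the flip it faces Scenario 2 and prefers B (2B > 2A);
    after seeing tails it faces Scenario 1 and prefers A (1A > 1B).
    """
    return stage == "before_flip"


def always_a_agent_prefers_b(stage):
    """Hypothetical coherent agent that prefers A at every stage."""
    return False


def run_switch_game(prefers_b, fee_cents=1):
    """Return (final switch setting, cents paid) when the coin lands tails."""
    switch, paid = "A", 0  # the switch starts at A
    for stage in ("before_flip", "after_tails"):
        wanted = "B" if prefers_b(stage) else "A"
        if wanted != switch:
            # The agent pays the fee each time it moves the switch.
            switch, paid = wanted, paid + fee_cents
    return switch, paid


print(run_switch_game(allais_agent_prefers_b))    # ('A', 2)
print(run_switch_game(always_a_agent_prefers_b))  # ('A', 0)
```

Both agents end up playing gamble 1A, but the agent with the Allais preferences is two cents poorer, which is what it means for its strategy to be dominated.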

And that’s at least a glimpse of why, if you’re not using dominated strategies, the thing you do with relative utilities is multiply them by probabilities in a consistent way, and prefer the choice that leads to a greater expectation of the variable representing utility.

**From the Allais Paradox to real life**

The real-life lesson about what to do when faced with Allais’s dilemma might be something like this:

There’s *some* amount that $1 million would improve your life compared to $0.

There’s some amount that an additional $4 million would further improve your life after the first $1 million.

You ought to visualize these two improvements as best you can, and decide whether another $4 million can produce at least *one-ninth* as much improvement, as much true value to you, as the first $1 million.

If it can, you should consistently prefer 1B > 1A; 2B > 2A. And if not, you should consistently prefer 1A > 1B; 2A > 2B.
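The one-ninth threshold can be worked out explicitly. As shorthand assumed here for illustration, write $U_0$, $U_1$, $U_5$ for the utilities of ending up with $0, $1 million, and $5 million, and let $a = U_1 - U_0$ be the improvement from the first million and $b = U_5 - U_1$ the further improvement from the next four million. Then:

$$
\begin{aligned}
EU(\text{1B}) - EU(\text{1A}) &= 0.9\,U_5 + 0.1\,U_0 - U_1 = 0.9\,b - 0.1\,a, \\
EU(\text{2B}) - EU(\text{2A}) &= 0.45\,U_5 + 0.55\,U_0 - 0.5\,U_1 - 0.5\,U_0 = 0.45\,b - 0.05\,a.
\end{aligned}
$$

Both differences are non-negative exactly when $b \geq a/9$, so the same comparison settles both scenarios.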

The standard ‘paradoxical’ preferences in Allais’s experiment are standardly attributed to a certainty effect: people value the *certainty* of having $1 million, while the difference between a 50% probability and a 55% probability looms less large. (And this ties in to a number of other results about certainty, need for closure, prospect theory, and so on.)

It may sound intuitive, in an Allais-like scenario, to say that you ought to derive some value from being *certain* about the outcome. In fact this is just the reasoning the experiment shows people to be using, so of course it might sound intuitive. But that does, inescapably, correspond to a kind of thinking that produces dominated strategies.

One possible excuse might be that certainty is valuable if you need to make plans about the future; knowing the exact future lets you make better plans. This is admittedly true and a phenomenon within expected utility, though it applies in a smooth way as confidence increases rather than jumping suddenly around 100%. But in the particular dilemma as described here, you only have 1 minute before the game is played, and no time to make other major life choices dependent on the outcome.

Another possible excuse for certainty bias might be to say: “Well, I value the emotional feeling of certainty.”

In real life, we do have emotions that are directly about probabilities, and those little flashes of happiness or sadness are worth something if you care about people being happy or sad. If you say that you value the emotional feeling of being *certain* of getting $1 million, the freedom from the fear of getting $0, for the minute that the dilemma lasts and you are experiencing the emotion—well, that may just be a fact about what you value, even if it exists outside the expected utility formalism.

And this genuinely does not fit into the expected utility formalism. In an expected utility agent, probabilities are just thingies-you-multiply-utilities-by. If those thingies start generating their own utilities once represented inside the mind of the person who is an object of ethical value, you really are going to get results that are incompatible with the formal decision theory.

However, *not* being viewable as an expected utility agent does always correspond to employing dominated strategies. You are giving up *something* in exchange, if you pursue that feeling of certainty. You are potentially losing all the real value you could have gained from another $4 million, if that realized future actually would have gained you more than one-ninth the value of the first $1 million. Is a fleeting emotional sense of certainty over 1 minute, worth *automatically* discarding the potential $5-million outcome? Even if the correct answer given your values is that you properly ought to take the $1 million, treasuring 1 minute of emotional gratification doesn’t seem like the wise reason to do that. The wise reason would be if the first $1 million really was worth that much more than the next $4 million.

The danger of saying, “Oh, well, I attach a lot of utility to that comfortable feeling of certainty, so my choices are coherent after all” is not that it’s mathematically improper to value the emotions we feel while we’re deciding. Rather, by saying that the *most valuable* stakes are the emotions you feel during the minute you make the decision, what you’re saying is, “I get a huge amount of value by making decisions however humans instinctively make their decisions, and that’s much more important than the thing I’m making a decision *about.*” This could well be true for something like buying a stuffed animal. If millions of dollars or human lives are at stake, maybe not so much.

# Conclusion

The demonstrations we’ve walked through here aren’t the professional-grade coherence theorems as they appear in real math. Those have names like “Cox’s Theorem” or “the complete class theorem”; their proofs are difficult; and they say things like “If seeing piece of information A followed by piece of information B leads you into the same epistemic state as seeing piece of information B followed by piece of information A, plus some other assumptions, I can show an isomorphism between those epistemic states and classical probabilities” or “Any decision rule for taking different actions depending on your observations either corresponds to Bayesian updating given some prior, or else is strictly dominated by some Bayesian strategy”.

But hopefully you’ve seen enough concrete demonstrations to get a general idea of what’s going on with the actual coherence theorems. We have multiple spotlights all shining on the same core mathematical structure, saying dozens of different variants on, “If you aren’t running around in circles or stepping on your own feet or wantonly giving up things you say you want, we can see your behavior as corresponding to this shape. Conversely, if we can’t see your behavior as corresponding to this shape, you must be visibly shooting yourself in the foot.” Expected utility is the only structure that has this great big family of discovered theorems all saying that. It has a scattering of academic competitors, because academia is academia, but the competitors don’t have anything like that mass of spotlights all pointing in the same direction.

So if we need to pick an interim answer for “What kind of quantitative framework should I try to put around my own decision-making, when I’m trying to check if my thoughts make sense?” or “By default and barring special cases, what properties might a sufficiently advanced machine intelligence *look to us* like it possessed, at least approximately, if we couldn’t see it *visibly* running around in circles?”, then there’s pretty much one obvious candidate: Probabilities, utility functions, and expected utility.

# Further reading

To learn more about agents and AI: Consequentialist cognition; the orthogonality of agents’ utility functions and capabilities; epistemic and instrumental efficiency; instrumental strategies sufficiently capable agents tend to converge on; properties of sufficiently advanced agents.

To learn more about decision theory: The controversial counterfactual at the heart of the expected utility formula.

**¹** It could be that somebody’s pizza preference is real, but so weak that they wouldn’t pay one penny to get the pizza they prefer. In this case, imagine we’re talking about some stronger preference instead. Like your willingness to pay at least one penny not to have your house burned down, or something.

**²** This does assume that the agent prefers to have more money rather than less money. “Ah, but why is it bad if one person has a penny instead of another?” you ask. If we insist on pinning down every point of this sort, then you can also imagine the $0.01 as standing in for the *time* I burned in order to move the pizza slices around in circles. That time was burned, and nobody else has it now. If I’m an effective agent that goes around pursuing my preferences, I should in general be able to sometimes convert time into other things that I want. In other words, my circular preference can lead me to incur an opportunity cost denominated in the sacrifice of other things I want, and not in a way that benefits anyone else.

**³** There are more than six possibilities if you think it’s possible to be absolutely indifferent between two kinds of pizza.

**⁴** We can omit the ‘better doctors’ item from consideration: The supply of doctors is mostly constrained by regulatory burdens and medical schools rather than the number of people who want to become doctors; so bidding up salaries for doctors doesn’t much increase the total number of doctors; so bidding on a talented doctor at one hospital just means some other hospital doesn’t get that talented doctor. It’s also illegal to pay for livers, but let’s ignore that particular issue with the problem setup or pretend that it all takes place in a more sensible country than the United States or Europe.

**⁵** Or maybe a tiny bit less than , in case the coin lands on its edge or something.

**⁶** Nothing we’re walking through here is really a coherence theorem *per se*, more like intuitive arguments that a coherence theorem ought to exist. Theorems require proofs, and nothing here is what real mathematicians would consider to be a ‘proof’.

**⁷** In real life this leads to a problem of ‘adversarial selection’, where somebody who knows more about the environment than you can decide whether to buy or sell from you. To put it another way, from a Bayesian standpoint, if an *intelligent* counterparty is deciding whether to buy or sell from you a bet on , the fact that they choose to buy (or sell) should cause you to update in favor (or against) actually happening. After all, they wouldn’t be taking the bet unless they thought they knew something you didn’t!

**⁸** The quick but advanced argument would be to say that the left-hand-side must look like a singular matrix, whose determinant must therefore be zero.

I don’t particularly like dragging out the old coherence discussions, but the annual review is partly about building common knowledge, so it’s the right time to bring it up.

This currently seems to be the canonical reference post on the subject. On the one hand, I think there are major problems/missing pieces with it. On the other hand, looking at the top “objection”-style comment (i.e. Said’s), it’s clear that the commenter didn’t even finish reading the post and doesn’t understand the pieces involved. I think this is pretty typical among people who object to coherence results: most of them have only dealt with the VNM theorem, and correctly complain about the assumptions of that theorem being too strong, but don’t know about the existence of all the other coherence theorems (including the complete class theorem mentioned in the post, and Savage’s theorem mentioned in the comments). The “real” coherence theorems do have problems with them, but they’re not the problems which a lot of people point to in VNM.

I’ll leave a more detailed review later. The point of this nomination is to build common knowledge: I’d like to get to the point where the objections to coherence theorems are the *right* objections, rather than objections based in ignorance, and this post (and reviews of it) seem like a good place for that.

This is the second nomination in order to get this in the official Review pool, in order for John S. Wentworth’s future “more detailed review” to be in the official Review pool.

I have used this post quite a few times as a citation when I want to motivate the use of expected utility theory as an ideal for making decisions, because it explains how it’s not just an elegant decisionmaking procedure from nowhere but a mathematical inevitability of the requirements not to leave money on the table and not to accept guaranteed losses. I find the concept of coherence theorems a better foundation than the normal way this is explained, by pointing at the von Neumann-Morgenstern axioms and saying “they look true”.