# Coherent decisions imply consistent utilities

(*Written for Arbital in 2017.*)

# Introduction to the introduction: Why expected utility?

So we’re talking about how to make good decisions, or the idea of ‘bounded rationality’, or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of ‘expected utility’ or ‘utility functions’.

And before we even ask what those are, we might first ask, *Why?*

There’s a mathematical formalism, ‘expected utility’, that some people invented to talk about making decisions. This formalism is very academically popular, and appears in all the textbooks.

But so what? Why is that *necessarily* the best way of making decisions under every kind of circumstance? Why would an Artificial Intelligence care what’s academically popular? Maybe there’s some better way of thinking about rational agency? Heck, why is this formalism popular in the first place?

We can ask the same kinds of questions about probability theory:

Okay, we have this mathematical formalism in which the chance that X happens, aka , plus the chance that X doesn’t happen, aka , must be represented in a way that makes the two quantities sum to unity: .

That formalism for probability has some neat mathematical properties. But so what? Why should the best way of reasoning about a messy, uncertain world have neat properties? Why shouldn’t an agent reason about ‘how likely is that’ using something completely unlike probabilities? How do you *know* a sufficiently advanced Artificial Intelligence would reason in probabilities? You haven’t seen an AI, so what do you think you know and how do you think you know it?

That entirely reasonable question is what this introduction tries to answer. There are, indeed, excellent reasons beyond academic habit and mathematical convenience for why we would by default invoke ‘expected utility’ and ‘probability theory’ to think about good human decisions, talk about rational agency, or reason about sufficiently advanced AIs.

The broad form of the answer seems easier to show than to tell, so we’ll just plunge straight in.

# Why not circular preferences?

*De gustibus non est disputandum,* goes the proverb; matters of taste cannot be disputed. If I like onions on my pizza and you like pineapple, it’s not that one of us is right and one of us is wrong. We just prefer different pizza toppings.

Well, but suppose I declare to you that I *simultaneously*:

Prefer onions to pineapple on my pizza.

Prefer pineapple to mushrooms on my pizza.

Prefer mushrooms to onions on my pizza.

If we use to denote my pizza preferences, with denoting that I prefer X to Y, then I am declaring:

That sounds strange, to be sure. But is there anything *wrong* with that? Can we disputandum it?

We used the math symbol which denotes an ordering. If we ask whether can be an ordering, it naughtily violates the standard transitivity axiom .

Okay, so then maybe we shouldn’t have used the symbol or called it an ordering. Why is that necessarily bad?

We can try to imagine each pizza as having a numerical score denoting how much I like it. In that case, there’s no way we could assign consistent numbers to those three pizza toppings such that .

So maybe I don’t assign numbers to my pizza. Why is that so awful?

Are there any grounds besides “we like a certain mathematical formalism and your choices don’t fit into our math,” on which criticize my three simultaneous preferences?

(Feel free to try to answer this yourself before continuing...)

Click here to reveal and continue:

Suppose I tell you that I prefer pineapple to mushrooms on my pizza. Suppose you’re about to give me a slice of mushroom pizza; but by paying one penny () I can instead get a slice of pineapple pizza (which is just as fresh from the oven). It seems realistic to say that most people with a pineapple pizza preference would probably pay the penny, if they happened to have a penny in their pocket.**¹**

After I pay the penny, though, and just before I’m about to get the pineapple pizza, you offer me a slice of onion pizza instead—no charge for the change! If I was telling the truth about preferring onion pizza to pineapple, I should certainly accept the substitution if it’s free.

And then to round out the day, you offer me a mushroom pizza instead of the onion pizza, and again, since I prefer mushrooms to onions, I accept the swap.

I end up with exactly the same slice of mushroom pizza I started with… and one penny poorer, because I previously paid $0.01 to swap mushrooms for pineapple.

This seems like a *qualitatively* bad behavior on my part. By virtue of my incoherent preferences which cannot be given a consistent ordering, I have shot myself in the foot, done something self-defeating. We haven’t said *how* I ought to sort out my inconsistent preferences. But no matter how it shakes out, it seems like there must be *some *better alternative—some better way I could reason that wouldn’t spend a penny to go in circles. That is, I could at least have kept my original pizza slice and not spent the penny.

In a phrase you’re going to keep hearing, I have executed a ‘dominated strategy’: there exists some other strategy that does strictly better.**²**

Or as Steve Omohundro put it: If you prefer being in Berkeley to being in San Francisco; prefer being in San Jose to being in Berkeley; and prefer being in San Francisco to being in San Jose; then you’re going to waste a lot of time on taxi rides.

None of this reasoning has told us that a non-self-defeating agent must prefer Berkeley to San Francisco or vice versa. There are at least six possible consistent orderings over pizza toppings, like etcetera, and *any* consistent ordering would avoid paying to go in circles.**³ **We have not, in this argument, used pure logic to derive that pineapple pizza must taste better than mushroom pizza to an ideal rational agent. But we’ve seen that eliminating a certain kind of shoot-yourself-in-the-foot behavior, corresponds to imposing a certain *coherence* or *consistency* requirement on whatever preferences are there.

It turns out that this is just one instance of a large family of *coherence theorems* which all end up pointing at the same set of core properties. All roads lead to Rome, and all the roads say, “If you are not shooting yourself in the foot in sense X, we can view you as having coherence property Y.”

There are some caveats to this general idea.

For example: In complicated problems, perfect coherence is usually impossible to compute—it’s just too expensive to consider *all* the possibilities.

But there are also caveats to the caveats! For example, it may be that if there’s a powerful machine intelligence that is not *visibly to us humans* shooting itself in the foot in way X, then *from our perspective* it must look like the AI has coherence property Y. If there’s some sense in which the machine intelligence is going in circles, because *not* going in circles is too hard to compute, well, *we* won’t see that either with our tiny human brains. In which case it may make sense, from our perspective, to think about the machine intelligence *as if* it has some coherent preference ordering.

We are not going to go through all the coherence theorems in this introduction. They form a very large family; some of them are a *lot* more mathematically intimidating; and honestly I don’t know even 5% of the variants.

But we can hopefully walk through enough coherence theorems to at least start to see the reasoning behind, “Why expected utility?” And, because the two are a package deal, “Why probability?”

# Human lives, mere dollars, and coherent trades

An experiment in 2000—from a paper titled “The Psychology of the Unthinkable: Taboo Trade-Offs, Forbidden Base Rates, and Heretical Counterfactuals”—asked subjects to consider the dilemma of a hospital administrator named Robert:

Robert can save the life of Johnny, a five year old who needs a liver transplant, but the transplant procedure will cost the hospital $1,000,000 that could be spent in other ways, such as purchasing better equipment and enhancing salaries to recruit talented doctors to the hospital. Johnny is very ill and has been on the waiting list for a transplant but because of the shortage of local organ donors, obtaining a liver will be expensive. Robert could save Johnny’s life, or he could use the $1,000,000 for other hospital needs.

The main experimental result was that most subjects got angry at Robert for even considering the question.

After all, you can’t put a dollar value on a human life, right?

But better hospital equipment also saves lives, or at least one hopes so.**⁴** It’s not like the other potential use of the money saves zero lives.

Let’s say that Robert has a total budget of $100,000,000 and is faced with a long list of options such as these:

$100,000 for a new dialysis machine, which will save 3 lives

$1,000,000 for a liver for Johnny, which will save 1 life

$10,000 to train the nurses on proper hygiene when inserting central lines, which will save an expected 100 lives

...

Now suppose—this is a supposition we’ll need for our theorem—that Robert *does not care at all about money,* not even a tiny bit. Robert *only* cares about maximizing the total number of lives saved. Furthermore, we suppose for now that Robert cares about every human life equally.

If Robert does save as many lives as possible, given his bounded money, then Robert must *behave like *somebody assigning some consistent dollar value to saving a human life.

We should be able to look down the long list of options that Robert took and didn’t take, and say, e.g., “Oh, Robert took all the options that saved more than 1 life per $500,000 and rejected all options that saved less than 1 life per $500,000; so Robert’s behavior is *consistent* with his spending $500,000 per life.”

Alternatively, if we can’t view Robert’s behavior as being coherent in this sense—if we cannot make up *any* dollar value of a human life, such that Robert’s choices are consistent with that dollar value—then it must be possible to move around the same amount of money, in a way that saves more lives.

We start from the qualitative criterion, “Robert must save as many lives as possible; it shouldn’t be possible to move around the same money to save more lives.” We end up with the quantitative coherence theorem, “It must be possible to view Robert as trading dollars for lives at a consistent price.”

We haven’t proven that dollars have some intrinsic worth that trades off against the intrinsic worth of a human life. By hypothesis, Robert doesn’t care about money at all. It’s just that every dollar has an *opportunity cost* in lives it could have saved if deployed differently; and this opportunity cost is the same for every dollar because money is fungible.

An important caveat to this theorem is that there may be, e.g., an option that saves a hundred thousand lives for $200,000,000. But Robert only has $100,000,000 to spend. In this case, Robert may fail to take that option even though it saves 1 life per $2,000. It was a good option, but Robert didn’t have enough money in the bank to afford it. This does mess up the elegance of being able to say, “Robert must have taken *all* the options saving at least 1 life per $500,000”, and instead we can only say this with respect to options that are in some sense small enough or granular enough.

Similarly, if an option costs $5,000,000 to save 15 lives, but Robert only has $4,000,000 left over after taking all his other best opportunities, Robert’s last selected option might be to save 8 lives for $4,000,000 instead. This again messes up the elegance of the reasoning, but Robert is still doing exactly what an agent *would* do if it consistently valued lives at 1 life per $500,000—it would buy all the best options *it could afford* that purchased at least that many lives per dollar. So that part of the theorem’s conclusion still holds.

Another caveat is that we haven’t proven that there’s some specific dollar value in Robert’s head, as a matter of psychology. We’ve only proven that Robert’s outward behavior can be *viewed as if* it prices lives at *some *consistent value, assuming Robert saves as many lives as possible.

It could be that Robert accepts every option that spends less than $500,000/life and rejects every option that spends over $600,000, and there aren’t any available options in the middle. Then Robert’s behavior can equally be *viewed as* consistent with a price of $510,000 or a price of $590,000. This helps show that we haven’t proven anything about Robert explicitly *thinking* of some number. Maybe Robert never lets himself think of a specific threshold value, because it would be taboo to assign a dollar value to human life; and instead Robert just fiddles the choices until he can’t see how to save any more lives.

We naturally have not proved by pure logic that Robert must want, in the first place, to save as many lives as possible. Even if Robert is a good person, this doesn’t follow. Maybe Robert values a 10-year-old’s life at 5 times the value of a 70-year-old’s life, so that Robert will sacrifice five grandparents to save one 10-year-old. A lot of people would see that as entirely consistent with valuing human life in general.

Let’s consider that last idea more thoroughly. If Robert considers a preteen equally valuable with 5 grandparents, so that Robert will shift $100,000 from saving 8 old people to saving 2 children, then we can no longer say that Robert wants to save as many ‘lives’ as possible. That last decision would decrease by 6 the total number of ‘lives’ saved. So we can no longer say that there’s a qualitative criterion, ‘Save as many lives as possible’, that produces the quantitative coherence requirement, ‘trade dollars for lives at a consistent rate’.

Does this mean that coherence might as well go out the window, so far as Robert’s behavior is concerned? Anything goes, now? Just spend money wherever?

“Hm,” you might think. “But… if Robert trades 8 old people for 2 children *here*… and then trades 1 child for 2 old people *there*...”

To reduce distraction, let’s make this problem be about apples and oranges instead. Suppose:

Alice starts with 8 apples and 1 orange.

Then Alice trades 8 apples for 2 oranges.

Then Alice trades away 1 orange for 2 apples.

Finally, Alice trades another orange for 3 apples.

Then in this example, Alice is using a strategy that’s *strictly dominated* across all categories of fruit. Alice ends up with 5 apples and one orange, but could’ve ended with 8 apples and one orange (by not making any trades at all). Regardless of the *relative* value of apples and oranges, Alice’s strategy is doing *qualitatively* worse than another possible strategy, if apples have any positive value to her at all.

So the fact that Alice can’t be viewed as having any coherent relative value for apples and oranges, corresponds to her ending up with qualitatively less of some category of fruit (without any corresponding gains elsewhere).

This remains true if we introduce more kinds of fruit into the problem. Let’s say the set of fruits Alice can trade includes {apples, oranges, strawberries, plums}. If we can’t look at Alice’s trades and make up some relative quantitative values of fruit, such that Alice could be trading consistently with respect to those values, then Alice’s trading strategy must have been dominated by some other strategy that would have ended up with strictly more fruit across all categories.

In other words, we need to be able to look at Alice’s trades, and say something like:

“Maybe Alice values an orange at 2 apples, a strawberry at 0.1 apples, and a plum at 0.5 apples. That would explain why Alice was willing to trade 4 strawberries for a plum, but not willing to trade 40 strawberries for an orange and an apple.”

And if we *can’t* say this, then there must be some way to rearrange Alice’s trades and get *strictly more fruit across all categories* in the sense that, e.g., we end with the same number of plums and apples, but one more orange and two more strawberries. This is a bad thing if Alice *qualitatively* values fruit from each category—prefers having more fruit to less fruit, ceteris paribus, for each category of fruit.

Now let’s shift our attention back to Robert the hospital administrator. *Either* we can view Robert as consistently assigning some *relative* value of life for 10-year-olds vs. 70-year-olds, *or* there must be a way to rearrange Robert’s expenditures to save either strictly more 10-year-olds or strictly more 70-year-olds. The same logic applies if we add 50-year-olds to the mix. We must be able to say something like, “Robert is consistently behaving as if a 50-year-old is worth a third of a ten-year-old”. If we *can’t* say that, Robert must be behaving in a way that pointlessly discards some saveable lives in some category.

Or perhaps Robert is behaving in a way which implies that 10-year-old girls are worth more than 10-year-old boys. But then the relative values of those subclasses 10-year-olds need to be viewable as consistent; or else Robert must be qualitatively failing to save one more 10-year-old boy than could’ve been saved otherwise.

If you can denominate apples in oranges, and price oranges in plums, and trade off plums for strawberries, all at consistent rates… then you might as well take it one step further, and factor out an abstract unit for ease of notation.

Let’s call this unit *1 utilon,* and denote it €1. (As we’ll see later, the letters ‘EU’ are appropriate here.)

If we say that apples are worth €1, oranges are worth €2, and plums are worth €0.5, then this tells us the relative value of apples, oranges, and plums. Conversely, if we *can* assign consistent relative values to apples, oranges, and plums, then we can factor out an abstract unit at will—for example, by arbitrarily declaring apples to be worth €100 and then calculating everything else’s price in apples.

Have we proven by pure logic that all apples have the same utility? Of course not; you can prefer some particular apples to other particular apples. But when you’re done saying which things you qualitatively prefer to which other things, if you go around making tradeoffs in way that can be *viewed as* not qualitatively leaving behind some things you said you wanted, we can *view you* as assigning coherent quantitative utilities to everything you want.

And that’s one coherence theorem—among others—that can be seen as motivating the concept of *utility* in decision theory.

Utility isn’t a solid thing, a separate thing. We could multiply all the utilities by two, and that would correspond to the same outward behaviors. It’s meaningless to ask how much utility you scored at the end of your life, because we could subtract a million or add a million to that quantity while leaving everything else conceptually the same.

You could pick anything you valued—say, the joy of watching a cat chase a laser pointer for 10 seconds—and denominate everything relative to that, without needing any concept of an extra abstract ‘utility’. So (just to be extremely clear about this point) we have not proven that there is a separate thing ‘utility’ that you should be pursuing instead of everything else you wanted in life.

The coherence theorem says nothing about which things to value more than others, or how much to value them relative to other things. It doesn’t say whether you should value your happiness more than someone else’s happiness, any more than the notion of a consistent preference ordering tells us whether .

(The notion that we should assign equal value to all human lives, or equal value to all sentient lives, or equal value to all Quality-Adjusted Life Years, is *utilitarianism.* Which is, sorry about the confusion, a whole ’nother separate different philosophy.)

The conceptual gizmo that maps thingies to utilities—the whatchamacallit that takes in a fruit and spits out a utility—is called a ‘utility function’. Again, this isn’t a separate thing that’s written on a stone tablet. If we multiply a utility function by 9.2, that’s conceptually the same utility function because it’s consistent with the same set of behaviors.

But in general: If we can sensibly view any agent as doing as well as qualitatively possible at *anything*, we must be able to view the agent’s behavior as consistent with there being some coherent relative quantities of wantedness for all the thingies it’s trying to optimize.

# Probabilities and expected utility

We’ve so far made no mention of *probability.* But the way that probabilities and utilities interact, is where we start to see the full structure of *expected utility* spotlighted by all the coherence theorems.

The basic notion in expected utility is that some choices present us with uncertain outcomes.

For example, I come to you and say: “Give me 1 apple, and I’ll flip a coin; if the coin lands heads, I’ll give you 1 orange; if the coin comes up tails, I’ll give you 3 plums.” Suppose you relatively value fruits as described earlier: 2 apples / orange and 0.5 apples / plum. Then *either* possible outcome gives you something that’s worth more to you than 1 apple. Turning down a so-called ‘gamble’ like that… why, it’d be a dominated strategy.

In general, the notion of ‘expected utility’ says that we assign certain quantities called *probabilities* to each possible outcome. In the example above, we might assign a ‘probability’ of to the coin landing heads (1 orange), and a ‘probability’ of to the coin landing tails (3 plums). Then the total value of the ‘gamble’ we get by trading away 1 apple is:

Conversely, if we just keep our 1 apple instead of making the trade, this has an expected utilty of . So indeed we ought to trade (as the previous reasoning suggested).

“But wait!” you cry. “Where did these probabilities come from? Why is the ‘probability’ of a fair coin landing heads and not, say, or ? Who says we ought to multiply utilities by probabilities in the first place?”

If you’re used to approaching this problem from a Bayesian standpoint, then you may now be thinking of notions like prior probability and Occam’s Razor and universal priors...

But from the standpoint of coherence theorems, that’s putting the cart before the horse.

From the standpoint of coherence theorems, we don’t *start with* a notion of ‘probability’.

Instead we ought to prove something along the lines of: if you’re not using qualitatively dominated strategies, then you must *behave as if* you are multiplying utilities by certain quantitative thingies.

We might then furthermore show that, for non-dominated strategies, these utility-multiplying thingies must be between and rather than say or .

Having determined what coherence properties these utility-multiplying thingies need to have, we decide to call them ‘probabilities’. And *then*—once we know in the first place that we need ‘probabilities’ in order to not be using dominated strategies—we can start to worry about exactly what the numbers ought to be.

## Probabilities summing to 1

Here’s a taste of the kind of reasoning we might do:

Suppose that—having already accepted some previous proof that non-dominated strategies dealing with uncertain outcomes, must multiply utilities by quantitative thingies—you then say that you are going to assign a probability of to the coin coming up heads, and a probability of to the coin coming up tails.

If you’re already used to the standard notion of probability, you might object, “But those probabilities sum to when they ought to sum to !”**⁵** But now we are in coherence-land; we don’t ask “Did we violate the standard axioms that all the textbooks use?” but “What rules must non-dominated strategies obey?” *De gustibus non est disputandum;* can we *disputandum* somebody saying that a coin has a 60% probability of coming up heads and a 70% probability of coming up tails? (Where these are the only 2 possible outcomes of an uncertain coinflip.)

Well—assuming you’ve already accepted that we need utility-multiplying thingies—I might then offer you a gamble. How about you give me one apple, and if the coin lands heads, I’ll give you 0.8 apples; while if the coin lands tails, I’ll give you 0.8 apples.

According to you, the expected utility of this gamble is:

You’ve just decided to trade your apple for 0.8 apples, which sure sounds like one of ’em dominated strategies.

And that’s why *the thingies you multiply probabilities by*—the thingies that you use to weight uncertain outcomes in your imagination, when you’re trying to decide how much you want one branch of an uncertain choice—must sum to 1, whether you call them ‘probabilities’ or not.

Well… actually we just argued**⁶** that probabilities for mutually exclusive outcomes should sum to *no more than 1.* What would be an example showing that, for non-dominated strategies, the probabilities for exhaustive outcomes should sum to no less than 1?

Why exhaustive outcomes should sum to at least 1:

Suppose that, in exchange for 1 apple, I credibly offer:

* To pay you 1.1 apples if a coin comes up heads.

* To pay you 1.1 apples if a coin comes up tails.

* To pay you 1.1 apples if anything else happens.

If the probabilities you assign to these three outcomes sum to say 0.9, you will refuse to trade 1 apple for 1.1 apples.

(This is strictly dominated by the strategy of agreeing to trade 1 apple for 1.1 apples.)

## Dutch book arguments

Another way we could have presented essentially the same argument as above, is as follows:

Suppose you are a market-maker in a prediction market for some event . When you say that your price for event is , you mean that you will sell for a ticket which pays if happens (and pays out nothing otherwise). In fact, you will sell any number of such tickets!

Since you are a market-maker (that is, you are trying to encourage trading in for whatever reason), you are also willing to *buy* any number of tickets at the price . That is, I can say to you (the market-maker) “I’d like to sign a contract where you give me now, and in return I must pay you iff happens;” and you’ll agree. (We can view this as you selling me a negative number of the original kind of ticket.)

Let and denote two events such that *exactly one* of them must happen; say, is a coin landing heads and is the coin not landing heads.

Now suppose that you, as a market-maker, are motivated to avoid combinations of bets that lead into *certain *losses for you—not just losses that are merely probable, but combinations of bets such that *every* possibility leads to a loss.

Then if exactly one of and must happen, your prices and must sum to exactly . Because:

If , I buy both an -ticket and a -ticket and get a guaranteed payout of minus costs of . Since this is a guaranteed profit for me, it is a guaranteed loss for you.

If , I sell you both tickets and will at the end pay you after you have already paid me . Again, this is a guaranteed profit for me of .

This is more or less exactly the same argument as in the previous section, with trading apples. Except that: (a) the scenario is more crisp, so it is easier to generalize and scale up much more complicated similar arguments; and (b) it introduces a whole lot of assumptions that people new to expected utility would probably find rather questionable.

“What?” one might cry. “What sort of crazy bookie would buy and sell bets at exactly the same price? Why ought *anyone* to buy and sell bets at exactly the same price? Who says that I must value a gain of $1 exactly the opposite of a loss of $1? Why should the price that I put on a bet represent my degree of uncertainty about the environment? What does all of this argument about gambling have to do with real life?”

So again, the key idea is not that we are assuming anything about people valuing every real-world dollar the same; nor is it in real life a good idea to offer to buy or sell bets at the same prices.**⁷** Rather, Dutch book arguments can stand in as shorthand for some longer story in which we only assume that you prefer more apples to less apples.

The Dutch book argument above has to be seen as one more added piece in the company of all the *other *coherence theorems—for example, the coherence theorems suggesting that you ought to be quantitatively weighing events in your mind in the first place.

## Conditional probability

With more complicated Dutch book arguments, we can derive more complicated ideas such as ‘conditional probability’.

Let’s say that we’re pricing three kinds of gambles over two events and :

A ticket that costs , and pays if happens.

A ticket that doesn’t cost anything or pay anything if doesn’t happen (the ticket price is refunded); and if does happen, this ticket costs , then pays if happens.

A ticket that costs , and pays if and both happen.

Intuitively, the idea of conditional probability is that the probability of and both happening, should be equal to the probability of happening, times the probability that happens assuming that happens:

To exhibit a Dutch book argument for this rule, we want to start from the assumption of a qualitatively non-dominated strategy, and derive the quantitative rule .

So let’s give an example that violates this equation and see if there’s a way to make a guaranteed profit. Let’s say somebody:

Prices at the first ticket, aka .

Prices at the second ticket, aka .

Prices at the third ticket, aka , which ought to be assuming the first two prices.

The first two tickets are priced relatively high, compared to the third ticket which is priced relatively low, suggesting that we ought to sell the first two tickets and buy the third.

Okay, let’s ask what happens if we sell 10 of the first ticket, sell 10 of the second ticket, and buy 10 of the third ticket.

If doesn’t happen, we get $6, and pay $2. Net +$4.

If happens and doesn’t happen, we get $6, pay $10, get $7, and pay $2. Net +$1.

If happens and happens, we get $6, pay $10, get $7, pay $10, pay $2, and get $10. Net: +$1.

That is: we can get a guaranteed positive profit over all three possible outcomes.

More generally, let be the (potentially negative) amount of each ticket that is being bought (buying a negative amount is selling). Then the prices can be combined into a ‘Dutch book’ whenever the following three inequalities can be simultaneously true, with at least one inequality strict:

For this is impossible exactly iff . The proof via a bunch of algebra is left as an exercise to the reader.**⁸**

## The Allais Paradox

By now, you’d probably like to see a glimpse of the sort of argument that shows in the first place that we need expected utility—that a non-dominated strategy for uncertain choice must behave as if multiplying utilities by some kinda utility-multiplying thingies (‘probabilities’).

As far as I understand it, the real argument you’re looking for is Abraham Wald’s complete class theorem, which I must confess I don’t know how to reduce to a simple demonstration.

But we can catch a glimpse of the general idea from a famous psychology experiment that became known as the Allais Paradox (in slightly adapted form).

Suppose you ask some experimental subjects which of these gambles they would rather play:

1A: A certainty of $1,000,000.

1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.

Most subjects say they’d prefer 1A to 1B.

Now ask a separate group of subjects which of these gambles they’d prefer:

2A: 50% chance of winning $1,000,000; 50% chance of winning $0.

2B: 45% chance of winning $5,000,000; 55% chance of winning $0.

In this case, most subjects say they’d prefer gamble 2B.

Note that the $ sign here denotes real dollars, not utilities! A gain of five million dollars isn’t, and shouldn’t be, worth exactly five times as much to you as a gain of one million dollars. We can use the € symbol to denote the expected utilities that are abstracted from how much you relatively value different outcomes; $ is just money.

So we certainly aren’t claiming that the first preference is paradoxical because 1B has an expected dollar value of $4.5 million and 1A has an expected dollar value of $1 million. That would be silly. We care about expected utilities, not expected dollar values, and those two concepts aren’t the same at all!

Nonetheless, the combined preferences 1A > 1B and 2A < 2B are not compatible with any coherent utility function. We cannot simultaneously have:

This was one of the earliest experiments seeming to demonstrate that actual human beings were not expected utility maximizers—a very tame idea nowadays, to be sure, but the *first definite* demonstration of that was a big deal at the time. Hence the term, “Allais Paradox”.

Now, by the general idea behind coherence theorems, since we can’t *view this behavior* as corresponding to expected utilities, we ought to be able to show that it corresponds to a dominated strategy somehow—derive some way in which this behavior corresponds to shooting off your own foot.

In this case, the relevant idea seems non-obvious enough that it doesn’t seem reasonable to demand that you think of it on your own; but if you like, you can pause and try to think of it anyway. Otherwise, just continue reading.

Again, the gambles are as follows:

1A: A certainty of $1,000,000.

1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.

2A: 50% chance of winning $1,000,000; 50% chance of winning $0.

2B: 45% chance of winning $5,000,000; 55% chance of winning $0.

Now observe that Scenario 2 corresponds to a 50% chance of playing Scenario 1, and otherwise getting $0.

This, in fact, is why the combination 1A > 1B; 2A < 2B is incompatible with expected utility. In terms of one set of axioms frequently used to describe expected utility, it violates the Independence Axiom: if a gamble is preferred to (that is, ), then we ought to be able to take a constant probability and another gamble and have .

To put it another way, if I flip a coin to decide whether or not to play some entirely different game , but otherwise let you choose or , you ought to make the same choice as if I just ask you whether you prefer or . Your preference between and should be ‘independent’ of the possibility that, instead of doing anything whatsoever with or , we will do something else instead.

And since this is an axiom of expected utility, any violation of that axiom ought to correspond to a dominated strategy somehow.

In the case of the Allais Paradox, we do the following:

First, I show you a switch that can be set to A or B, currently set to A.

In one minute, I tell you, I will flip a coin. If the coin comes up heads, you will get nothing. If the coin comes up tails, you will play the gamble from Scenario 1.

From your current perspective, that is, we are playing Scenario 2: since the switch is set to A, you have a 50% chance of getting nothing and a 50% chance of getting $1 million.

I ask you if you’d like to pay a penny to throw the switch from A to B. Since you prefer gamble 2B to 2A, and some quite large amounts of money are at stake, you agree to pay the penny. From your perspective, you now have a 55% chance of ending up with nothing and a 45% chance of getting $5M.

I then flip the coin, and luckily for you, it comes up tails.

From your perspective, you are now in Scenario 1B. Having observed the coin and updated on its state, you now think you have a 90% chance of getting $5 million and a 10% chance of getting nothing. By hypothesis, you would prefer a certainty of $1 million.

So I offer you a chance to pay another penny to flip the switch back from B to A. And with so much money at stake, you agree.

I have taken your two cents on the subject.

That is: You paid a penny to flip a switch and then paid another penny to switch it back, and this is dominated by the strategy of just leaving the switch set to A.

And that’s at least a glimpse of why, if you’re not using dominated strategies, the thing you do with relative utilities is multiply them by probabilities in a consistent way, and prefer the choice that leads to a greater expectation of the variable representing utility.

**From the Allais Paradox to real life**

The real-life lesson about what to do when faced with Allais’s dilemma might be something like this:

There’s *some* amount that $1 million would improve your life compared to $0.

There’s some amount that an additional $4 million would further improve your life after the first $1 million.

You ought to visualize these two improvements as best you can, and decide whether another $4 million can produce at least *one-ninth* as much improvement, as much true value to you, as the first $1 million.

If it can, you should consistently prefer 1B > 1A; 2B > 2A. And if not, you should consistently prefer 1A > 1B; 2A > 2B.

The standard ‘paradoxical’ preferences in Allais’s experiment are standardly attributed to a certainty effect: people value the *certainty* of having $1 million, while the difference between a 50% probability and a 55% probability looms less large. (And this ties in to a number of other results about certainty, need for closure, prospect theory, and so on.)

It may sound intuitive, in an Allais-like scenario, to say that you ought to derive some value from being *certain *about the outcome. In fact this is just the reasoning the experiment shows people to be using, so of course it might sound intuitive. But that does, inescapably, correspond to a kind of thinking that produces dominated strategies.

One possible excuse might be that certainty is valuable if you need to make plans about the future; knowing the exact future lets you make better plans. This is admittedly true and a phenomenon within expected utility, though it applies in a smooth way as confidence increases rather than jumping suddenly around 100%. But in the particular dilemma as described here, you only have 1 minute before the game is played, and no time to make other major life choices dependent on the outcome.

Another possible excuse for certainty bias might be to say: “Well, I value the emotional feeling of certainty.”

In real life, we do have emotions that are directly about probabilities, and those little flashes of happiness or sadness are worth something if you care about people being happy or sad. If you say that you value the emotional feeling of being *certain* of getting $1 million, the freedom from the fear of getting $0, for the minute that the dilemma lasts and you are experiencing the emotion—well, that may just be a fact about what you value, even if it exists outside the expected utility formalism.

And this genuinely does not fit into the expected utility formalism. In an expected utility agent, probabilities are just thingies-you-multiply-utilities-by. If those thingies start generating their own utilities once represented inside the mind of the person who is an object of ethical value, you really are going to get results that are incompatible with the formal decision theory.

However, *not* being viewable as an expected utility agent does always correspond to employing dominated strategies. You are giving up *something* in exchange, if you pursue that feeling of certainty. You are potentially losing all the real value you could have gained from another $4 million, if that realized future actually would have gained you more than one-ninth the value of the first $1 million. Is a fleeting emotional sense of certainty over 1 minute, worth *automatically* discarding the potential $5-million outcome? Even if the correct answer given your values is that you properly ought to take the $1 million, treasuring 1 minute of emotional gratification doesn’t seem like the wise reason to do that. The wise reason would be if the first $1 million really was worth that much more than the next $4 million.

The danger of saying, “Oh, well, I attach a lot of utility to that comfortable feeling of certainty, so my choices are coherent after all” is not that it’s mathematically improper to value the emotions we feel while we’re deciding. Rather, by saying that the *most valuable* stakes are the emotions you feel during the minute you make the decision, what you’re saying is, “I get a huge amount of value by making decisions however humans instinctively make their decisions, and that’s much more important than the thing I’m making a decision *about.*” This could well be true for something like buying a stuffed animal. If millions of dollars or human lives are at stake, maybe not so much.

# Conclusion

The demonstrations we’ve walked through here aren’t the professional-grade coherence theorems as they appear in real math. Those have names like “Cox’s Theorem” or “the complete class theorem”; their proofs are difficult; and they say things like “If seeing piece of information A followed by piece of information B leads you into the same epistemic state as seeing piece of information B followed by piece of information A, plus some other assumptions, I can show an isomorphism between those epistemic states and classical probabilities” or “Any decision rule for taking different actions depending on your observations either corresponds to Bayesian updating given some prior, or else is strictly dominated by some Bayesian strategy”.

But hopefully you’ve seen enough concrete demonstrations to get a general idea of what’s going on with the actual coherence theorems. We have multiple spotlights all shining on the same core mathematical structure, saying dozens of different variants on, “If you aren’t running around in circles or stepping on your own feet or wantonly giving up things you say you want, we can see your behavior as corresponding to this shape. Conversely, if we can’t see your behavior as corresponding to this shape, you must be visibly shooting yourself in the foot.” Expected utility is the only structure that has this great big family of discovered theorems all saying that. It has a scattering of academic competitors, because academia is academia, but the competitors don’t have anything like that mass of spotlights all pointing in the same direction.

So if we need to pick an interim answer for “What kind of quantitative framework should I try to put around my own decision-making, when I’m trying to check if my thoughts make sense?” or “By default and barring special cases, what properties might a sufficiently advanced machine intelligence *look to us* like it possessed, at least approximately, if we couldn’t see it *visibly* running around in circles?”, then there’s pretty much one obvious candidate: Probabilities, utility functions, and expected utility.

# Further reading

To learn more about agents and AI: Consequentialist cognition; the orthogonality of agents’ utility functions and capabilities; epistemic and instrumental efficiency; instrumental strategies sufficiently capable agents tend to converge on; properties of sufficiently advanced agents.

To learn more about decision theory: The controversial counterfactual at the heart of the expected utility formula.

**¹ **It could be that somebody’s pizza preference is real, but so weak that they wouldn’t pay one penny to get the pizza they prefer. In this case, imagine we’re talking about some stronger preference instead. Like your willingness to pay at least one penny not to have your house burned down, or something.

² This does assume that the agent prefers to have more money rather than less money. “Ah, but why is it bad if one person has a penny instead of another?” you ask. If we insist on pinning down every point of this sort, then you can also imagine the $0.01 as standing in for the *time* I burned in order to move the pizza slices around in circles. That time was burned, and nobody else has it now. If I’m an effective agent that goes around pursuing my preferences, I should in general be able to sometimes convert time into other things that I want. In other words, my circular preference can lead me to incur an opportunity cost denominated in the sacrifice of other things I want, and not in a way that benefits anyone else.

**³ **There are more than six possibilities if you think it’s possible to be absolutely indifferent between two kinds of pizza.

**⁴ ** We can omit the ‘better doctors’ item from consideration: The supply of doctors is mostly constrained by regulatory burdens and medical schools rather than the number of people who want to become doctors; so bidding up salaries for doctors doesn’t much increase the total number of doctors; so bidding on a talented doctor at one hospital just means some other hospital doesn’t get that talented doctor. It’s also illegal to pay for livers, but let’s ignore that particular issue with the problem setup or pretend that it all takes place in a more sensible country than the United States or Europe.

**⁵ **Or maybe a tiny bit less than , in case the coin lands on its edge or something.

**⁶** Nothing we’re walking through here is really a coherence theorem *per se*, more like intuitive arguments that a coherence theorem ought to exist. Theorems require proofs, and nothing here is what real mathematicians would consider to be a ‘proof’.

**⁷** In real life this leads to a problem of ‘adversarial selection’, where somebody who knows more about the environment than you can decide whether to buy or sell from you. To put it another way, from a Bayesian standpoint, if an *intelligent* counterparty is deciding whether to buy or sell from you a bet on , the fact that they choose to buy (or sell) should cause you to update in favor (or against) actually happening. After all, they wouldn’t be taking the bet unless they thought they knew something you didn’t!

**⁸** The quick but advanced argument would be to say that the left-hand-side must look like a singular matrix, whose determinant must therefore be zero.

Obligatory: Coherence arguments do not imply goal-directed behavior

Also Coherent behaviour in the real world is an incoherent concept

(Note: This comment mostly concerns the material in the first three sections of the post. I have not yet read, but only skimmed, the section titled “Probabilities and expected utility”. It seems to cover material I am familiar with, but I will read it in detail when I have more time.)Eliezer, you speak here of reasons why an agent ought to behave as if its preferences satisfy the

transitivityaxiom, specifically; you discuss circular preferences, and their unfortunate effects, as the consequences of transitivity violations. You also discuss theindependenceaxiom in the latter half of the post. You have discussed reasons to accept these two axioms before, in the Sequences.However, the Von Neumann–Morgenstern utility theorem (the most commonly used, as far as I am aware, formalization of decision-theoretic utility) has four axioms; and an agent’s preferences must satisfy all four in order for a utility function to be constructable from them.

It so happens that the two axioms you do

notdiscuss are precisely the two axioms that I (and many economists; see below) find most suspect. The case for transitivity is obvious; the case for independence is not obvious but nonetheless reasonably solid. What I should like to see you discuss is the case forcontinuityand, especially, forcompleteness.Why “especially” completeness? Well, Robert Aumann had this to say on the matter:

(From his 1962 paper “Utility Theory without the Completeness Axiom”.)

Also relevant is “Expected Utility Theory without the Completeness Axiom” (Dubra et. al., 2001):

(See a previous comment of mine for a longer quote.)

What unfortunate consequences follow from violations of completeness or continuity? What coherence theorems dictate the acceptance of those axioms? These, it seems to me, would make useful topics for any further posts along these lines…

Incidentally, there are also reasons for hesitation to accept the independence axiom.

For continuity, it’s reasonable to assume this because all computable functions are continuous. See theorem 4.4 of https://eccc.weizmann.ac.il/resources/pdf/ica.pdf

Edit: I realized that the continuity assumption is different (though related) from assuming the utility function is continuous. My guess is that computability is still a good justification for this, but I’d have to check that that actually follows.

“Because all computable functions are continuous”—how does this make any sense? Why can’t I just pick a value x=1 and if it’s left limit and right limit are p, set the function to p+1 at x=1.

Because equality of (computable) real numbers is uncomputable. So is calculating the limit of an infinite sequence of them.

In more detail: a computable real number must be represented by a Turing machine (or your favorite Turing-equivalent model) that generates some representation of it as a possibly infinite string. Equality of the infinite output of two Turing machines is uncomputable.

In fact, picking a representation for “computable” real numbers, and implementing basic arithmetic on them is non-trivial. The usual decimal or binary strings of digits won’t work.

Hmm, I’m still not following. Limits are uncomputable in general, but I just need one computational function where I know the limits at one point and then I can set it to p+1 instead. Why wouldn’t this function still be computable? Maybe “computable function” is being defined differently than I would expect.

To compute that function for an unknown argument x, you would have to determine whether x is equal to 1. But if real numbers are encoded as infinite strings, there is no way to tell whether x=1 in finite time. If x happens to be 1, then however long an initial segment of that representation you examined, you could never be sure that x was not very slightly different from 1. In the usual decimal representation, if you see 1.00000.… if the number is greater than 1 you will eventually know that, but if the zeroes go on forever, you can never know that. Similarly if you see 0.99999.....

I’m not sure how relevant this is to the original context, see other replies to Adele Lopez’s ancestor comment.

Okay, so there is an additional assumption that these strings are all encoded as infinite sequences. Instead, they could be encoded with a system that starts by listing the number of digits or −1 if the sequence if infinite, then provide those digits. That’s a pretty key property to not mention (then again, I can’t criticise too much as I was too lazy to read the PDF). Thanks for the explanation!

This seems to be a

non sequitur.Suppose it were true that preferences that violate the continuity axiom imply a utility function that is uncomputable. (This hardly seems worse or less convenient than the case—which is, in fact, the actual state of affairs—where continuity violations imply that

noutility function can represent one’s preferences, computable or otherwise… but let’s set that aside, for now.)How would this constitute a reason to have one’s preferences conform to the continuity axiom…?

Presumably, any agent which we manage to build will be computable. So to the extent our agent is using utility functions, they will be continuous.

If an agent is only capable of computable observations, but has a discontinuous utility function, then if the universe is in a state where the utility function is discontinuous, the agent will need to spend an infinite amount of time (or as long as the universe state remains at such a point) determining the utility of the current state. I think it might be possible to use this to create a more concrete exploit.

There are several objections one could make to this line of reasoning. Here are two.

First:do you believe thatwe, humansare uncomputable? If we are uncomputable, then it is clearly possible to construct an uncomputable agent. If, conversely, we are computable, then whatever reasoning you apply to an agent we build can be applied to us as well. Do you think it does apply to us?Second:supposing your reasoning holds, why should it not be a reason for our constructed agent not to use utility functions, rather than a reason for said agent to have continuous preferences?(This is a good time to mention, again, that this entire tangent is moot, as violating the continuity axiom—or any of the axioms—means that

noutility function, computable or not, can be constructed from your preferences. But even if that weren’t the case, the above objections apply.)While I don’t find completeness so problematic, I got quite confused by Eliezer’s post. Firstly, it would make much more sense to first explain what “utility” is, in the sense that it is used here. Secondly, the justification of transitivity is common, but using a word like “dominated strategy” there does not make much sense, because you can only evaluate strategies if you know the utility functions (and it also mixes up words). Thirdly, it’s necessary to discuss all axioms and their implications. For example, in standard preferences theory under certainty, it’s possible to have preferences that are complete and transitive but you cannot get a utility function from. Fourthly, I am still confused whether this talk about expected utility is only normative or also a positive description of humans, or kinda both.

He is referring to decision-theoretic utility, in the sense in which the term is used in economics and game theory.

Such (“lexicographic”) preferences violate the continuity axiom.

Eliezer is definitely speaking normatively; none of the VNM axioms reliably apply to humans in a descriptive sense. Eliezer is concerned with the design of artificial agents, for which task it is necessary to determine what axioms their preferences ought to conform to (among other things).

Meta: I’m unclear about the context in which this post is to be read, and its purpose. Googling phrases from below the fold tells me that it appears both here and on Arbital, although there is no indication here or there that this is a cross-post, and Arbital posts are not routinely posted here. It includes a link to a blog post of 2018, so appears to be of recent composition, but reads like a posting from the Sequences now long in the past, and I am not sure it contains any ideas not present there. It begins

in medias res(“So, we’re talking...”), yet does not refer back to any of the implied predecessors.I notice that I am confused.

Huh, you’re right: this is just a re-post of an Arbital article.

I must say I feel rather cheated. When I saw this, I was under the impression that Eliezer had composed this post for Less Wrong, and had posted it to Less Wrong; I assumed that there was therefore some chance he might respond to comments. But that seems not to be the case. (Is it even Eliezer who posted it? Or someone else using his account, as happened, IIRC, with

Inadequate Equilibria?)I, too, would like to know what the purpose of this post is.

I asked Eliezer if it made sense to cross-post this from Arbital, and did the cross-posting when he approved. I’m sorry it wasn’t clear that this was a cross-post! I intended to make this clearer, but my idea was bad (putting the information on the sequence page) and I also implemented it wrong (the sequence didn’t previously display on the top of this post).

This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins

in media resstylistically, it doesn’t have any prereqs or context beyond “this is part of a collection of introductory resources covering a wide variety of technical and semitechnical topics.”Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:

There have been loose plans for a while to cross-post content from Arbital to LW (maybe all of it; maybe just the best or most interesting stuff), but as I mentioned downthread, we’re doing more cross-post experiments sooner than we would have because Arbital’s been having serious performance issues.

I see, thanks. That does explain things.

Some questions occur to me, which I don’t expect you necessarily to answer at once, but hope you (and/or whoever is responsible for the Arbital content or the decisions to post it to LW) will consider:

In your opinion, does this post (still? ever?) work well as a “linkable resource for conversations about human rationality and … AGI”?

Are there plans (by Eliezer, or by anyone else) to revise this content? Or is it meant to stand unchanged, as a matter of “historical interest” only, so to speak?

Relatedly to #2, is it productive to engage with this post, by commenting, discussing, critiquing? (In any sense other than “it’s fun and/or personally edifying to do so”?) That is: is there anyone “on the other end”, so to speak, who might read (and possibly even participate in) such discussions, and take action (such as writing an updated version of this material, to pick a simple example) as a result?

For whom is this post intended, and by whom? Whose purposes does it serve, whom is it meant to benefit, and who may reasonably judge whether it is serving its purpose?

Presumably to keep morale up by making it look like the rightful Caliph is still alive and producing output.

I believe the intention was for this post to appear as part of a sequence that more clearly situated it as part of a series of re-posts from Arbital, but there were some mix-ups that made the sequence title not show up by default. I agree the current implementation is confusing.

Meta: (some of?) the linked Arbital pages do not seem to work. For example, https://arbital.com/p/probability_theory/ shows me a blank page:

(There was also some sort of red box with something about a “pipeline error” or something, but it disappeared.)

I am using Chrome 74.0.3729.131 (the latest as of this writing) on a Mac.

Arbital has been getting increasingly slow and unresponsive. The LW team is looking for fixes or work-arounds, but they aren’t familiar with the Arbital codebase. In the meantime, I’ve been helping cross-post some content from Arbital to LW so it’s available at all.

Is it possible to create, and make available, a dump of the Arbital content? I’ve no doubt that there are people who’d be willing to host the entire thing, or convert it en masse into another format, etc.

Edit:Actually, if you could just post a complete list of Arbital page names, I could extract the content myself, as the API to request page content seems sufficiently straightforward.We’d talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).

While we have a long-term plan of importing Arbital’s content into LessWrong (after LessWrong acquires some wiki-like features to make it make sense), we have not taken responsibility for the maintenance of Arbital itself.

I assume you mean ‘no one has this responsibility for Arbital anymore’, and not that there’s someone else who has this responsibility.

A week or so ago Arbital was working but had load times of several minutes.

I find it confusing that the only thing that matters to a rational agent is the

expectationof utility, i.e., that the details of the probability distribution of utilities do not matter.I understand that VNM theorem proves that from what seem reasonable axioms, but on the other hand it seems to me that there is nothing irrational about having different risk preferences. Consider the following two scenarios

A: you gain utility 1 with probability 1

B: you gain utility 0 with probability

^{1}⁄_{2}or utility 2 with probability^{1}⁄_{2}According to expected utility, it is irrational to be anything but indifferent to between A and B. This seems wrong to me. I can even go a bit further, consider a third option:

C: you gain utility 0.9 with probability 1

Expected utility says it is irrational to prefer C to B, but this seems perfectly reasonable to me. It’s optimizing for the worst-case instead of the average case. Is there a direct way of showing that preferring B to C is irrational?

This is part of the meaning of ‘utility’. In real life we often have risk-averse strategies where, for example, 100% chance at 100 dollars is preferred to 50% chance of losing 100 dollars and 50% chance of gaining 350 dollars. But, under the assumption that our risk-averse tendencies satisfy the coherence properties from the post, this simply means that our utility is not linear in dollars. As far as I know this captures most of the situations where risk-aversion comes into play: often you simply cannot tolerate extremely negative outliers, meaning that your expected utility is mostly dominated by some large negative terms, and the best possible action is to minimize the probability that these outcomes occur.

Also there is the following: consider the case where you are repeatedly offered bets of the example you give (B versus C). You know this in advance, and are allowed to redesign your decision theory from scratch (but you cannot change the definition of ‘utility’ or the bets being offered). What criteria would you use to determine if B is preferable to C? The law of large numbers(/central limit theorem) states that in the long run with probability 1 the option with higher expected value will give you more utilons, and in fact that this number is the only number you need to figure out which option is the better pick in the long run.

The tricky bit is the question whether this also applies to one-shot problems or not. Maybe there are rational strategies that use, say, the aggregate median instead of the expected value, which has the same limit behaviour. My intuition is that this clashes with what we mean with ‘probability’ - even if this particular problem is a one-off, at least our strategy should generalise to all situations where we talk about probability

^{1}⁄_{2}, and then the law of large numbers applies again. I also suspect that any agent that uses more information to make this decision than the expected value to decide (in particular, occasionally deliberately chooses the option with lower expected utility) can be cheated out of utilons with clever adversarial selections of offers, but this is just a guess.This is the crux. It seems to me that the expected utility frame work means that if you prefer A to B in one time choice, then you must also prefer n repetitions of A to n repetitions of B, because the fact that you have larger variance for n=1 does not matter. This seems intuitively wrong to me.

I’d hold that it’s the reverse that seems more questionable. If n is a large number then the Law of Large Numbers may be applicable (“the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.”).

You may be interested in reading this series of posts.

Robyn Dawes makes a more detailed version of precisely this argument in

Rational Choice in an Uncertain World. I summarize his argument in an old comment of mine. (The axiom you must reject, incidentally, if you find this sort of reasoning convincing, is the independence axiom.)Thanks, I looked at the discussion you linked with interest. I think I understand my confusion a little better, but I am still confused.

I can walk through the proof of the VNM theorem and see where the independence axiom comes in and how it leads to u(A)=u(B) in my example. The axiom of independence itself feels unassailable to me and I am not quite sure this is a strong enough argument against it. Maybe having a more direct argument from axiom of independence to unintuitive result would be more convincing.

Maybe the answer is to read Dawes book, thanks for the reference.

Well, the axiom of independence is just that: an axiom. It doesn’t need to be assailed; we can take it as axiomatic, or not. If we do take it as axiomatic, certain interesting analyses become possible (depending on what other axioms we adopt). If we refuse to do so, then bad things happen—or so it’s claimed.

In any case, Dawes’ argument (and related ones) about the independence axiom fundamentally concerns the question of

what properties of an outcome distribution we should concern ourselves with. (Here “outcome distribution” can refer to a probability distribution, or to some set of outcomes, distributed across time, space, individuals, etc., that is generated by some policy, which we may perhaps view as the output of a generator with some probability distribution.)A VNM-compliant agent behaves as if it is maximizing the

expectationof the utility of its outcome distribution. It is not concerned at all with other properties of that distribution, such as dispersion (i.e., standard deviation or some related measure) or skewness. (Or, to put it another way, a VNM-compliant agent is unconcerned with theformof the outcome distribution.)What Dawes is saying is simply that, contra the assumptions of VNM-rationality, there seems to be ample reason to concern ourselves with, for instance, the skewness of the outcome distribution, and not just its expectation. But if we do prefer one outcome distribution to another, where the dis-preferred distribution has a higher expectation (but a “better” skewness), then we violate the independence axiom.

I get what you are saying. You have convinced me that the following two statements are contradictory:

Axiom of Independence: preferring A to B implies preferring ApC to BpC for any p and C.

The variance and higher moments of utility matter, not just the expected value.

My confusion is that it intuitively it seems both must be true for a rational agent but I guess my intuition is just wrong.

Thanks for your comments, they were very illuminating.

I think you are not allowed to refer explicitly to utility in the options. That is an option of “I do not choose this option” is selfdefeating and illformed. In another post I posited a risk-averse utility function that references amount of paperclips. Maximising the utility function doesn’t maximise expected amount of paperclips. Even if the physical objects of interest are paperclips and we value them linearly a paperclip is not synonymous with utilon. It’s not a thing you can give out in an option.

I was going to answer that I can easily reword my example to not explicitly mention any utility values, but when I tried to that it very quickly led to something where it is obvious that u(A) = u(C). I guess my rewording was basically going through the steps of the proof of VNM theorem.

I am still not sure I am convinced by your objection, as I don’t think there’s anything self-referential in my example, but that did give me some pause.

In a case where you are going to pick less variance less expected value over more variance more expected value it will mean that option needs to have a bigger “utility number”. In order to get that you need to mess with how utility is calculated. Then it becomes ambigious whether the “utility-fruits” are redefined in the same go as we redefine how we compare options. If we name them “paperclips” it’s clear that they are not touched by such redefining.

It triggerred a “type-unsafety” trigger but the operation overall might be safe as it doesn’t actualise the danger. For example having an option of “plum + 2 utility” could give one agent “plum + apple” if it valued apples and “plum + pear” if it valued pears. I guess if you consistenly replace all physical items for their utility values it doesn’t happen.

In the case of “gain 1 utility with probability 1” if your agent is risk-seeking it might give this option “actual” utility less than 1. In general if we lose the distribution independence we might need to retain the information of our suboutcomes rather than collapsing it to he a single number. For if an agent is risk-seeking it’s clear that it would prefer A=( 5% 0,90% 1, 5% 2) to B=(100%, 1). But same risk-seeking in combined lotteries would make it prefer C=(5% , 90% A, 5% A+A) over A. When comparing C and A it’s not sufficent to know that their expected utilities are 1.

The hypothetical person with circular preferences in where to be reminds me of the hero of The Phantom Tollbooth, Milo, whose own location preferences are described this way: “When he was in school he longed to be out, and when he was out he longed to be in. On the way he thought about coming home, and coming home he thought about going. Wherever he was he wished he were somewhere else, and when he got there he wondered why he’d bothered.”

My understanding of the arguments against using a utility maximiser is that proponents accept that this will lead to sub-optimal or dominated outcomes, but they are happy to accept this because they believe that these AIs will be easier to align. This seems like a completely reasonable trade-off to me. For example, imagine that choosing option A is worth 1 utility. Option B is worth 1.1 utility if 100 mathematical statements are all correct, but −1000 otherwise (we are ignoring the costs of reading through and thinking about all 100 mathematical statements). Even if each of the statements seems obviously correct, there is a decent chance that you messed up on at least 1 of them, so you’ll most likely want to take the outside view and pick option A. So I don’t think it’s necessarily an issue if the AI is doing things that are obviously stupid from an inside view.

There is one other explanations for the results of those experiments.

In a real world, it’s quite uncommon that somebody tells you exact probabilities—no you need to infer them from the situation around you. And we the people, we pretty much suck at assigning numeric values to probabilities. When I say 99%, it probably means something like 90%. When I say 90%, I’d guess 70% corresponds to that.

But that doesn’t mean that people behave irrationally. If you view the proposed scenarios through the described lens, it’s more like:

a) Certainty of million or ~60% chance on getting 5 millions.

b) Slightly higher probability of getting a million but the difference is much smaller than the actual error in the estimation of probabilities themselves.

With this in mind, the actual behaviour of people makes much more sense.

I think you’re right that this is part of where the intuition comes from. But it’s still irrational in a context where you actually know the probabilities accurately enough.

True, but that’s usually very artificial context. Often when someone claims they know the probabilities accurately enough, they are mistaken or lying.

Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)

I have to trade off the cost of following high complexity decision theory against the risk of being dominated*the badness of being dominated.

“Is a fleeting emotional sense of certainty over 1 minute, worth automatically discarding the potential $5-million outcome?”—I know it’s mostly outside of what is being modelled here, but suspect that someone who takes the 90% bet and wins nothing might experience much more than just a fleeting sense of disappointment, much more than someone who takes the 45% chance and doesn’t win.

The food preferrence example is rather self defeating. Most people don’t mechanically and predictably choose X over y and z when all are available...they also have preferences for variety, trying new things, impressing people they are with, and and so on. People whose preferences are both predictable and incoherent can be gamed… but that doesnt mean everyone has coherent preferences, because coherent preferences need to be defined against a limited framework (without randomness or meta preferences).. and because having messy, unpredictable preferences protects you against being gamed as well as predictable and coherent preferences.

A similar pattern emerges when considering ethics. Under the assumption that ethics=utilitarianism, the ethical person needs to have consistent preferences… but the assumption is doing a lot of the lifting. Utitarianism is a latecomer, and WEIRD. Most people run on a mishmash of virtue theory and deontology.

Proveable consistency and proveable inconsistency both require unrealistically precise and predictable behaviour. There’s an argument that if you are going to be precise and predictable, you should also be coherent, but it doesn’t show that people actually have UFs.

The “damage” from shooting your own foot is defined in the terms of the utility-number.

Say I pick a dominated strategy that nets me 2 apples and the dominating strategy nets me 3 apples. If on another level of modelling I can know that the first apples are clean and the 2 apples in the dominating arrangement have worms I might be happy to be dominated. Apple-level damage is okay (while nutritional level damage might not be). All deductive results are tautologies but “if you can’t model the agent as trying to achieve goal X then it’s inefficient at achieving X” seems very far from “incoherent agents are stupid”.

If some of the apples are clean and others have worms, then that is modeled in your preference ordering: you prefer clean apples to wormy ones, perhaps at some exchange rate, etc. We then stipulate that

allthe apples are clean (or all are wormy, or all have an equal chance of being clean vs. wormy, etc.), and the analysis proceeds as before.That said, your general point is worth exploring. If we suppose, as Eliezer says, that

… and if we further suppose that her preferences are intransitive, then we conclude that Alice’s strategy is strictly dominated by some other.

That is—Alice’s strategy is strictly dominated in terms of

apples(or fruit in general). It can’t be dominated inutility, of course, because we cannot construct a utility function from Alice’s preferences (on account of their intransitivity)!Well, and so what? Is this bad

according to Alice’s own preferences? Can we show this? How would we do that? By asking Alice whether she prefers the outcome (5 apples and 1 orange) to the initial state (8 apples and 1 orange)? But what good is that? If Alice’s preferences are circular, then it’s entirely possible (in fact, it’s true) that the outcome (5 apples and 1 orange) bothdominates, andis dominated by, the initial state (8 apples and 1 orange).(More accurately, that’s true

ifwe’re permitted to say that if strategy X dominates Y, and Y dominates Z, then X dominates Z. It’s not possible for an agent to prefer X to Y and, simultaneously, Y to X, however intransitive their preferences are, if they still obey the completeness axiom. Of course, if an agent’s preferences are intransitiveandincomplete, then it can prefer X to Y, and also Y to X.)The point is this: it’s not so easy to show that an agent’s strategy is sub-optimal

according to its own preferencesif those preferences violate the axioms. We can gesture at some intuitive considerations like “well, that’s obviously stupid”, but these amount to little more than the fact thatwefind the violated axioms intuitively attractive in the given case.I was thinking of another agent judging my strategies and making a backed argument why I am wrong. If someone said “you were suboptimal on fruit front, I fixed that mistake for you” and I arrive at a table with 2 worm apples, I would be annoyed/pissed. I am assuming that the other agent can’t evaluate their cleanness—it’s all fruit to them. Moreover it might be that worm apples are rare and observing my trade activity it might be inductively well supported that I seem to value “fruit-maximization” a great deal (nutrition maximisation with clean fruit is just fruit maximisation). And it might be important to understand that he didn’t mean to cause wormy apples (he isn’t even capable of meaning that) but his actions might have infact caused it.

In the case that wormy apples are frequent the hypothesis that I am a fruit-maximiser is violated clearly enough that he knows to be on shaky grounds on modelling me as a fruitmaximiser. For some very unskilled traders they might confuse one type of fruit with another and be inconsistent because they can’t get their fruit categories straight. At some midskill “fruitmaximisement” peaks and those that don’t understand things beyond that point will confuse those that are yet to get to fruitmaximization and those that are past that. Expecting super-intelligent things to be consistent kind of assumes that if a metric ever becomes a good goal higher levels will never be weaker on that metric, that maximation strictly grows and never decreases with ability for all submetrics.