# A summary of Savage’s foundations for probability and utility.

Edit: I think the P2c I wrote origi­nally may have been a bit too weak; fixed that. Nev­er­mind, recheck­ing, that wasn’t needed.

More ed­its (now con­soli­dated): Edited non­triv­ial­ity note. Edited to­tal­ity note. Added in the defi­ni­tion of nu­mer­i­cal prob­a­bil­ity in terms of qual­i­ta­tive prob­a­bil­ity (though not the proof that it works). Also slight clar­ifi­ca­tions on im­pli­ca­tions of P6′ and P6‴ on par­ti­tions into equiv­a­lent and al­most-equiv­a­lent parts, re­spec­tively.

One very late edit, June 2: Even though we don’t get countable ad­di­tivity, we still want a σ-alge­bra rather than just an alge­bra (this is needed for some of the proofs in the “par­ti­tion con­di­tions” sec­tion that I don’t go into here). Also noted nonempti­ness of gam­bles.

The idea that ra­tio­nal agents act in a man­ner iso­mor­phic to ex­pected-util­ity max­i­miz­ers is of­ten used here, typ­i­cally jus­tified with the Von Neu­mann-Mor­gen­stern the­o­rem. (The last of Von Neu­mann and Mor­gen­stern’s ax­ioms, the in­de­pen­dence ax­iom, can be grounded in a Dutch book ar­gu­ment.) But the Von Neu­mann-Mor­gen­stern the­o­rem as­sumes that the agent already mea­sures its be­liefs with (finitely ad­di­tive) prob­a­bil­ities. This in turn is of­ten jus­tified with Cox’s the­o­rem (valid so long as we as­sume a “large world”, which is im­plied by e.g. the ex­is­tence of a fair coin). But Cox’s the­o­rem as­sumes as an ax­iom that the plau­si­bil­ity of a state­ment is taken to be a real num­ber, a very large as­sump­tion! I have also seen this jus­tified here with Dutch book ar­gu­ments, but these all seem to as­sume that we are already us­ing some no­tion of ex­pected util­ity max­i­miza­tion (which is not only some­what cir­cu­lar, but also a con­sid­er­ably stronger as­sump­tion than that plau­si­bil­ities are mea­sured with real num­bers).

There is a way of ground­ing both (finitely ad­di­tive) prob­a­bil­ity and util­ity si­mul­ta­neously, how­ever, as de­tailed by Leonard Sav­age in his Foun­da­tions of Statis­tics (1954). In this ar­ti­cle I will state the ax­ioms and defi­ni­tions he gives, give a sum­mary of their log­i­cal struc­ture, and sug­gest a slight mod­ifi­ca­tion (which is equiv­a­lent math­e­mat­i­cally but slightly more philo­soph­i­cally satis­fy­ing). I would also like to ask the ques­tion: To what ex­tent can these ax­ioms be grounded in Dutch book ar­gu­ments or other more ba­sic prin­ci­ples? I warn the reader that I have not worked through all the proofs my­self and I sug­gest sim­ply find­ing a copy of the book if you want more de­tail.

Peter Fish­burn later showed in Utility The­ory for De­ci­sion Mak­ing (1970) that the ax­ioms set forth here ac­tu­ally im­ply that util­ity is bounded.

(Note: The ver­sions of the ax­ioms and defi­ni­tions in the end pa­pers are for­mu­lated slightly differ­ently from the ones in the text of the book, and in the 1954 ver­sion have an er­ror. I’ll be us­ing the ones from the text, though in some cases I’ll re­for­mu­late them slightly.)

## Prim­i­tive no­tions; prefer­ence given a set of states

We will use the following primitive notions. First, there is a set S of “states of the world”; the exact current state of the world is unknown to the agent. Second, there is a set F of “consequences”, things that can happen as a result of the agent’s actions. Actions or acts will be interpreted as functions f:S→F, since two actions which have the same consequences regardless of the state of the world are indistinguishable and hence considered equal. While the agent may be uncertain as to the exact results of its actions, this can be folded into its uncertainty about the state of the world. Finally, we introduce as primitive a relation ≤ on the set of actions, interpreted as “is not preferred to”. I.e., f≤g means that given a choice between actions f and g, the agent will either prefer g or be indifferent. As usual, sets of states will be referred to as “events”, and for the usual reasons we may want to restrict the set of admissible events to a boolean σ-subalgebra of ℘(S), though I don’t know if that’s really necessary here (Savage doesn’t seem to do so, though he does discuss it some).

In any case, we then have the fol­low­ing ax­iom:

P1. The re­la­tion ≤ is a to­tal pre­order.

The intuition for transitivity is pretty clear. For totality: if the agent is presented with a choice of two acts, it must choose one of them, or be indifferent! Perhaps we could instead use a partial preorder (or partial order?), though this would give us two different but behaviorally indistinguishable flavors of indifference (true indifference and incomparability), which seems problematic. Still, this could be useful if we wanted intransitive indifference. So long as indifference is transitive, though, we can collapse this into a total preorder.

As usual we can then define f≥g, f<g (mean­ing “it is false that g≤f”), and g>f. I will use f≡g to mean “f≤g and g≤f”, i.e., the agent is in­differ­ent be­tween f and g. (Sav­age uses an equals sign with a dot over it.)

Note that though ≤ is defined in terms of how the agent chooses when pre­sented with two op­tions, Sav­age later notes that there is a con­struc­tion of W. Allen Wal­lis that al­lows one to ad­duce the agent’s prefer­ence or­der­ing among a finite set of more than two op­tions (mod­ulo in­differ­ence): Sim­ply tell the agent to rank the op­tions given, and that af­ter­ward, two of them will be cho­sen uniformly at ran­dom, and it will get whichever one it ranked higher.
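The incentive-compatibility of Wallis’s scheme can be sanity-checked by brute force. A minimal sketch, with hypothetical options and utilities (not from the text): ranking truthfully is optimal because, for each randomly drawn pair, the agent then receives its genuinely preferred member.

```python
from itertools import combinations, permutations

# Illustrative options and utilities (hypothetical, not from the text).
utility = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
options = list(utility)

def expected_prize(ranking):
    """Expected utility under the Wallis scheme: two of the options are drawn
    uniformly at random and the agent receives whichever it ranked higher
    (i.e. earlier in the submitted list)."""
    ranking = list(ranking)
    pairs = list(combinations(options, 2))
    total = 0.0
    for x, y in pairs:
        winner = x if ranking.index(x) < ranking.index(y) else y
        total += utility[winner]
    return total / len(pairs)

# Ranking truthfully (best first) maximizes the expected prize:
truthful = sorted(options, key=lambda o: -utility[o])
assert all(expected_prize(truthful) >= expected_prize(list(p))
           for p in permutations(options))
```

So the agent’s submitted ranking reveals its true pairwise preferences (modulo indifference, where any order of the tied options does equally well).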

The sec­ond ax­iom states that if two ac­tions have the same con­se­quences in some situ­a­tion, just what that equal con­se­quence is does not af­fect their rel­a­tive or­der­ing:

P2. Suppose f≤g, and B is a set of states such that f and g agree on B. If f’ and g’ are another pair of acts which, outside of B, agree with f and g respectively, and on B, agree with each other, then f’≤g’.

In other words, to de­cide be­tween two ac­tions, only the cases where they ac­tu­ally have differ­ent con­se­quences mat­ter.

With this ax­iom, we can now define:

D1. We say “f≤g given B” to mean that if f’ and g’ are ac­tions such that f’ agrees with f on B, g’ agrees with g on B, and f’ and g’ agree with each other out­side of B, then f’≤g’.

Due to ax­iom P2, this is well-defined.
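The act surgery behind P2 and D1 is just a splice operation, which can be made concrete. A minimal sketch, assuming a finite state set and acts as dicts; the states, consequences, and toy base preference are illustrative, not from the text:

```python
# Acts on a finite state set, represented as dicts from states to consequences.
S = {"s1", "s2", "s3"}

def splice(f, g, B):
    """The act that agrees with f on the event B and with g outside B."""
    return {s: (f[s] if s in B else g[s]) for s in S}

# A toy base preference on acts (weighted sum of consequence values), just so
# the definition below has something to run against; weights are illustrative.
value = {"win": 1.0, "lose": 0.0}
weight = {"s1": 0.5, "s2": 0.3, "s3": 0.2}

def leq(f, g):
    return (sum(weight[s] * value[f[s]] for s in S)
            <= sum(weight[s] * value[g[s]] for s in S))

def leq_given(f, g, B):
    """D1: f <= g given B.  Compare acts that agree with f and g on B and with
    each other outside B; here the common part outside B is taken from f, and
    P2 is what makes that choice irrelevant."""
    return leq(splice(f, f, B), splice(g, f, B))
```

For instance, if f and g agree on B, then `leq_given(f, g, B)` and `leq_given(g, f, B)` both hold, which is the P2a property discussed below.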

Here is where I would like to suggest a small modification to this setup. The notion of “f≤g given B” is implicitly taken to be how the agent makes decisions if it knows that B obtains. However, it seems to me that we should actually take “f≤g given B”, rather than f≤g, to be the primitive notion, explicitly interpreted as “the agent does not prefer f to g if it knows that B obtains”. The agent always has some state of prior knowledge, and this way we have explicitly specified decisions under a given state of knowledge, the acts we are concerned with, as the basis of our theory. Rather than defining “f≤g given B” in terms of ≤, we can define f≤g to mean “f≤g given S” and then add additional axioms governing the relation between “≤ given B” for varying B, which in Savage’s setup are theorems or part of the definition D1.

(Speci­fi­cally, I would mod­ify P1 and P2 to talk about “≤ given B” rather than ≤, and add the fol­low­ing the­o­rems as ax­ioms:

P2a. If f and g agree on B, then f≡g given B.

P2b. If B⊆C, f≤g given C, and f and g agree out­side B, then f≤g given B.

P2c. If B and C are dis­joint, and f≤g given B and given C, then f≤g given B∪C.

This is a lit­tle un­wieldy and per­haps there is an eas­ier way—these might not be min­i­mal. But they do seem to be suffi­cient.)

In any case, re­gard­less which way we do it, we’ve now es­tab­lished the no­tion of prefer­ence given that a set of states ob­tains, as well as prefer­ence with­out ad­di­tional knowl­edge, so hence­forth I’ll freely use both as Sav­age does with­out wor­ry­ing about which makes a bet­ter foun­da­tion, since they are equiv­a­lent.

## Order­ing on preferences

The next definition is simply to note that we can sensibly talk about f≤b, b≤f, and b≤c, where b and c are consequences rather than actions, simply by interpreting consequences as constant functions. (So the agent does have a preference ordering on consequences; it’s just induced from its ordering on actions. We do it this way since it is the agent’s choices between actions that we can actually observe.)

How­ever, the third ax­iom reifies this in­duced or­der­ing some­what, by de­mand­ing that it be in­var­i­ant un­der gain­ing new in­for­ma­tion.

P3′. If b and c are con­se­quences and b≤c, then b≤c given any B.

Thus if the agent changes preferences on gaining new information, this just reflects its uncertainty about the results of its actions, rather than an actual preference for different consequences in different states (any such preferences can be done away with by simply expanding the set of consequences).

Really this is not strong enough, but to state the ac­tual P3 we will first need a defi­ni­tion:

D3. An event B is said to be null if f≤g given B for any ac­tions f and g.

Null sets will cor­re­spond to sets of prob­a­bil­ity 0, once nu­mer­i­cal prob­a­bil­ity is in­tro­duced. Prob­a­bil­ity here is to be ad­duced from the agent’s prefer­ences, so we can­not dis­t­in­guish be­tween “the agent is cer­tain that B will not hap­pen” and “if B ob­tains, the agent doesn’t care what hap­pens”.

Now we can state the ac­tual P3:

P3. If b and c are con­se­quences and B is not null, then b≤c given B if and only if b≤c.

P3′, by con­trast, al­lowed some col­laps­ing of prefer­ence on gain­ing new in­for­ma­tion; here we have dis­al­lowed that ex­cept in the case where the new in­for­ma­tion is enough to col­lapse all prefer­ences en­tirely (a sort of “end of the world” or “fatal er­ror” sce­nario).

## Qual­i­ta­tive probability

We’ve in­tro­duced above the idea of “prob­a­bil­ity 0” (and hence im­plic­itly prob­a­bil­ity 1; ob­serve that “¬B is null” is equiv­a­lent to “for any f and g, f≤g given B if and only if f≤g”). Now we want to ex­pand this to prob­a­bil­ity more gen­er­ally. But we will not ini­tially get num­bers out of it; rather we will first just get an­other to­tal pre­order­ing, A≤B, “A is at most as prob­a­ble as B”.

How can we de­ter­mine which of two events the agent thinks is more prob­a­ble? Have it bet on them, of course! First, we need a non­triv­ial­ity ax­iom so it has some things to bet on.

P5. There ex­ist con­se­quences b and c such that b>c.

(I don’t know what the re­sults would be if in­stead we used the weaker non­triv­ial­ity ax­iom “there ex­ist ac­tions f and g such that f<g”, i.e., “S is not null”. That we even­tu­ally get that ex­pected util­ity for com­par­ing all acts sug­gests that this should work, but I haven’t checked.)

So let us now consider a class of actions which I will call “wagers”. (Savage doesn’t have any special term for these.) Define “the wager on A for b over c” to mean the action that, on A, returns b, and otherwise, returns c. Denote this by w_{A,b,c}. Then we postulate:

P4. Let b>b’ be a pair of consequences, and c>c’ another such pair. Then for any events A and B, w_{A,b,b’}≤w_{B,b,b’} if and only if w_{A,c,c’}≤w_{B,c,c’}.

That is to say, if the agent is given the choice between betting on event A and betting on event B, and the prize and booby prize are the same regardless of which it bets on, then it shouldn’t matter just what the prize and booby prize are: it should simply bet on whichever event it thinks is more probable. Hence we can define:

D4. For events A and B, we say “A is at most as probable as B”, denoted A≤B, if w_{A,b,b’}≤w_{B,b,b’}, where b>b’ is a pair of consequences.

By P4, this is well-defined. We can then show that the re­la­tion on events ≤ is a to­tal pre­order, so we can use the usual no­ta­tion when talk­ing about it (again, ≡ will de­note equiv­alence).

In fact, ≤ is not only a to­tal pre­order, but a qual­i­ta­tive prob­a­bil­ity:

1. ≤ is a to­tal preorder

2. ∅≤A for any event A

3. ∅<S

4. Given events B, C, and D with D dis­joint from B and C, then B≤C if and only if B∪D≤C∪D.

(There is no condition corresponding to countable additivity; as mentioned above, we simply won’t get countable additivity out of this.) Note also that under this, A≡∅ if and only if A is null in the earlier sense. Also, we can define “A≤B given C” by comparing the wagers given C; this is equivalent to the condition that A∩C≤B∩C. This relation, too, is a qualitative probability.
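Conversely, any finitely additive measure induces a relation satisfying these four conditions, which can be checked by brute force on a small example. A sketch with three illustrative states, using integer weights (standing in for probabilities 0.5, 0.3, 0.2) so that all comparisons are exact:

```python
from itertools import combinations

# Illustrative finite state space and additive weights (hypothetical numbers).
S = frozenset({"x", "y", "z"})
weight = {"x": 5, "y": 3, "z": 2}

def P(A):
    """Finitely additive measure (up to normalization by P(S))."""
    return sum(weight[s] for s in A)

events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def leq(A, B):          # "A is at most as probable as B"
    return P(A) <= P(B)

empty = frozenset()
# 1. total preorder (totality checked here; transitivity is automatic since
#    leq compares numbers)
assert all(leq(A, B) or leq(B, A) for A in events for B in events)
# 2. the empty event is minimal;  3. and strictly below S
assert all(leq(empty, A) for A in events)
assert leq(empty, S) and not leq(S, empty)
# 4. additivity: for D disjoint from B and C,  B <= C  iff  B∪D <= C∪D
assert all(leq(B, C) == leq(B | D, C | D)
           for B in events for C in events for D in events
           if not D & (B | C))
```

The interesting direction of Savage’s result is of course the converse: recovering a (unique) numerical P from a qualitative ≤, which is what the partition conditions below are for.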

## Par­ti­tion con­di­tions and nu­mer­i­cal probability

In or­der to get real num­bers to ap­pear, we are of course go­ing to have to make some sort of Archimedean as­sump­tion. In this sec­tion I dis­cuss what some of these look like and then ul­ti­mately state P6, the one Sav­age goes with.

First, defi­ni­tions. We will be con­sid­er­ing finitely-ad­di­tive prob­a­bil­ity mea­sures on the set of states, i.e. a func­tion P from the set of events to the in­ter­val [0,1] such that P(S)=1, and for dis­joint B and C, P(B∪C)=P(B)+P(C). We will say “P agrees with ≤” if for ev­ery A and B, A≤B if and only if P(A)≤P(B); and we will say “P al­most agrees with ≤” if for ev­ery A and B, A≤B im­plies P(A)≤P(B). (I.e., in the lat­ter case, nu­mer­i­cal prob­a­bil­ity is al­lowed to col­lapse some dis­tinc­tions be­tween events that the agent might not ac­tu­ally be in­differ­ent be­tween.)

We’ll be con­sid­er­ing here par­ti­tions of the set of states S. We’ll say a par­ti­tion of S is “uniform” if the parts are all equiv­a­lent. More gen­er­ally we’ll say it is “al­most uniform” if, for any r, the union of any r parts is at most as prob­a­ble as the union of any r+1 parts. (This is us­ing ≤, re­mem­ber; we don’t have nu­mer­i­cal prob­a­bil­ities yet!) (Note that any uniform par­ti­tion is al­most uniform.) Then it turns out that the fol­low­ing are equiv­a­lent:

1. There ex­ist al­most-uniform par­ti­tions of S into ar­bi­trar­ily large num­bers of parts.

2. For any B>∅, there ex­ists a par­ti­tion of S with each part less prob­a­ble than B.

3. There ex­ists a (nec­es­sar­ily unique) finitely ad­di­tive prob­a­bil­ity mea­sure P that al­most agrees with ≤, which has the prop­erty that for any B and any 0≤λ≤1, there is a C⊆B such that P(C)=λP(B).

(Definitely not going into the proof of this here. However, the actual definition of the numerical probability P(A) is not so complicated: Let k(A,n) denote the largest r such that there exists an almost-uniform partition of S into n parts, for which there is some union of r parts, C, such that C≤A. Then the sequence k(A,n)/n always converges, and we can define P(A) to be its limit.)
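In the special case of uniform partitions (think n fair-coin flips, with n a power of two), a union of r parts is at most as probable as A exactly when r/n does not exceed P(A), so k(A,n) = floor(n·P(A)), and the convergence is easy to see numerically. A sketch with an illustrative target probability of 1/3:

```python
from math import floor

# Illustrative target probability for the event A (not from the text).
p_A = 1 / 3

# k(A, n)/n for uniform partitions into n = 2, 4, 8, ..., 2^20 parts;
# here k(A, n) = floor(n * P(A)) as described above.
approx = [floor(n * p_A) / n for n in (2 ** i for i in range(1, 21))]

# The sequence approaches P(A) from below, always within 1/n:
assert all(0 <= p_A - a <= 1 / 2 ** i for i, a in enumerate(approx, start=1))
assert abs(approx[-1] - p_A) < 1e-6
```

With merely almost-uniform partitions the bounds are looser, but the limit argument is of the same flavor.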

So we could use this as our 6th ax­iom:

P6‴. For any B>∅, there ex­ists a par­ti­tion of S with each part less prob­a­ble than B.

Sav­age notes that other au­thors have as­sumed the stronger

P6″. There ex­ist uniform par­ti­tions of S into ar­bi­trar­ily large num­bers of parts.

since there’s an ob­vi­ous jus­tifi­ca­tion for this: the ex­is­tence of a fair coin! If a fair coin ex­ists, then we can gen­er­ate a uniform par­ti­tion of S into 2n parts sim­ply by flip­ping it n times and con­sid­er­ing the re­sult. We’ll ac­tu­ally end up as­sum­ing some­thing even stronger than this.
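The fair-coin justification is concrete enough to write down; a trivial sketch of the partition it generates:

```python
from itertools import product

# n flips of a fair coin partition the state space into 2^n parts,
# each of probability 2^-n: a uniform partition, as described above.
n = 3
parts = list(product("HT", repeat=n))
prob = {p: 0.5 ** n for p in parts}

assert len(parts) == 2 ** n
assert abs(sum(prob.values()) - 1.0) < 1e-12
assert len(set(prob.values())) == 1   # all parts equally probable: uniform
```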

So P6‴ does get us nu­mer­i­cal prob­a­bil­ities, but they don’t nec­es­sar­ily re­flect all of the qual­i­ta­tive prob­a­bil­ity; P6‴ is only strong enough to force al­most agree­ment. Though it is stronger than that when ∅ is in­volved—it does turn out that P(B)=0 if and only if B≡∅. (And hence also P(B)=1 if and only if B≡S.) But more gen­er­ally it turns out that P(B)=P(C) if and only if B and C are “al­most equiv­a­lent”, which I will de­note B≈C (Sav­age uses a sym­bol I haven’t seen el­se­where), which is defined to mean that for any E>∅ dis­joint from B, B∪E≥C, and for any E>∅ dis­joint from C, C∪E≥B.

(It’s not ob­vi­ous to me that ≈ is in gen­eral an equiv­alence re­la­tion, but it cer­tainly is in the pres­ence of P6‴; Sav­age seems to use this im­plic­itly. Note also that an­other con­se­quence of P6‴ is that for any n there ex­ists a par­ti­tion of S into n al­most-equiv­a­lent parts; such a par­ti­tion is nec­es­sar­ily al­most-uniform.)

How­ever the fol­low­ing stronger ver­sion of P6‴ gets rid of this dis­tinc­tion:

P6′. For any B>C, there ex­ists a par­ti­tion of S, each part D of which satis­fies C∪D<B.

(Ob­serve that P6‴ is just P6′ for C=∅.) Un­der P6′, al­most equiv­alence is equiv­alence, and so nu­mer­i­cal prob­a­bil­ity agrees with qual­i­ta­tive prob­a­bil­ity, and we fi­nally have what we wanted. (So by ear­lier, P6′ im­plies P6″, not just P6‴. In­deed by above it im­plies the ex­is­tence of uniform par­ti­tions into n parts for any n, not just ar­bi­trar­ily large n.)

In ac­tu­al­ity, Sav­age as­sumes an even stronger ax­iom, which is needed to get util­ity and not just prob­a­bil­ity:

P6. For any acts g<h, and any con­se­quence b, there is a par­ti­tion of S such that if g is mod­ified on any one part to be con­stantly b there, we would still have g<h; and if h is mod­ified on any one part to be con­stantly b there, we would also still have g<h.

Ap­ply­ing P6 to wa­gers yields the weaker P6′.

We can now also get conditional probability: if P6′ holds, it also holds for the preorderings “≤ given C” for non-null C, and hence we can define P(B|C) to be the probability of B under the quantitative probability corresponding to the qualitative probability “≤ given C”. Using the uniqueness of agreeing probability measures, it’s easy to check that indeed, P(B|C)=P(B∩C)/P(C).

## Utility for finite gambles

Now that we have numerical probability, we can talk about finite gambles. If we have consequences b1, …, bn, and probabilities λ1, …, λn summing to 1, we can consider the gamble ∑λibi, represented by any action which yields b1 with probability λ1, b2 with probability λ2, etc. (On an event of probability 0 the action may do anything; we don’t care about events with probability 0.) Note that by the above, such an action necessarily exists. It can be proven that any two actions representing the same gamble are equivalent, and hence we can talk about comparing gambles. We can also sensibly talk about mixing gambles, taking ∑λifi (where the fi are finite gambles, and the λi are probabilities summing to 1) in the obvious fashion.

With these defi­ni­tions, it turns out that Von Neu­mann and Mor­gen­stern’s in­de­pen­dence con­di­tion holds, and, us­ing ax­iom P6, Sav­age shows that the con­ti­nu­ity (i.e. Archimedean) con­di­tion also holds, and hence there is in­deed a util­ity func­tion, a func­tion U:F→R such that for any two finite gam­bles rep­re­sented by f and g re­spec­tively, f≤g if and only if the ex­pected util­ity of the first gam­ble is less than or equal to that of the sec­ond. Fur­ther­more, any two such util­ity func­tions are re­lated via an in­creas­ing af­fine trans­for­ma­tion.
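The content of the representation theorem is easy to illustrate once U exists. A sketch comparing two finite gambles by expected utility and checking invariance under an increasing affine transformation; the consequences, utilities, and gambles are illustrative:

```python
# Illustrative utility function on consequences (hypothetical values).
U = {"apple": 1.0, "banana": 2.0, "cherry": 5.0}

def expected_utility(gamble, u):
    """gamble: list of (probability, consequence) pairs, probabilities summing to 1."""
    assert abs(sum(p for p, _ in gamble) - 1.0) < 1e-12
    return sum(p * u[c] for p, c in gamble)

f = [(0.5, "apple"), (0.5, "cherry")]   # 50/50 between apple and cherry
g = [(1.0, "banana")]                    # banana for certain

# Here f has expected utility 3.0 and g has 2.0, so g <= f:
assert expected_utility(g, U) <= expected_utility(f, U)

# Any increasing affine transform a*U + b with a > 0 represents the
# same preferences:
V = {c: 3.0 * u + 7.0 for c, u in U.items()}
assert ((expected_utility(g, V) <= expected_utility(f, V))
        == (expected_utility(g, U) <= expected_utility(f, U)))
```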

We can also take ex­pected value know­ing that a given event C ob­tains, since we have nu­mer­i­cal prob­a­bil­ity; and in­deed this agrees with the prefer­ence or­der­ing on gam­bles given C.

## Ex­pected util­ity in gen­eral and bound­ed­ness of utility

Fi­nally, Sav­age shows that if we as­sume one more ax­iom, P7, then we have that for any es­sen­tially bounded ac­tions f and g, we have f≤g if and only if the ex­pected util­ity of f is at most that of g. (It is pos­si­ble to define in­te­gra­tion with re­spect to a finitely ad­di­tive mea­sure similarly to how one does with re­spect to a countably ad­di­tive mea­sure; the re­sult is lin­ear and mono­tonic but doesn’t satisfy con­ver­gence prop­er­ties.) Similarly with re­spect to a given event C.

The ax­iom P7 is:

P7. If f and g are acts and B is an event such that f≤g(s) given B for ev­ery s∈B, then f≤g given B. Similarly, if f(s)≤g given B for ev­ery s in B, then f≤g given B.

So this is just an­other var­i­ant on the “sure-thing prin­ci­ple” that I ear­lier la­beled P2c.

Now in fact it turns out as men­tioned above that P7, when taken to­gether with the rest, im­plies that util­ity is bounded, and hence that we do in­deed have that for any f and g, f≤g if and only if the ex­pected util­ity of f is at most that of g! This is due to Peter Fish­burn and post­dates the first edi­tion of Foun­da­tions of Statis­tics, so in there Sav­age sim­ply notes that it would be nice if this worked for f and g not nec­es­sar­ily es­sen­tially bounded (so long as their ex­pected val­ues ex­ist, and al­low­ing them to be ±∞), but that he can’t prove this, and then adds a foot­note giv­ing a refer­ence for bounded util­ity. (Though he does prove us­ing P7 that if you have two acts f and g such that f,g≤b for all con­se­quences b, then f≡g; similarly if f,g≥b for all b. Ac­tu­ally, this is a key lemma in prov­ing that util­ity is bounded; Fish­burn’s proof works by show­ing that if util­ity were un­bounded, you could con­struct two ac­tions that con­tra­dict this.)
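The numerical tension driving this result can be sketched: with outcomes of utility 2^n occurring with probability 2^-n (a St. Petersburg-style act), partial expected utilities grow without bound, while any bounded utility gives a convergent sum. The cap of 100 below is an illustrative bound, not from the text:

```python
def partial_eu(n_terms, u):
    """Partial expected utility: outcome n has probability 2^-n, utility u(n)."""
    return sum(2.0 ** -n * u(n) for n in range(1, n_terms + 1))

unbounded = lambda n: 2.0 ** n            # utility 2^n for the n-th outcome
bounded = lambda n: min(2.0 ** n, 100.0)  # the same utilities, capped at 100

assert partial_eu(50, unbounded) == 50.0   # grows linearly with the number of terms
assert partial_eu(500, bounded) < 100.0    # stays below the bound forever
```

Fishburn’s argument turns this divergence into a contradiction with the lemma mentioned above, rather than merely observing it.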

Of course, if you really don’t like the conclusion that utility is bounded, you could throw out axiom P7! It’s pretty intuitive, but it’s not clear that ignoring it could actually get you Dutch-booked. After all, the first 6 axioms are enough to handle finite gambles; P7 is only needed for more general situations. So long as your Dutch bookie is limited to finite gambles, you don’t need it.

## Ques­tions on fur­ther justification

So now that I’ve laid all this out, here’s the ques­tion I origi­nally meant to ask: To what ex­tent can these ax­ioms be grounded in more ba­sic prin­ci­ples, e.g. Dutch book ar­gu­ments? It seems to me that most of these are too ba­sic for that to ap­ply—Dutch book ar­gu­ments need more work­ing in the back­ground. Still, it seems to me ax­ioms P2, P3, and P4 might plau­si­bly be grounded this way, though I have not yet at­tempted to figure out how. P7 pre­sum­ably can’t, for the rea­sons noted in the pre­vi­ous sec­tion. P1 I as­sume is too ba­sic. P5 ob­vi­ously can’t (if the agent doesn’t care about any­thing, that’s its own prob­lem).

P6 is an Archimedean con­di­tion. Typ­i­cally I’ve seen those (speci­fi­cally Von Neu­mann and Mor­gen­stern’s con­ti­nu­ity con­di­tion) jus­tified on this site with the idea that in­finites­i­mals will never be rele­vant in any prac­ti­cal situ­a­tion—if c has only in­finites­i­mally more util­ity than b, the only case when the dis­tinc­tion would be rele­vant is if the prob­a­bil­ities of ac­com­plish­ing them were ex­actly equal, which is not re­al­is­tic. I’m guess­ing in­finites­i­mal prob­a­bil­ities can prob­a­bly be done away with in a similar man­ner?

Or are these not good ax­ioms in the first place? You all are more fa­mil­iar with these sorts of things than me. Ideas?

• Off topic but amus­ing:

I per­son­ally con­sider it more prob­a­ble that a Repub­li­can pres­i­dent will be elected in 1996 than that it will snow in Chicago some­time in the month of May, 1994.

Leonard Sav­age, Foun­da­tions of Statis­tics, page 27


• Without con­text, I can’t tell whether he was try­ing to say the chances were high, low, ex­traor­di­nar­ily differ­ent, or slightly differ­ent.

Chance of snow 40% and chance of Repub­li­can win 45% satis­fies the quote.

• I would have guessed more like 20% and 45%, but the point was that he was un­lucky, not mis­cal­ibrated.

• Great sum­mary!

P7 looks self-evident to me. I’m less comfortable with P6. Unbounded utility depends on P6 requiring a partition into an arbitrarily large number of parts—is this used in the proof of bounded utility? In general, I don’t think Archimedean axioms are safe given our current level of understanding of Pascal’s mugger-like problems.

EDIT: P7 doesn’t look as self-evident anymore. Consider the St. Petersburg lottery. Any particular payout from buying a ticket for $2 is worse than the expected value of buying a ticket for $1, but obviously it is preferable to buy the ticket for $1. Again, I don’t think we can judge this given our current level of understanding of Pascal’s mugger-like problems.

• Un­bounded util­ity de­pends on P6 re­quiring a par­ti­tion into an ar­bi­trar­ily large num­ber of parts

Numer­i­cal util­ity at all re­lies on this, so I’m not sure what you mean here.

• Any par­tic­u­lar finite gam­ble re­quires a finite num­ber of parts. A St. Peters­burg lot­tery re­quires an in­finite num­ber of parts.

• Con­struct­ing a St. Peters­burg lot­tery re­lies on this, but I don’t see why that means “un­bounded util­ity” de­pends on it; un­bounded util­ity isn’t even a con­se­quence of these ax­ioms, in­deed the op­po­site is so.

In any case I don’t even see how you state the no­tion of bounded (or un­bounded) util­ity with­out nu­mer­i­cal util­ity; we don’t mean bounded in the sense of hav­ing in­ter­nally a max­i­mum and min­i­mum, we mean cor­re­spond­ing to a bounded set of real num­bers. No in­ter­nal max­i­mum or min­i­mum is needed; how do you state that with­out set­ting up the cor­re­spon­dence? And to get real num­bers you need an Archimedean as­sump­tion of some sort.

• Con­struct­ing a St. Peters­burg lot­tery re­lies on this, but I don’t see why that means “un­bounded util­ity” de­pends on it

If there are only a finite num­ber of op­tions, util­ity can only be un­bounded if at least one of the op­tions has the pos­si­bil­ity of util­ities with ar­bi­trar­ily large ab­solute value. It is hard to deal with an in­finite num­ber of op­tions, but it might be pos­si­ble de­pend­ing on how that works with the other ax­ioms, but this is ir­rele­vant be­cause P6 was not con­nected to the proof of bounded util­ity.

un­bounded util­ity isn’t even a con­se­quence of these ax­ioms, in­deed the op­po­site is so.

That is why I was ini­tially con­cerned about P6.

to get real num­bers you need an Archimedean as­sump­tion of some sort.

I don’t think real num­bers are the best field to use for util­ity be­cause of Pas­cal’s mug­ging, some of the stuff de­scribed here, and this pa­per.

• I don’t think real num­bers are the best field to use for util­ity be­cause of Pas­cal’s mug­ging...

Okay, what field do you think works for util­ity that’s bet­ter than real num­bers?

The ob­vi­ous can­di­dates are sur­real num­bers or non-stan­dard re­als. Wikipe­dia says that the former doesn’t have omega plus 1, where omega is the num­ber of or­di­nary in­te­gers, but IIRC the lat­ter does, so I’d try the lat­ter first. I do not feel con­fi­dent that it solves the prob­lem, though.

• The sur­re­als do have ω+1 - see the ”..And Beyond” sec­tion of the wiki page. If this is con­tra­dicted any­where else on the page, tell me where and I’ll cor­rect it.

The sur­re­als are prob­a­bly the best to use for this, though they’ll need to emerge nat­u­rally from some ax­ioms, not just be pro­claimed cor­rect. From WP: “In a rigor­ous set the­o­retic sense, the sur­real num­bers are the largest pos­si­ble or­dered field; all other or­dered fields, such as the ra­tio­nals, the re­als, the ra­tio­nal func­tions, the Levi-Civita field, the su­per­real num­bers, and the hy­per­real num­bers, are sub­fields of the sur­re­als.”, so even if the sur­re­als are not nec­es­sary, they will prob­a­bly be suffi­cient.

• Con­way used sur­real num­bers for go util­ities. I dis­cussed the virtues of sur­real util­ities here.

• Con­way used sur­real num­bers for go util­ities.

Those aren’t re­ally util­ities be­cause they aren’t made for tak­ing ex­pec­ta­tions, though any to­tally or­dered set can be em­bed­ded in the sur­re­als, so they are perfect for choos­ing from pos­si­bly-in­finite sets of cer­tain out­comes.

• Check­ing with the defi­ni­tion of util­ity ex­pec­ta­tions do not seem crit­i­cal.

Con­way’s move val­ues may use­fully be seen as util­ities as­so­ci­ated with pos­si­ble moves.

• Okay, that is not the kind of util­ity dis­cussed in the post, but it is still a util­ity.

• I don’t think real num­bers are the best field to use for util­ity be­cause of Pas­cal’s mugging

Are we agreed that bounded real-val­ued (or ra­tio­nal-val­ued) util­ity gets rid of Pas­cal’s mug­ging?

• Yes. Bounded util­ity solves tons of prob­lems, it just doesn’t, AFAICT, de­scribe my prefer­ences.

• The bound would also have to be sub­stan­tially less than 3^^^^3.

• The bound would also have to be sub­stan­tially less than 3^^^^3.

As you know, if there is a bound, with­out loss of gen­er­al­ity we can say all util­ities go from 0 to 1.

Repairing your claim to take that into account, if you’re being mugged for $5, and the plausibility of the mugger’s claim is 1/X where X is large, and the utility the mugger promises you is about 1, then you get mugged if your utility for $5 is less than 1/X, roughly. So I agree that there are utility functions that would result in the mugging, but they don’t appear especially simple or especially consistent with observed human behavior, so the mugging doesn’t seem likely.

Now, if the pro­gram­ming lan­guage used to com­pute the prior on the util­ity func­tions has a spe­cial in­struc­tion that loads 3^^^^3 into an ac­cu­mu­la­tor with one byte, maybe the mug­ging will look likely. I don’t see any way around that.

• This strikes me as very similar to Fishburn’s proof that P7 implies utility is bounded. (Maybe it’s essentially the same? Need to read more carefully; point is, his proof also works by comparing two St. Petersburg lotteries.) Of course, we only get the problem if we imagine that the St. Petersburg lottery is for utility, rather than for money with decreasing marginal (and in this theory, ultimately bounded) utility…

• Yes, this is Fish­burn’s proof, just as a modus tol­lens rather than a modus po­nens.

• Here is a small coun­terex­am­ple to P2. States = { Red, Green, Blue }. Out­comes = { Win, Lose }. Since there are only two out­comes, we can write ac­tions as the sub­set of states that Win. My prefer­ences are: {} < { Green } = { Blue } < { Red } < { Red,Green } = { Red,Blue } < { Green,Blue } < { Red,Green,Blue }

This con­tra­dicts P2 be­cause { Green } < { Red } but { Red,Blue } < { Green,Blue }.

Here is a situ­a­tion where this may ap­ply: There is an urn with 300 balls. 100 of them are red. The rest are ei­ther green or blue. You draw a ball from this urn.

So Red represents definite probability 1/3, while Green and Blue are unknowns. Depending on context, it sure looks like these are the right preferences to have. This is called the Ellsberg paradox.

Even if you in­sist this is some­how wrong, it is not go­ing to be Dutch booked. Even if we ex­tend the state space to in­clude ar­bi­trar­ily many fair coins (as P6 may re­quire), and even if we ex­tend the re­sult space to al­low for mul­ti­ple draws or other pay­outs, we can define var­i­ous con­sis­tent ob­jec­tive func­tions (that are not ex­pected util­ity) which show this be­havi­our.
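(The stated violation of P2 can be checked mechanically; a minimal sketch, with acts written as the sets of states on which they Win and ranked per the preferences above:)

```python
# Preference ranks taken from the order stated in the comment above.
rank = {frozenset(): 0,
        frozenset({"Green"}): 1, frozenset({"Blue"}): 1,
        frozenset({"Red"}): 2,
        frozenset({"Red", "Green"}): 3, frozenset({"Red", "Blue"}): 3,
        frozenset({"Green", "Blue"}): 4,
        frozenset({"Red", "Green", "Blue"}): 5}

def leq(f, g):
    return rank[f] <= rank[g]

# P2 instance: f and g agree on B = {Blue} (both Lose there); f' and g' agree
# with f and g respectively outside B, and with each other (both Win) on B.
f, g = frozenset({"Green"}), frozenset({"Red"})
f2, g2 = frozenset({"Green", "Blue"}), frozenset({"Red", "Blue"})
assert leq(f, g) and not leq(f2, g2)   # P2 would require f' <= g'
```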

• This can be Dutch booked. As described on this Wikipedia page, you are asked to set prices for promises to pay $1 conditional on events and an adversary chooses whether to buy these from you or sell them to you at that price. If Price( {Green} ) + Price( {Red, Blue} ) ≠ $1, the adversary can ensure you lose money, and the same holds for {Red} and {Green, Blue}. However, this is incompatible with {Green} < {Red} and {Red, Blue} < {Green, Blue}.

• I’m aware of this. In this case my “op­er­a­tional sub­jec­tive prob­a­bil­ity”, as de­scribed on that same page, is nec­es­sar­ily not con­sis­tent with my prefer­ences.

To put this an­other way, sup­pose that I do put the same price on Red, Green, and Blue when faced with that par­tic­u­lar choice (i.e. know­ing that I will have to buy or sell at the price I name). Why does it fol­low that I should not choose Red over Green in other cir­cum­stances? Or more to the point, how can I be Dutch booked if I then choose Red over Green in other cir­cum­stances?

• You’re com­pletely right. Dutch book ar­gu­ments prove al­most noth­ing in­ter­est­ing. Your prefer­ence is ra­tio­nal.

• I re­al­ize it has been a while, but can you an­swer some ques­tion about your prefer­ences?

1. In the hy­po­thet­i­cal world where all prob­a­bil­ities that you were asked to bet on were known, would you be a Bayesian?

2. How sta­ble is your prefer­ence for Knigh­tian risk over un­cer­tainty? In other words, how much more would win­ning on green have to be worth for you to pre­fer it to red (feel free to in­ter­pret this as much as is nec­es­sary to make it pre­cise)?

• I’m not re­ally clear on the first ques­tion. But since the sec­ond ques­tion asks how much some­thing is worth, I take it the first ques­tion is ask­ing about a util­ity func­tion. Do I be­have as if I were max­imis­ing ex­pected util­ity, ie. obey the VNM pos­tu­lates as far as known prob­a­bil­ities go? A yes an­swer then makes the sec­ond ques­tion go some­thing like this: given a bet on red whose pay­off has util­ity 1, and a bet on green whose pay­off has util­ity N, what is the crit­i­cal N where I am in­differ­ent be­tween the two?

For every N>1, there are decision procedures for which the answer to the first is yes, the answer to the second is N, and which display the Ellsberg-paradoxical behaviour. Ellsberg himself had proposed one. I did have a thought on how one of these could be well illustrated in not too technical terms, and maybe it would be appropriate to post it here, but I’d have to get around to writing it up. In the meantime I can also illustrate interactively: 1) yes, 2) you can give me an N>1 and I’ll go with it.

• Okay. Let N = 2 for sim­plic­ity and let \$ de­note utilons like you would use for de­ci­sions in­volv­ing just risk and no un­cer­tainty.

P(Red) = 1/3, so you are indifferent between \$-1 unconditionally and (\$-3 if Red, \$0 otherwise). You are also indifferent between \$-3 iff Red and \$-3N (= \$-6) iff Green (or equivalently Blue). By transitivity, you are therefore indifferent between \$-1 unconditionally and \$-6 iff Green. Also, you are obviously indifferent between \$4 unconditionally and \$6 iff (die ≥ 3).

I would think that you would allow a “pure risk” bet to be added to an uncorrelated uncertainty bet—correct me if that is wrong. In that case, you would be indifferent between \$3 unconditionally and \$6 iff (die ≥ 3) - \$6 iff Green, but you would not be indifferent between \$3 unconditionally and \$6 iff (Green ∨ Blue) - \$6 iff Green, which is the same as \$6 iff Blue, which you value at \$1.

This seems like a strange set of preferences to have, especially since (die ≥ 3) and (Green ∨ Blue) are both pure risk, but it could be correct.

• That’s right.

I take it what is strange is that I could be in­differ­ent be­tween A and B, but not in­differ­ent be­tween A+C and B+C.

For a sim­pler ex­am­ple let’s add a fair coin (and again let N=2). I think \$1 iff Green is as good as \$1 iff (Heads and Red), but \$1 iff (Green or Blue) is bet­ter than \$1 iff ((Heads and Red) or Blue). (All pay­offs are the same, so we can ac­tu­ally for­get the util­ity func­tion.) So again: A is as good as B, but A+C is bet­ter than B+C. Is this the same strangeness?

• Not quite.

I think that the situation that you described is less strange than the one that I described. In yours, you are combining two ‘unknown probabilities’ to produce ‘known probabilities’.

I find my situation stranger because the only difference between a choice that you are indifferent about and one that you do have a preference about is the substitution of (Green ∨ Blue) for (die ≥ 3). Both of these have clear probabilities and are equivalent in almost any situation. To put this another way, you would be indifferent between \$3 unconditionally and \$6 iff (Green ∨ Blue) - \$6 iff Green if the two bets on coloured balls were taken to refer to different draws from the (same) urn. This looks a lot like risk aversion, and mentally feels like risk aversion to me, but it is not risk aversion since you would not make these bets if all probabilities were known to be 1/3.

• Ohh, I see. Well done! Yes, I lose.

If I had a do-over on my last an­swer, I would not agree that \$-6 iff Green is worth \$-1. It’s \$-3.

But, given that I can’t seem to get it straight, I have to ad­mit I haven’t given LW read­ers much rea­son to be­lieve that I do know what I’m talk­ing about here, and at least one good rea­son to be­lieve that I don’t.

In case any­one’s still hu­mour­ing me, if an event has un­known prob­a­bil­ity, so does its nega­tion; I pre­fer a bet on Red to a bet on Green, but I also pre­fer a bet against Red to a bet against Green. This is ac­tu­ally the same thing as com­bin­ing two un­known prob­a­bil­ities to pro­duce a known one: both Green and (not Green) are un­known, but (Green or not Green) is known to be 100%.

\$-6 iff Green is actually identical to \$-6 + \$6 iff (not Green). (not Green) is identical to (Red or Blue), and Red is a known probability of 1/3. \$6 iff Blue is as good as \$6 iff Green, which, for N=2, is worth \$1. \$-6 iff Green is actually worth \$-3, rather than \$-1.

• Hmm. Now we have that \$6 iff Green is worth \$1 and \$-6 iff Green is worth \$-3, but then \$6 − \$6 = \$0 iff Green, which must be worth \$0, gets priced at \$1 − \$3 = \$-2.

In par­tic­u­lar, if you have \$6 con­di­tional on Green, you will trade that to me for \$1. Then, we agree that if Green oc­curs, I will give you \$6 and you will give me \$6, since this adds up to no change. How­ever, then I agree to waive your hav­ing to pay me the \$6 back if you give me \$3. You now have your origi­nal \$6 iff Green back, but are down an un­con­di­tional \$2, an in­dis­putable net loss.

Also, this made me re­al­ize that I could have just added an un­con­di­tional \$6 in my pre­vi­ous ex­am­ple rather than com­pli­cat­ing things by mak­ing the \$6 first con­di­tional on (die ≥ 3) and then on (Green ∨ Blue). That would be much clearer.

• I pay you \$1 for the waiver, not \$3, so I am down \$0.

In state A, I have \$6 iff Green, that is worth \$1.

In state B, I have no bet, that is worth \$0.

In state C, I have \$-6 iff Green, that is worth \$-3.

To go from A to B I would want \$1. I will go from B to B for free. To go from B to A I would pay \$1. State C does not oc­cur in this ex­am­ple.

• Wouldn’t you then pre­fer \$0 to \$1 iff (Green ∧ Heads) - \$1 iff (Green ∧ Tails)?

• In­differ­ent. This is a known bet.

Ear­lier I said \$-6 iff Green is iden­ti­cal to \$-6 + \$6 iff (not Green), then I de­com­posed (not Green) into (Red or Blue).

Similarly, I say this ex­am­ple is iden­ti­cal to \$-1 + \$2 iff (Green and Heads) + \$1 iff (not Green), then I de­com­pose (not Green) into (Red or (Blue and Heads) or (Blue and Tails)).

\$1 iff ((Green and Heads) or (Blue and Heads)) is a known bet. So is \$1 iff ((Green and Heads) or (Blue and Tails)). There are no lef­tover un­knowns.

• Look at it an­other way.

Con­sider \$6 iff (Green ∧ Heads) - \$6 iff (Green ∧ Tails) + \$4 iff Tails. This bet is equiv­a­lent to \$0 + \$2 = \$2, so you would be will­ing to pay \$2 for this bet.

If the coin comes out heads, the bet will become \$6 iff Green, with a value of \$1. If the coin comes out tails, the bet will become \$4 - \$6 iff Green = \$4 - \$3 = \$1. Therefore, assuming that the outcome of the coin is revealed first, you will, with certainty, regret having paid any amount over \$1 for this bet. This is not a rational decision procedure.

Consider \$6 iff ((Green and Heads) or (Blue and Tails)). This is a known bet (probability 1/3), so worth \$2. But if the coin is flipped first, and comes up Heads, it becomes \$6 iff Green, and if it comes up Tails, it becomes \$6 iff Blue, in either case worth \$1. And that’s silly.
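Using the two prices stated in this exchange (N = 2, so a \$6 stake on an objective 1/3 event is valued at \$2, while a \$6 stake riding on a single ambiguous colour is valued at \$1), the certain regret can be tabulated in a short sketch (the valuation table is just a hypothetical encoding of those two stated prices):

```python
# fool's valuation rule as stated in this thread (N = 2), hypothetical encoding:
# a $6 stake on an objective 1/3 event is worth $2; a $6 stake riding on a
# single ambiguous colour (Green alone or Blue alone) is worth $1.
VALUE = {"objective_third": 2.0, "single_ambiguous_colour": 1.0}

before_flip = VALUE["objective_third"]          # (Green∧Heads) ∨ (Blue∧Tails) is objectively 1/3
after_heads = VALUE["single_ambiguous_colour"]  # the bet collapses to $6 iff Green
after_tails = VALUE["single_ambiguous_colour"]  # the bet collapses to $6 iff Blue

# Whichever way the coin lands, the bet is repriced strictly below its ex-ante
# price, so paying the ex-ante price guarantees regret once the coin is revealed.
assert before_flip > max(after_heads, after_tails)
```

The point is structural: the valuation drops with certainty on new information, which no expected-utility valuation can do.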

Is that the same as your ob­jec­tion?

• Yes, that is equiv­a­lent.

• Very well done! I con­cede. Now that I see it, this is ac­tu­ally quite gen­eral.

My point wasn’t just that I had a de­ci­sion pro­ce­dure, but an ex­pla­na­tion for it. And it seems that, no mat­ter what, I would have to explain

A) Why ((Green and Heads) or (Blue and Tails)) is not a known bet, equiprob­a­ble with Red, or

B) Why I change my mind about the urn af­ter a coin flip.

Earlier, some others suggested non-causal/magical explanations. These are still intact. If the coin is subject to the Force, then (A), and if not, then (B). I rejected that sort of thing. I thought I had an intuitive non-magical explanation. But, it doesn’t explain (B). So, FAIL.

• You are also in­differ­ent be­tween \$-3 iff Red and \$-3N (= \$-6) iff Green (or equiv­a­lently Blue).

Isn’t this −1 and −4, not −1 and −1? I think you want −3/​N = −1.5.

• I’m not quite sure what your first sentence is referring to, but fool prefers risk to uncertainty. From his post:

given a bet on red whose pay­off has util­ity 1, and a bet on green whose pay­off has util­ity N, what is the crit­i­cal N where I am in­differ­ent be­tween the two?

. . .

you can give me an N>1 and I’ll go with it.

• The probability of drawing a blue ball is 1/3, as is that of drawing a green ball.

I’d in­sist that my prefer­ences are {} < {Red} = {Green} = {Blue} < {Red, Green} = {Red, Blue} = {Blue, Green} < {Red, Green, Blue}. There’s no rea­son to pre­fer Red to Green: the pos­si­bil­ity of there be­ing few Green balls is coun­ter­bal­anced by the pos­si­bil­ity of there be­ing close to 200 of them.

ETA: Well, there are situ­a­tions in which your prefer­ence or­der is a good idea, such as when there is an ad­ver­sary chang­ing the colours of the balls in or­der to make you lose. They can’t touch red with­out be­ing found out, they can only change the rel­a­tive num­bers of Blue and Green. But in that case, choos­ing the colour that makes you win isn’t the only effect of an ac­tion—it also af­fects the colours of the balls, so you need to take that into ac­count.

So the true state space would be {Ball Drawn = i} for each value of i in [1..300]. The con­tents of the urn are cho­sen by the ad­ver­sary, to be {Red = 100, Green = n, Blue = 200 - n} for n in [0..200]. When you take the ac­tion {Green}, the ad­ver­sary sets n to 0, so that ac­tion maps all {Ball Drawn = i} to {Lose}. And so on. Any­way, I don’t think this is a counter-ex­am­ple for that rea­son: you’re not just de­cid­ing the win­ning set, you’re af­fect­ing the balls in the urn.

• I see. No, that’s not the kind of ad­ver­sary I had in mind when I said that.

The states are the four pairs (adversary’s choice, coin): (A,Heads), (A,Tails), (B,Heads), (B,Tails). The outcomes are { Win, Lose }. I won’t list all 16 actions, just to say that by P1 you must rank them all. In particular, you must rank the actions X = { (A,Heads), (A,Tails) }, Y = { (B,Heads), (B,Tails) }, U = { (A,Heads), (B,Tails) }, and V = { (A,Tails), (B,Heads) }. Again I’m writing actions as events, since there are only two outcomes.

To mo­ti­vate this, con­sider the game where you and your (non-psy­chic, non-telekinetic etc) ad­ver­sary are to si­mul­ta­neously re­veal A or B; if you pick the same, you win, if not, your ad­ver­sary wins. You are at a point in time where your ad­ver­sary has writ­ten “A” or “B” on a piece of pa­per face down, and you have not. You have also flipped a coin, which you have not looked at (and are not re­quired to look at, or show your ad­ver­sary). There­fore the above four states do in­deed cap­ture all the state in­for­ma­tion, and the four ac­tions I’m singling out cor­re­spond to: you ig­nore the coin and write “A”, or ig­nore and write “B”; or else you de­cide to base what you write on the flip of the coin, one way, or the other. As I say, by P1, you must rank these.

Me, I’ll take the coin, thanks. I rank X=Y<U=V. I just vi­o­lated P2. Am I re­ally ir­ra­tional?

And even if you think I am, one of the ques­tions origi­nally asked was how things could be jus­tified by Dutch book ar­gu­ments or the like. So the Ells­berg para­dox and var­i­ants is still rele­vant to that ques­tion, nor­ma­tive ar­gu­ments aside.

• So P2 doesn’t ap­ply in this ex­am­ple. Why not? Well, the rea­son you pre­fer to use the coin is be­cause you sus­pect the ad­ver­sary to be some kind of pre­dic­tor, who is slightly more likely to write down a B if you just write down A (ig­nor­ing the coin). That’s not some­thing cap­tured by the state in­for­ma­tion here. You clearly don’t think that (A,Tails) is si­mul­ta­neously more and less likely than (B,Tails), just that the ac­tion you choose can have some in­fluence on the out­come. I think it might be that if you ex­panded the state space to in­clude a pre­dic­tor with all the pos­si­bil­ities of what it could do, P2 would hold again.

• That isn’t the is­sue. At the point in time I am talk­ing about, the ad­ver­sary has already made his non-re­vealed choice (and he is not telekinetic). There is no other state.

Tails ver­sus Heads is ob­jec­tively 1:1 re­sult­ing from the toss of a fair coin, whereas A ver­sus B has an un­cer­tainty that re­sults from my ad­ver­sary’s choice. I may not have rea­son to think that he will choose A over B, so I can still call it 1:1, but there is still a qual­i­ta­tive dis­tinc­tion be­tween un­cer­tainty and ran­dom­ness, or am­bi­guity and risk, or ob­jec­tive and sub­jec­tive prob­a­bil­ity, or what­ever you want to call it, and it is not ir­ra­tional to take it into ac­count.

• I have to admit, this ordering seems reasonable… for the reasons nshepperd suggests. Just saying that he’s not telepathic isn’t enough to say he’s not any sort of predictor—after all, I’m a human, I’m bad at randomizing, maybe he’s played this game before and compiled statistics. Or he just has a good idea how people tend to think about this sort of thing. So I’m not sure you’re correct in your conclusion that this isn’t the issue.

• Then I claim that a non-psy­chic pre­dic­tor, no mat­ter how good, is very differ­ent from a psy­chic.

The pow­ers of a non-psy­chic pre­dic­tor are en­tirely nat­u­ral and causal. Once he has writ­ten down his hid­den choice, then he be­comes ir­rele­vant. If this isn’t clear, then we can make an anal­ogy with the urn ex­am­ple. After the ball is drawn but be­fore its colour is re­vealed, the con­tents of the urn are ir­rele­vant. As I pointed out, the urn could even be de­stroyed be­fore the colour of the ball is re­vealed, so that the ball’s colour truly is the only state. Similarly, af­ter the pre­dic­tor writes his choice but be­fore it is re­vealed, he might ac­ci­den­tally be­head him­self while shav­ing.

Now of course your beliefs about the talents of the late predictor might inform your beliefs about his hidden choice. But that’s the only way they can possibly be relevant. The coin and the predictor’s hidden choice on the paper really are the only states of the world now, and your own choice is free and has no effect on the state. So, if you display a strict preference for the coin, then your uncertainty is still not captured by subjective probability. You still violate P2.

To get around this, it seems you would have to posit some resi­d­ual en­tan­gle­ment be­tween your choice and the ex­ter­nal state. To me this sounds like a strange thing to ar­gue. But I sup­pose you could say your cog­ni­tion is flawed in a way that is in­visi­ble to you, yet was visi­ble to the clever but de­parted pre­dic­tor. So, you might ar­gue that, even though there is no ac­tual psy­chic effect, your choice is not re­ally free, and you have to take into ac­count your in­ter­nal­ities in ad­di­tion to the ex­ter­nal states.

My ques­tion then would be, does this en­tan­gle­ment pre­vent you from hav­ing a to­tal or­der­ing over all maps from states (in­ter­nal and ex­ter­nal) to out­comes? If yes, then P1 is vi­o­lated. If no, then can I not just ask you about the or­der­ing of the maps which only de­pend on the ex­ter­nal states, and don’t we just wind up where we were?

• Well, that sounds ir­ra­tional. Why would you pay to switch from X to U, a change that makes no differ­ence to the prob­a­bil­ity of you win­ning?

• Be­cause there might be more to un­cer­tainty than sub­jec­tive prob­a­bil­ity.

Let’s take a step back.

Yes, if you as­sume that un­cer­tainty is en­tirely cap­tured by sub­jec­tive prob­a­bil­ity, then you’re com­pletely right. But if you as­sume that, then you wouldn’t need the Sav­age ax­ioms in the first place. The Sav­age ax­ioms are one way of jus­tify­ing this as­sump­tion (as well as ex­pected util­ity). So, what jus­tifies the Sav­age ax­ioms?

One sug­ges­tion the origi­nal poster made was to use Dutch book ar­gu­ments, or the like. But now here’s a situ­a­tion where there does seem to be a qual­i­ta­tive differ­ence be­tween a ran­dom event and an un­cer­tain event, where there is a “rea­son­able” thing to do that vi­o­lates P2, and where noth­ing like a Dutch book ar­gu­ment seems to be available to show that it is sub­op­ti­mal.

I hope that clar­ifies the con­text.

EDIT: I put “rea­son­able” in scare-quotes. It is rea­son­able, and I am pre­pared to defend that. But it isn’t nec­es­sary to be­lieve it is rea­son­able to see why this ex­am­ple mat­ters in this con­text.

• Depend­ing on con­text, it sure looks like these are the right prefer­ences to have.

Sorry, but that is highly nonob­vi­ous! Why do you claim that?

Note BTW that your state space is wrong in that it doesn’t in­clude differ­ing states of how many green balls there are, but I as­sume you’re just re­strict­ing your or­der­ing to those ac­tions which de­pend only on the color of the ball (since other ac­tions would not be pos­si­ble in this con­text).

con­sis­tent ob­jec­tive functions

“Con­sis­tent” in what sense?

• As to the state space, as you say, we could ex­pand the state space and re­strict the ac­tions as you sug­gest, and it wouldn’t mat­ter. But if you pre­fer we could draw a ball from the urn, set it aside, and de­stroy the urn be­fore re­veal­ing the colour of the ball. At that point colour re­ally is the only state, as I un­der­stand the word “state”.

As to why it looks right: red is a known prob­a­bil­ity, green and blue aren’t. It seems quite rea­son­able to choose the known risk over the un­known one. Espe­cially un­der ad­ver­sar­ial con­di­tions. This is some­times called am­bi­guity aver­sion or un­cer­tainty aver­sion, which is sort of or­thog­o­nal to risk aver­sion.

As for con­sis­tency, if you’re max­imis­ing a sin­gle func­tion, you’re not go­ing to end up in a lower state via up­ward-mov­ing steps.

Beyond that I can point to liter­a­ture on the Ells­berg para­dox. The wikipe­dia page has some info and some re­sources.

• FWIW, it doesn’t seem right to me to men­tion ad­ver­sar­ial situ­a­tions when that’s not given in the prob­lem. Prefer­ring safer bets seems right in the pres­ence of an ad­ver­sary, but this ex­am­ple isn’t dis­play­ing that rea­son­ing.

• FWIW, agreed, “not given in the prob­lem”. My bad.

• Bet­ting gen­er­ally in­cludes an ad­ver­sary who wants you to lose money so they win it. Pos­si­bly in psy­chol­ogy ex­per­i­ments, bet­ting against the ex­per­i­menter, you are more likely to have a bet­ting part­ner who is happy to lose money on bets. And there was a case of a bet hap­pen­ing on Less Wrong re­cently where the per­son offer­ing the bet had an­other mo­ti­va­tion, demon­strat­ing con­fi­dence in their sus­pi­cion. But gen­er­ally, ig­nor­ing the pos­si­bil­ity of some­one want­ing to win money off you when they offer you a bet is a bad idea.

Now bet­ting is sup­posed to be a metaphor for op­tions with pos­si­bly un­known re­sults. In which case some­times you still need to ac­count for the pos­si­bil­ity that the op­tions were made available by an ad­ver­sary who wants you to choose badly, but less of­ten. And you also should ac­count for the pos­si­bil­ity that they were from other peo­ple who wanted you to choose well, or that the op­tions were not de­ter­mined by any in­tel­li­gent be­ing or pro­cess try­ing to pre­dict your choices, so you don’t need to ac­count for an an­ti­cor­re­la­tion be­tween your choice and the best choice. Ex­cept for your own bi­ases.

• Excellent summary. Savage’s founding of statistics is nice because it only assumes that agents have to make choices between actions, making no assumptions about whether they have to have beliefs or goals. This is important because agents in general don’t have to use beliefs or goals, but they do all have to choose actions.

Thanks for the info about bound­ed­ness, I didn’t no­tice that on my quick skim through the book.

• This is im­por­tant be­cause agents in gen­eral don’t have to use be­liefs, but they do all have to have goals.

I think you mean that agents don’t have to use be­liefs or goals, but they do all have to choose be­tween ac­tions.

If you re­ally meant what you said, then you drew some deep bizarre coun­ter­in­tu­itive con­clu­sion there that I can’t un­der­stand, and I’d re­ally like to see an ar­gu­ment for it.

• Yep, my mis­take. Fixed.

• Yeah, obviously in the 1954 edition he didn’t know that; in the 1972 edition, he leaves all the obsolete discussion in and just adds a footnote saying that Fishburn proved boundedness and giving a reference! Had to look that up separately. Didn’t notice it either until late in writing this.

Fortunately (since I’m away from university right now) I found a PDF of Fishburn’s book online: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0708563

• Peter Wakker ap­par­ently thinks he found a way to have un­bounded util­ities and obey most of Sav­age’s ax­ioms. See Un­bounded util­ity for Sav­age’s “Foun­da­tions of Statis­tics,” and other mod­els. I’ll say more if and when I un­der­stand that pa­per.

• I don’t think P2 can be jus­tified by Dutch Book type ar­gu­ments. I don’t think it can be jus­tified, as a ra­tio­nal re­quire­ment of choice, at all. My reser­va­tions are similar to Ed­ward McClen­nen’s in “Sure Thing Doubts”.

• So you would ar­gue that, know­ing a fact, your prefer­ences can de­pend on what would have hap­pened had that fact been false?

• Right. (Not my prefer­ence nec­es­sar­ily, but a ra­tio­nal per­son’s.) The facts in ques­tion in­clude past ac­tions, which can form the ba­sis of re­grets. The value of an event can de­pend on its his­tor­i­cal con­text—that doesn’t seem un­rea­son­able.

• Would we be able to write the out­comes as full his­to­ries?

• I don’t see why not. How­ever, I haven’t seen many (any?) de­ci­sion the­ory treat­ments that do so.

• This is very nice.

One point I find less than perfectly con­vinc­ing: the mo­ti­va­tion of the “to­tal” part of P1 by say­ing that if our pre­order were par­tial then we’d have two differ­ent kinds of in­differ­ence.

First off, I don’t see anything bad about that in terms of mathematical elegance. Consider, e.g., Conway’s beautiful theory of numbers and (two-player perfect-information) games, in which the former turn out to be a special case of the latter. When you extend the ≤ relation on numbers to games, you get a partial preorder with, yes, two kinds of “indifference”. One means “these two games are basically the same game; they are interchangeable in almost all contexts”. The other means “these are quite different games, but neither is unambiguously a better game to find yourself playing”.

This sort of thing also seems em­i­nently plau­si­ble to me on (so to speak) psy­cholog­i­cal grounds. Real agents do have mul­ti­ple kinds of in­differ­ence. Some­times two situ­a­tions just don’t differ in any way we care about. Some­times they differ a great deal but nei­ther seems clearly prefer­able.

It would prob­a­bly be much harder to ex­tract von Neu­mann /​ Mor­gen­stern from a ver­sion of the ax­ioms with P1 weak­ened to per­mit non-to­tal­ity. But I won­der whether what you would get (per­haps with some other strength­en­ings some­where) might end up be­ing a bet­ter match for real agents’ real prefer­ences.

(That would not nec­es­sar­ily be a good thing; per­haps our ex­pe­rience of in­ter­nal con­flicts from mul­ti­ple in­com­men­su­rable-feel­ing val­ues merely in­di­cates a sub­op­ti­mal­ity in our think­ing. After all, agents do have to de­cide what to do in any given situ­a­tion.)

• I ba­si­cally agree with you on this. Sav­age doesn’t seem to ac­tu­ally jus­tify to­tal­ity much; that was my own thought as I was writ­ing this. The real ques­tion, I sup­pose, is not “are there two fla­vors of in­differ­ence” but “is in­differ­ence tran­si­tive”, since that’s equiv­a­lent to to­tal­ity. I didn’t bother talk­ing about to­tal­ity any fur­ther be­cause, while I’m not en­tirely com­fortable with it my­self, it seems to be a stan­dard as­sump­tion here.

I’ll add a note to the post about how to­tal­ity can be con­sid­ered as tran­si­tivity of in­differ­ence.

• Yes, that’s a good way of look­ing at it.

If we (1) look at the way our prefer­ences ac­tu­ally are and (2) con­sider “aargh, con­flict of in­com­men­su­rable val­ues, can’t de­cide” to be a kind of in­differ­ence, then in­differ­ence cer­tainly isn’t tran­si­tive. But, again, maybe we’d do bet­ter to con­sider ideal­ized agents that don’t have such con­fu­sions.

• In par­tic­u­lar be­cause agents which do have such con­fu­sions should leave money on the table—they are in­ca­pable of dutch-book­ing peo­ple who can be dutch-booked.

• How so? (It looks to me as though the abil­ity to dutch-book some­one dutch-book-able doesn’t de­pend at all on one’s value sys­tem. In par­tic­u­lar, the in­di­vi­d­ual trans­ac­tions that go to make up the d.b. don’t need to be of pos­i­tive util­ity on their own, be­cause the dutch-book-er knows that the dutch-book-ee is go­ing to be will­ing to con­tinue through to the end of the pro­cess. I think. What am I miss­ing?)

• Hmm. That’s not quite the right de­scrip­tion of the illogic but some­thing very odd is go­ing on:

Sup­pose I find A and B in­com­pa­rable and B and C in­com­pa­rable but A is prefer­able to C.

Joe is will­ing to trade C=>B and B=>A.

I trade C into B know­ing that I will even­tu­ally get A.

Then, I re­fuse to trade B to A!

• I don’t think transitivity is a reasonable assumption.

Sup­pose an agent is com­posed of sim­pler sub­mod­ules—this, to a very rough ap­prox­i­ma­tion, is how ac­tual brains seem to func­tion—and its ex­pressed prefer­ences (i.e. ac­tions) are as­sem­bled by pol­ling its sub­mod­ules.

Bam, vot­ing para­dox. Tran­si­tivity is out.
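A minimal illustration of that voting paradox, with three hypothetical submodules polled by pairwise majority:

```python
# Three submodules, each with a perfectly transitive ranking over options A, B, C.
rankings = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def prefers(ranking, x, y):
    """True if this submodule ranks x strictly above y."""
    return ranking.index(x) < ranking.index(y)

def majority_prefers(x, y):
    """True if a strict majority of submodules ranks x above y."""
    votes = sum(prefers(r, x, y) for r in rankings)
    return votes > len(rankings) / 2

# Every pairwise majority is strict, yet together they form a cycle:
assert majority_prefers("A", "B")
assert majority_prefers("B", "C")
assert majority_prefers("C", "A")  # A > B > C > A: transitivity fails
```

Each submodule is individually transitive; the intransitivity appears only in the aggregate, which is the Condorcet paradox.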

• Neu­ral sig­nals rep­re­sent things car­di­nally rather than or­di­nally, so those vot­ing para­doxes prob­a­bly won’t ap­ply.

Even conditional on humans not having transitive preferences in an approximate sense, I find it likely that it would be useful to come up with some ‘transitivization’ of human preferences.

Agreed that there’s a good chance that game-the­o­retic rea­son­ing about in­ter­act­ing sub­mod­ules will be im­por­tant for clar­ify­ing the struc­ture of hu­man prefer­ences.

• Neu­ral sig­nals rep­re­sent things car­di­nally rather than ordinally

I’m not sure what you mean by this. In the gen­eral case, re­s­olu­tion of sig­nals is highly non­lin­ear, i.e. vastly more com­pli­cated than any sim­ple or­di­nal or weighted rank­ing method. Sig­nals at synapses are nearly digi­tal, though: to first or­der, a synapse is ei­ther firing or it isn’t. Sig­nals along in­di­vi­d­ual nerves are also digi­tal-ish—bursts of high-fre­quency con­stant-am­pli­tude waves in­ter­spersed with silence.

My point, though, is that it’s not rea­son­able to as­sume that tran­si­tivity holds ax­io­mat­i­cally when it’s sim­ple to con­struct a toy model where it doesn’t.

On a macro level, I can imag­ine a per­son with diet­ing prob­lems prefer­ring starv­ing > a hot fudge sun­dae, cel­ery > starv­ing, and a hot fudge sun­dae > cel­ery.

• On a macro level, I can imag­ine a per­son with diet­ing prob­lems prefer­ring starv­ing > a hot fudge sun­dae, cel­ery > starv­ing, and a hot fudge sun­dae > cel­ery.

My ex­pe­rience is that this is gen­er­ally be­cause of a mea­sure­ment prob­lem, not a re­flec­tively en­dorsed state­ment.

• Well, it’s clearly patholog­i­cal in some sense, but the space of ac­tions to be (pre)or­dered is as­tro­nom­i­cally big and re­flec­tive en­dorse­ment is slow, so you can’t use­fully er­ror-check the space that way. cf. Love­craft’s com­ment about “the in­abil­ity of the hu­man mind to cor­re­late all its con­tents”.

I don’t think it will do to sim­ply as­sume that an ac­tu­ally in­stan­ti­ated agent will have a tran­si­tive set of ex­pressed prefer­ences. Bit like as­sum­ing your code is bugfree.

• The agent is allowed to ask its submodules how they would feel about various gambles, e.g. “Would you prefer B, or a 50% probability of A and a 50% probability of C?” Equipped with this extra information a voting paradox can be avoided. This is because the preferences over gambles tell you not just which order the submodule would rank the candidates in, but quantitatively how much it cares about each of them.

As­sum­ing the sub­mod­ules are ra­tio­nal (which they had bet­ter be if we want the over­all agent to be ra­tio­nal) then their prefer­ences over gam­bles can be ex­pressed as a util­ity func­tion on the out­comes. So then the main agent can make its util­ity func­tion a weighted sum of theirs. This avoids non-tran­si­tivity.

A prefer­ence or­der which says just what or­der the can­di­dates come in is called an “or­di­nal util­ity func­tion”.

A util­ity func­tion that ac­tu­ally de­scribes the rel­a­tive val­ues of the can­di­dates is a “car­di­nal util­ity func­tion”.
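The weighted-sum construction above can be sketched concretely (the submodule utilities and weights here are hypothetical):

```python
# Hypothetical cardinal utilities reported by three submodules over options A, B, C.
utilities = {
    "A": [1.0, 0.0, 0.5],
    "B": [0.0, 0.9, 0.8],
    "C": [0.4, 0.6, 0.1],
}
weights = [0.5, 0.3, 0.2]  # how much the agent cares about each submodule

def aggregate(option):
    """The agent's utility: a weighted sum of submodule utilities."""
    return sum(w * u for w, u in zip(weights, utilities[option]))

# Each option gets a single real-valued score, and any order induced by real
# numbers is automatically transitive, so no voting cycle can arise.
scores = {o: aggregate(o) for o in utilities}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Unlike majority polling of ordinal rankings, this aggregation cannot produce a cycle, because it factors through a single real number per option.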

• Here is where I would like to sug­gest a small mod­ifi­ca­tion to this setup

I do not like this modification. His way is more elegant because it starts with less information, and more robust because it does not depend on an initial concept of “knowledge”. “Knowledge” does not make sense in all instances.

P1 is Dutch Book justifiable, I think. For instance, x must not be strictly preferred to x, or else trading x for x would count as a benefit.

• This thought isn’t original to me, but it’s probably worth making. It feels like there are two sorts of axioms. I am following tradition in describing them as “rationality axioms” and “structure axioms”. The rationality axioms (like the transitivity of the order among acts) are norms on action. The structure axioms (like P6) aren’t normative at all. (They are about structure on the world; how bizarre would it be to say “The world ought to be such that P6 holds of it”?)

Given this, and given the necessity of the structure axioms for the proof, it feels like Savage’s theorem can’t serve as a justification of Bayesian epistemology as a norm of rational behaviour.

• P6 is re­ally both. Struc­turally, it forces there to be some­thing like a coin that we can flip as many times as we want. But nor­ma­tively, we can say that if the agent has blah blah blah prefer­ence, it shall be able to name a par­ti­tion such that blah blah blah. See e.g. [rule 4]. This of course doesn’t ad­dress why we think such a thing is nor­ma­tive, but that’s an­other is­sue.

• But why ought the world be such that such a par­ti­tion ex­ists for us to name? That doesn’t seem nor­ma­tive. I guess there’s a minor nor­ma­tive el­e­ment in that it de­mands “If the world con­spires to al­low us to have par­ti­tions like the ones needed in P6, then the agent must be able to know of them and rea­son about them” but that still seems sec­ondary to the de­mand that the world is thus and so.

• Agreed, the structural component is not normative. But to me, it is the structural part that seems benign.

If we assume the agent lives forever, and there’s always some uncertainty, then surely the world is thus and so. If the agent doesn’t live forever, then we’re into bounded rationality questions, and even transitivity is up in the air.

• P6 entails that there are (uncountably) infinitely many events. It is at least compatible with modern physics that the world is fundamentally discrete, both spatially and temporally. The visible universe is bounded. So it may be that there are only finitely many possible configurations of the universe. It’s a big number, sure, but if it’s finite, then Savage’s theorem is irrelevant: it doesn’t tell us anything about what to believe in our world. This is perhaps a silly point, and there’s probably a nearby theorem that works for “appropriately large finite worlds”, but still. I don’t think you can just uncritically say “surely the world is thus and so”.

If this is supposed to say something normative about how I should structure my beliefs, then the structural premises should be true of the world I have beliefs about.

• I don’t think you can just uncritically say “surely the world is thus and so”.

But it was a conditional statement. If the universe is discrete and finite, then obviously there are no immortal agents either.

Basically, I don’t see that aspect of P6 as more problematic than the unbounded-resource assumption. And when we question that assumption, we’ll be questioning a lot more than P6.

• Here is a small counterexample to P2. States = { Red, Green, Blue }. Outcomes = { Win, Lose }. Since there are only two outcomes, we can write actions as the subset of states that Win. My preferences are: {} < { Green } = { Blue } < { Red } < { Red,Green } = { Red,Blue } < { Green,Blue } < { Red,Green,Blue }. (This violates P2: { Red } and { Green } agree on Blue, as do { Red,Blue } and { Green,Blue }, so P2 requires { Red } > { Green } if and only if { Red,Blue } > { Green,Blue }.)

Here is a situation where this may apply: There is an urn with 300 balls. 100 of them are red. The rest are either green or blue. You draw a ball from this urn.

So Red represents a definite probability of 1/3, while Green and Blue are unknowns. Depending on context, it sure looks like these are the right preferences to have. This is called the Ellsberg paradox.

Even if you insist this is somehow wrong, it is not going to be Dutch booked. Even if we extend the state space to include arbitrarily many fair coins (as P6 may require), and even if we extend the result space to allow for multiple draws or other payouts, we can define various consistent objective functions (that are not expected utility) which show this behaviour.
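One consistent non-expected-utility objective that reproduces exactly these preferences is the Hurwicz criterion: score each bet by a fixed mix of its worst-case and best-case win probability over the unknown green/blue split. A minimal sketch (the weight alpha = 0.8 is an illustrative choice, not something from the discussion above):

```python
from itertools import combinations

def win_prob(bet, g):
    # Urn: 100 red balls, g green, 200 - g blue, with g unknown (0..200).
    counts = {"Red": 100, "Green": g, "Blue": 200 - g}
    return sum(counts[s] for s in bet) / 300

def hurwicz(bet, alpha=0.8):
    # alpha-weighted mix of worst-case and best-case win probability.
    probs = [win_prob(bet, g) for g in range(201)]
    return alpha * min(probs) + (1 - alpha) * max(probs)

states = ["Red", "Green", "Blue"]
bets = [frozenset(c) for r in range(4) for c in combinations(states, r)]
for bet in sorted(bets, key=hurwicz):
    print(sorted(bet), round(hurwicz(bet), 3))
```

This yields {} < {Green} = {Blue} < {Red} < {Red,Green} = {Red,Blue} < {Green,Blue} < {Red,Green,Blue}, the stated ordering. No single prior over g can reproduce it, since expected utility would force the {Red} vs {Green} and {Red,Blue} vs {Green,Blue} comparisons to agree.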

• The first six axioms are enough to handle finite gambles; P7 is only needed for more general situations.

You mean P7 is implied already by P1-P6 for finite B, I assume.

• No, I meant that P1-P6 imply the expected-utility hypothesis for finite gambles, i.e., when f and g each take on only finitely many values (outside a set of probability 0). They therefore also imply P7 for finite gambles, and hence in particular for finite B, but “finite B” is a very strict condition: under P1-P6, any finite B will always be null, so P7 will hold for them trivially!

• Okay. I was considering finite gambles backed by a finite S, although of course that need not be the case. Do these axioms only apply to infinite S? If so, I didn’t notice where that was stated; is it a consequence I missed? I’m also curious why P1-P6 imply that any finite B must be null:

D3: An event B is said to be null if f ≤ g given B for any actions f and g.

• A finite B necessarily has only finitely many subsets, while any nonnull B necessarily has at least continuum-many subsets, since there is always a subset of any given probability at most P(B).

Basically, one of the effects of P6 is to ensure we’re not in a “small world”. See all that stuff about uniform partitions into arbitrarily many parts, etc.
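The continuum-many-subsets claim can be illustrated with a bisection sketch: P6 lets us split any event into two (almost) equally probable halves, so by choosing halves according to the binary digits of a target probability p we can build a sub-event of any probability up to P(B). Below, purely as an illustrative assumption, B is modeled as the interval [0, 1) with uniform measure:

```python
def subset_of_probability(p, depth=50):
    # Greedy bisection: repeatedly halve the remaining part of B (which
    # P6 guarantees is possible) and keep a half whenever the next
    # binary digit of the target probability p is 1.
    pieces = []               # kept pieces, as (start, length) intervals
    start, length = 0.0, 1.0  # the not-yet-assigned remainder of B
    for _ in range(depth):
        length /= 2           # split the remainder into two equal halves
        if p >= length:       # digit is 1: keep one of the halves
            pieces.append((start, length))
            start += length
            p -= length
    return pieces

# The measure of the constructed sub-event approaches the target.
total = sum(length for _, length in subset_of_probability(0.3))
```

Since every p in [0, 1] yields a different sub-event, a nonnull B must have at least continuum-many subsets, which is exactly why no finite B can be nonnull.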

• Yes, P6 very clearly says that. Somehow I skipped it on first reading. So when you add P6, S is provably infinite. Thanks.