# Probability is Real, and Value is Complex

(This post idea is due entirely to Scott Garrabrant, but it has been several years and he hasn’t written it up.)

In 2009, Vladimir Nesov observed that probability can be mixed up with utility in different ways while still expressing the same preferences. The observation was conceptually similar to one made by Jeffrey and Bolker in the book The Logic of Decision, so I give them intellectual priority, and refer to the result as “Jeffrey-Bolker rotation”.

Based on Nesov’s post, Scott came up with a way to represent preferences as vector-valued measures, which makes the result geometrically clear and mathematically elegant.

## Vector Valued Preferences

As usual, we think of a space of events which form a sigma algebra. Each event $A$ has a probability $P(A)$ and an expected utility $U(A)$ associated with it. However, rather than dealing with $U$ directly, we define $Q(A) := U(A) \cdot P(A)$. Vladimir Nesov called $Q$ “shouldness”, but that’s fairly meaningless. Since $Q$ is graphed on the y-axis, represents utility times probability, and is otherwise fairly meaningless, a good name for it is “up”. Here is a graph of probability and upness for some events, represented as vectors:

(The post title is a pun on the fact that this looks like the complex plane: events are complex numbers with real component $P$ and imaginary component $Q$. However, it is better to think of this as a generic 2D vector space rather than the complex plane specifically.)

If we assume $A$ and $B$ are mutually exclusive events (that is, $A \cap B = \emptyset$), then calculating the P and Q of their union is simple. The probability of the union of two mutually exclusive events is just the sum:

$$P(A \cup B) = P(A) + P(B)$$

The expected utility is the weighted sum of the component parts, normalized by the sum of the probabilities:

$$U(A \cup B) = \frac{U(A) P(A) + U(B) P(B)}{P(A) + P(B)}$$

The numerator is just the sum of the shouldnesses, and the denominator is just the probability of the union:

$$U(A \cup B) = \frac{Q(A) + Q(B)}{P(A \cup B)}$$

But, we can multiply both sides by the denominator to get a relationship on shouldness alone:

$$Q(A \cup B) = Q(A) + Q(B)$$

Thus, we know that both coordinates of $A \cup B$ are simply the sum of the component parts. This means union of disjoint events is vector addition in our vector space, as illustrated in my diagram earlier.
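To make the vector addition concrete, here is a small numerical sketch in Python. The probabilities and utilities are made up purely for illustration; the point is just that summing the (P, Q) coordinates agrees with the weighted-average formula for expected utility:

```python
# Events as (P, Q) vectors, where Q = U * P ("upness"/"shouldness").
# The numbers here are invented for illustration only.
P_A, U_A = 0.3, 2.0
P_B, U_B = 0.2, -0.5
A = (P_A, U_A * P_A)  # (0.3, 0.6)
B = (P_B, U_B * P_B)  # (0.2, -0.1)

# Union of disjoint events is vector addition:
P_union = A[0] + B[0]
Q_union = A[1] + B[1]

# The slope Q/P recovers the probability-weighted average utility:
U_union = Q_union / P_union
weighted_avg = (U_A * P_A + U_B * P_B) / (P_A + P_B)
assert abs(U_union - weighted_avg) < 1e-12
```

Here $P(A \cup B) = 0.5$ and $Q(A \cup B) = 0.5$, so $U(A \cup B) = 1$, matching the weighted average of utilities 2 and -0.5.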

## Linear Transformations

When we represent preferences in a vector space, it is natural to think of them as basis-independent: the way we drew the axes was arbitrary; all that matters is the system of preferences being represented. What this ends up meaning is that we don’t care about linear transformations of the space, so long as the preferences don’t get reflected (which reverses the preference represented). This is a generalization of the usual “utility is unique up to affine transformations with positive coefficient”: utility is no longer unique in that way, but the combination of probability and utility is unique up to non-reflecting linear transformations.

Let’s look at that visually. Multiplying all the expected utilities by a positive constant doesn’t change anything:

Adding a constant to expected utility doesn’t change anything:

Slightly weird, but not too weird… multiplying all the probabilities by a positive constant (and the same for Q, since Q is U*P) doesn’t change anything (meaning we don’t care if probabilities are normalized):

Here’s the really new transformation, which can combine with the transformations above to create all the valid transformations: the Jeffrey-Bolker rotation, which changes what parts of our preferences are represented in probabilities vs utilities:

Let’s pause for a bit on this one, since it is really the whole point of the setup. What does it mean to rotate our vector-valued measure?

A simple example: suppose that we can take a left path, or a right path. There are two possible worlds, which are equally probable: in Left World, the left path leads to a golden city overflowing with wealth and charity, which we would like to go to with V=+1. The right path leads to a dangerous badlands full of bandits, which we would like to avoid, V=-1. On the other hand, Right World (so named because we would prefer to go right in this world) has a somewhat nice village on the right path, V=+.5, and a somewhat nasty swamp on the left, V=-.5. Supposing that we are (strangely enough) uncertain about which path we take, we calculate the events as follows:

• Go left in left-world:
  • P=.25
  • V=1
  • Q=.25
• Go left in right-world:
  • P=.25
  • V=-.5
  • Q=-.125
• Go right in left-world:
  • P=.25
  • V=-1
  • Q=-.25
• Go right in right-world:
  • P=.25
  • V=.5
  • Q=.125
• Go left (union of the two left-going cases):
  • P=.5
  • Q=.125
  • V=Q/P=.25
• Go right:
  • P=.5
  • Q=-.125
  • V=Q/P=-.25

We can calculate the V of each action and take the best. So, in this case, we sensibly decide to go left, since the Left-world is more impactful to us and both are equally probable.

Now, let’s rotate 30°. (Hopefully I get the math right here.)

• Left in L-world:
  • P=.09
  • Q=.34
  • V=3.7
• Left in R-world:
  • P=.28
  • Q=.02
  • V=.06
• Right in L-world:
  • P=.34
  • Q=-.09
  • V=-.26
• Right in R-world:
  • P=.15
  • Q=.23
  • V=1.5
• Left overall:
  • P=.37
  • Q=.36
  • V=.97
• Right overall:
  • P=.49
  • Q=.14
  • V=.29

Now, it looks like going left is evidence for being in R-world, and going right is evidence for being in L-world! The disparity between the worlds has also gotten larger; L-world now has a difference of almost 4 utility between the different paths, rather than 2. R-world now evaluates both paths as positive, with a difference between the two of only about 1.4. Also note that our probabilities have stopped summing to one (but as mentioned already, this doesn’t matter much; we could normalize the probabilities if we want).

In any case, the final decision is exactly the same, as we expect. I don’t have a good intuitive explanation of what the agent is thinking, but roughly, the decreased control the agent has over the situation due to the correlation between its actions and which world it is in seems to be compensated for by the more extreme payoff differences in L-world.
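The 30° rotation can be checked mechanically. Here is a sketch in Python (my own reconstruction of the arithmetic, using the standard counterclockwise rotation matrix) which reproduces the rotated table above and confirms that the decision comes out the same:

```python
import math

def rotate(p, q, degrees):
    """Rotate a (P, Q) vector counterclockwise by the given angle."""
    th = math.radians(degrees)
    return (p * math.cos(th) - q * math.sin(th),
            p * math.sin(th) + q * math.cos(th))

# The four fine-grained events from the example, as (P, Q) vectors.
events = {
    "left, L-world":  (0.25,  0.25),
    "left, R-world":  (0.25, -0.125),
    "right, L-world": (0.25, -0.25),
    "right, R-world": (0.25,  0.125),
}
rotated = {name: rotate(p, q, 30) for name, (p, q) in events.items()}

# Unions (the two actions) are still vector sums after rotating:
left_P  = rotated["left, L-world"][0] + rotated["left, R-world"][0]
left_Q  = rotated["left, L-world"][1] + rotated["left, R-world"][1]
right_P = rotated["right, L-world"][0] + rotated["right, R-world"][0]
right_Q = rotated["right, L-world"][1] + rotated["right, R-world"][1]

# Going left still wins: its slope Q/P is higher.
assert left_Q / left_P > right_Q / right_P
```

Rounding to two decimal places recovers the numbers in the list above (e.g. left overall: P≈.37, Q≈.36).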

## Rational Preferences

Alright, so preferences can be represented as vector-valued measures in two dimensions. Does that mean arbitrary vector-valued measures in two dimensions can be interpreted as preferences?

No.

The restriction that probabilities be non-negative means that events can only appear in quadrants I and IV of the graph. We want to state this in a basis-independent way, though, since it is unnatural to have a preferred basis in a vector space. One way to state the requirement is that there must be a line passing through the (0,0) point, such that all of the events are strictly to one side of the line, except perhaps events at the (0,0) point itself:

As illustrated, there may be a single such line, or there may be multiple, depending on how closely preferences hug the (0,0) point. The normal vector of this line (drawn in red) can be interpreted as the $P$ dimension, if you want to pull out probabilities in a way which guarantees that they are non-negative. There may be a unique direction corresponding to probability, and there may not. Since the separating line can tilt anywhere within the gap left by the event vectors, we get a unique probability direction if and only if we have events with both arbitrarily high utilities and arbitrarily low. So, Jeffrey-Bolker rotation is intrinsically tied up in the question of whether utilities are bounded.

Actually, Scott prefers a different condition on vector-valued measures: that they have a unique (0,0) event. This allows for either infinite positive utilities (not merely unbounded—infinite), or infinite negative utilities, but not both. I find this less natural. (Note that we have to have an empty event in our sigma-algebra, and it has to get value (0,0) as a basic fact of vector-valued measures. Whether any other event is allowed to have that value is another question.)

How do we use vector-valued preferences to optimize? The expected value of a vector is its slope, $U(A) = Q(A)/P(A)$. This runs into trouble for probability-zero events, though, which we may create as we rotate. Instead, we can prefer events which are less clockwise:

(Note, however, that the preference of a (0,0) event is undefined.)

This gives the same answers for positive-x-value, but keeps making sense as we rotate into other quadrants. More and less clockwise always makes sense as a notion since we assumed that the vectors always stay to one side of some line; we can’t spin around in a full circle looking for the best option, because we will hit the separating line. This allows us to define a preference relation based on the angle of one event’s vector being within 180° of another’s.
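The “less clockwise” comparison can be implemented without ever dividing by P, using the sign of the 2D cross product. A minimal sketch, assuming (as in the post) that all event vectors stay strictly to one side of some separating line:

```python
def prefers(a, b):
    """True if event vector a = (P, Q) is preferred to b = (P, Q),
    i.e. a is less clockwise (counterclockwise of b).
    The 2D cross product b x a is positive exactly when a lies
    counterclockwise of b; no division by P is needed."""
    return b[0] * a[1] - b[1] * a[0] > 0

# Agrees with comparing slopes Q/P when both probabilities are positive:
assert prefers((0.5, 0.4), (0.5, 0.1))  # slope 0.8 beats slope 0.2
# ...and still makes sense for a probability-zero event on the
# positive-Q axis, where the slope Q/P would be undefined:
assert prefers((0.0, 0.3), (0.5, 0.4))
```

As the post notes, the comparison is undefined for a (0,0) event: the cross product with the zero vector is always zero.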

## Conclusion

This is a fun picture of how probabilities and utilities relate to each other. It suggests that the two are inextricably intertwined, and meaningless in isolation. Viewing them in this way makes it somewhat more natural to think that probabilities are more like “caring measure” expressing how much the agent cares about how things go in particular worlds, rather than subjective approximations of an objective “magical reality fluid” which determines what worlds are experienced. (See here for an example of this debate.) More practically, it gives a nice tool for visualizing the Jeffrey-Bolker rotation, which helps us think about preference relations which are representable via multiple different belief distributions.

A downside of this framework is that it requires agents to be able to express a preference between any two events, which might be a little absurd. Let me know if you figure out how to connect this to complete-class style foundations, which only require agents to have preferences over things which they can control.

• I’ll admit that I’m skeptical. It’s a cool mathematical trick, but why should we think it is anything more than that?

• The uniqueness of 0 is only roughly equivalent to the half-plane definition if you also assume convexity (i.e. the existence of independent coins of no value).

• I can’t make sense of the part with R-world and L-world. You assign probabilities to your possible actions (by what rule?) then do arithmetic on them to decide which action to take (why does that depend on probabilities of actions?) then rotate the picture and find that actions are correlated with hidden facts (how can such correlation happen?) It looks like this metaphor doesn’t work very well for decision-making, or we’re using it wrong.

• Well… I agree with all of the “that’s peculiar” implications there. To answer your question:

The assignment of probabilities to actions doesn’t influence the final decision here. We just need to assign probabilities to everything. They could be anything, and the decision would come out the same.

The magic correlation is definitely weird. Before I worked out an example for this post, I thought I had a rough idea of what Jeffrey-Bolker rotation does to the probabilities and utilities, but I was wrong.

I see the epistemic status of this as “counterintuitive fact” rather than “using the metaphor wrong”. The vector-valued measure is just a way to visualize it. You can set up axioms in which the Jeffrey-Bolker rotation is impossible (like the Savage axioms), but in my opinion they’re cheating to rule it out. In any case, this weirdness clearly follows from the Jeffrey-Bolker axioms of decision theory.

• The assignment of probabilities to actions doesn’t influence the final decision here. We just need to assign probabilities to everything. They could be anything, and the decision would come out the same.

Aren’t there meaningful constraints here? If I think it’s equally likely that I’m in L-world and R-world and that this is independent of my action, then I have the constraint that P(Left, L-world)=P(Left, R-world) and another constraint that P(Right, L-world)=P(Right, R-world), and if I haven’t decided yet then I have a constraint that P>0 (since at my present state of knowledge I could take any of the actions). But beyond that, positive linear scalings are irrelevant.

• I have the following question, the answer to which may be obvious but I have difficulty understanding: “expected utility” in a game is already a multiplication of the expected prize by its probability. Why do we multiply it by the probability again?

• Abram is multiplying the conditional expected utility of an event by the probability of that event. For example, the utility of a lottery ticket conditional on winning the lottery could be a million dollars, and we multiply that by the probability of winning the lottery. The result is the “probutility” of an event. Taking the union of disjoint events is linear in both probabilities and probutilities, so we can think of them as coordinates of a vector.

• I still have a feeling that he is using the “expected utility” term differently than it is used in other places, where it is already presented as (utility)×(probability), like here: https://wiki.lesswrong.com/wiki/Expected_utility

E.g., in your example: utility of a winning ticket = 1 million USD

Probability of winning: one millionth

Expected utility of a ticket = 1 USD.

Probutility = ???

• I was confused about this too, but now I think I have some idea of what’s going on.

Normally probability is defined for events, but expected value is defined for random variables, not events. What is happening in this post is that we are taking the expected value of events, by way of the conditional expected value of the random variable (conditioning on the event). In symbols, if $A$ is some event in our sample space, we are saying $U(A) := E[X \mid A]$, where $X$ is some random variable (this random variable is supposed to be clear from the context, so it doesn’t appear on the left-hand side of the equation).

Going back to cousin_it’s lottery example, we can formalize this as follows. The sample space can be $\Omega = \{\text{win}, \text{lose}\}$ and the probability measure is defined as $P(\text{win}) = 10^{-6}$ and $P(\text{lose}) = 1 - 10^{-6}$. The random variable $X$ represents the lottery, and it is defined by $X(\text{win}) = 10^6$ and $X(\text{lose}) = 0$.

Now we can calculate. The expected value of the lottery is:

$$E[X] = X(\text{win}) P(\text{win}) + X(\text{lose}) P(\text{lose}) = 10^6 \cdot 10^{-6} + 0 \cdot (1 - 10^{-6}) = 1$$

The expected value of winning is:

$$E[X \mid \text{win}] = X(\text{win}) = 10^6$$

The “probutility” of winning is:

$$Q(\text{win}) = E[X \mid \text{win}] \cdot P(\text{win}) = 10^6 \cdot 10^{-6} = 1$$

So in this case, the “probutility” of winning is the same as the expected value of the lottery. However, this is only the case because the situation is so simple. In particular, if $X(\text{lose})$ was not equal to zero (while winning and losing remained exclusive events), then the two would have been different (the expected value of the lottery would have changed while the “probutility” would have remained the same).
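A quick numeric sketch of this lottery example in Python, using exact fractions to avoid floating-point noise:

```python
from fractions import Fraction

# Lottery: win a million with probability one in a million.
p_win = Fraction(1, 10**6)
p_lose = 1 - p_win
x_win, x_lose = 10**6, 0  # the random variable X

expected_value = x_win * p_win + x_lose * p_lose  # E[X]
conditional_ev = x_win                            # E[X | win]
probutility = conditional_ev * p_win              # Q(win) = E[X | win] * P(win)

assert expected_value == probutility == 1

# If losing paid out 5 instead of 0, E[X] would change,
# but the probutility of winning would not:
expected_value_2 = x_win * p_win + 5 * p_lose
assert expected_value_2 != probutility
```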

• What is happening in this post is that we are taking the expected value of events, by way of the conditional expected value of the random variable (conditioning on the event).

...and I was enlightened. Assuming this is correct (it fits with how I read this post and a couple others), this seems like a much better way to explain what’s going on with probutility.

• Probutility of winning = 1 USD

• So what is the difference between probutility and “expected utility”? Is it just another name for a well-known idea? (The comment was edited, as at first I read “probutility” as “probability” in your comment.)

• I am confused. My current understanding is that we’re starting with only a preference relation, and no assumptions on probability (so no lotteries, as in the VNM theorem). In that case, there are tons of utility functions that can model any given arbitrary preference relation. It seems like I could get a result like this by saying “take the preference relation, write down a utility function that encodes it, decompose it into the ratio of two parts, call one of them ‘probability’ and the other ‘probability*utility’, and now note that there are transformations to other utility functions that encode the same preference relation and unsurprisingly they change the relative amounts of each of the parts—therefore probability and utility are inextricably linked”. (This is almost certainly either wrong or a strawman, but I don’t know how.) But in all of this there’s no reason to think of the denominator of the ratio as “probability”, we just called it that suggestively. Perhaps my critique is that if we start with _just_ a preference relation and only need to keep the preference relation intact, we shouldn’t expect to recover anything like normal expected utility theory, because there’s no formal reason to have anything like probabilities. Even if you want to interpret probability as a “caring measure” instead of “magical reality fluid” it should still show up before you work through the math and interpret one of the quantities as “caring measure”. But mostly I’m confused so who knows, this may all be incoherent.

• What does it look like to rotate and then renormalize?

There seem to be two answers. The first answer is that the highest-probability event is the one farthest to the right. This event must be the entire space $\Omega$. All we do to renormalize is scale until this event is probability 1.

If we rotate until some probabilities are negative, and then renormalize in this way, the negative probabilities stay negative, but rescale.

The second way to renormalize is to choose a separating line, and use its normal vector as probability. This keeps probability positive. Then we find the highest-probability event as before, and call this probability 1.

Trying to picture this, an obvious question is: can the highest-probability event change when we rotate?

• The restriction that probabilities be non-negative means that events can only appear in quadrants I and IV of the graph.

Why do we restrict the probabilities to be non-negative? Is there anything in particular that keeps us from pulling an Aaronson and generalizing probability to include negative and complex components, even absent a clear motivator like QM?

• Doesn’t the necessity of this half-plane containing the actions restore a difference, even just within this decision algorithm (leaving aside the uniqueness of probability in learning and reasoning), between probability and utility? Probability is the direction perpendicular to this invisible line, and utility is the slope relative to this invisible line.

• It can, if there is a unique line. There isn’t a unique line in general; you can draw several lines, getting different probability directions for each.

• Sure. And given the rescalability, for any set of values you can rescale everything so that almost any line is possible. But then everything is in some suspiciously narrow band, which again lends itself to a principal-component sort of coordinate system.

When inferring such a division, interestingly, the ordering of value remains unchanged, but the ordering of the inferred probability can be different from the ordering of the original probability, because average-value events are interpreted as more probable.