# An Orthodox Case Against Utility Functions

This post has benefit­ted from dis­cus­sion with Sam Eisen­stat, Scott Garrabrant, Tsvi Ben­son-Tilsen, Daniel Dem­ski, Daniel Koko­ta­jlo, and Stu­art Arm­strong. It started out as a thought about Stu­art Arm­strong’s re­search agenda.

In this post, I hope to say some­thing about what it means for a ra­tio­nal agent to have prefer­ences. The view I am putting for­ward is rel­a­tively new to me, but it is not very rad­i­cal. It is, dare I say, a con­ser­va­tive view—I hold close to Bayesian ex­pected util­ity the­ory. How­ever, my im­pres­sion is that it differs greatly from com­mon im­pres­sions of Bayesian ex­pected util­ity the­ory.

I will ar­gue against a par­tic­u­lar view of ex­pected util­ity the­ory—a view which I’ll call re­duc­tive util­ity. I do not re­call see­ing this view ex­plic­itly laid out and defended (ex­cept in in-per­son con­ver­sa­tions). How­ever, I ex­pect at least a good chunk of the as­sump­tions are com­monly made.

# Re­duc­tive Utility

The core tenets of re­duc­tive util­ity are as fol­lows:

• The sam­ple space of a ra­tio­nal agent’s be­liefs is, more or less, the set of pos­si­ble ways the world could be—which is to say, the set of pos­si­ble phys­i­cal con­figu­ra­tions of the uni­verse. Hence, each world is one such con­figu­ra­tion.

• The prefer­ences of a ra­tio­nal agent are rep­re­sented by a util­ity func­tion from wor­lds to real num­bers.

• Fur­ther­more, the util­ity func­tion should be a com­putable func­tion of wor­lds.

Since I’m set­ting up the view which I’m knock­ing down, there is a risk I’m strik­ing at a straw man. How­ever, I think there are some good rea­sons to find the view ap­peal­ing. The fol­low­ing sub­sec­tions will ex­pand on the three tenets, and at­tempt to provide some mo­ti­va­tion for them.

If the three points seem ob­vi­ous to you, you might just skip to the next sec­tion.

## Wor­lds Are Ba­si­cally Physical

What I mean here re­sem­bles the stan­dard phys­i­cal-re­duc­tion­ist view. How­ever, my em­pha­sis is on cer­tain fea­tures of this view:

• There is some “basic stuff”—like quarks or vibrating strings or what-have-you.

• What there is to know about the world is some set of state­ments about this ba­sic stuff—par­ti­cle lo­ca­tions and mo­men­tums, or wave-form func­tion val­ues, or what-have-you.

• Th­ese spe­cial atomic state­ments should be log­i­cally in­de­pen­dent from each other (though they may of course be prob­a­bil­is­ti­cally re­lated), and to­gether, fully de­ter­mine the world.

• Th­ese should (more or less) be what be­liefs are about, such that we can (more or less) talk about be­liefs in terms of the sam­ple space as be­ing the set of wor­lds un­der­stood in this way.

This is the so-called “view from nowhere”, as Thomas Nagel puts it.

I don’t in­tend to con­strue this po­si­tion as rul­ing out cer­tain non-phys­i­cal facts which we may have be­liefs about. For ex­am­ple, we may be­lieve in­dex­i­cal facts on top of the phys­i­cal facts—there might be (1) be­liefs about the uni­verse, and (2) be­liefs about where we are in the uni­verse. Ex­cep­tions like this vi­o­late an ex­treme re­duc­tive view, but are still close enough to count as re­duc­tive think­ing for my pur­poses.

## Utility Is a Func­tion of Worlds

So we’ve got the “basically physical” sample space Ω. Now we write down a utility function U : Ω → ℝ. In other words, utility is a random variable on our event space.

What’s the big deal?

One thing this is saying is that preferences are a function of the world. In particular, preferences need not depend only on what is observed. This is incompatible with standard RL in a way that matters.

But, in addition to saying that utility can depend on more than just observations, we are restricting utility to only depend on things that are in the world. After we consider all the information in a world ω, there cannot be any extra uncertainty about utility—no extra “moral facts” which we may be uncertain of. If there are such moral facts, they have to be present somewhere in the universe (at least, derivable from facts about the universe).

One implication of this: if utility is about high-level entities, the utility function is responsible for deriving them from low-level stuff. For example, if the universe is made of quarks, but utility is a function of beauty, consciousness, and such, then U needs to contain the beauty-detector and consciousness-detector and so on—otherwise how can it compute utility given all the information about the world?

## Utility Is Computable

Finally, and most critically for the discussion here, U should be a computable function of worlds.

To clarify what I mean by this: ω should have some sort of representation which allows us to feed it into a Turing machine—let’s say ω is an infinite bit-string which assigns true or false to each of the “atomic sentences” which describe the world. U should be a computable function of this input; that is, there should be a Turing machine which takes a rational number ε > 0, reads ω bit by bit as needed, prints a rational number within ε of U(ω), and halts. (In other words, we can compute U to any desired degree of approximation.)
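To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the post) of a utility function that is computable in this sense: a discounted sum of the bits of ω. Approximating it to within ε never requires reading more than finitely many bits.

```python
def discounted_utility(omega, epsilon):
    """Approximate U(omega) = sum over t of omega(t) * 2**-(t+1), to within epsilon.

    `omega` maps a time index to a bit in {0, 1}, standing in for an infinite
    bit-string; only finitely many bits are ever read before the machine halts.
    """
    total, t = 0.0, 0
    while 2.0 ** -t > epsilon:        # bound on the unread tail of the series
        total += omega(t) * 2.0 ** -(t + 1)
        t += 1
    return total

# Example: a world whose third atomic sentence is true and all others false.
print(discounted_utility(lambda t: 1 if t == 3 else 0, 1e-6))
```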

Why should U be computable?

One argument is that U should be computable because the agent has to be able to use it in computations. This perspective is especially appealing if you think of U as a black-box function which you can only optimize through search. If you can’t evaluate U, how are you supposed to use it? If U exists as an actual module somewhere in the brain, how is it supposed to be implemented? (If you don’t think this sounds very convincing, great!)

Requiring U to be computable may also seem easy. What is there to lose? Are there preference structures we really care about being able to represent, which are fundamentally not computable?

And what would it even mean for a com­putable agent to have non-com­putable prefer­ences?

How­ever, the com­putabil­ity re­quire­ment is more re­stric­tive than it may seem.

There is a sort of continuity implied by computability: U must not depend too much on “small” differences between worlds. The computation only accesses finitely many bits of ω before it halts. All the rest of the bits in ω must not make more than ε difference to the value of U(ω).

This means some seem­ingly sim­ple util­ity func­tions are not com­putable.

As an ex­am­ple, con­sider the pro­cras­ti­na­tion para­dox. Your task is to push a but­ton. You get 10 util­ity for push­ing the but­ton. You can push it any time you like. How­ever, if you never press the but­ton, you get −10. On any day, you are fine with putting the but­ton-press­ing off for one more day. Yet, if you put it off for­ever, you lose!

We can think of ω as a string like 000000100..., where the “1” is the day you push the button. To compute the utility, we might look for the “1”, outputting 10 if we find it.

But what about the all-zero universe, 0000000...? The program must loop forever. We can’t tell we’re in the all-zero universe by examining any finite number of bits. You don’t know whether you will eventually push the button. (Even if the universe also gives you your source code, you can’t necessarily tell from that—the logical difficulty of determining this about yourself is, of course, the original point of the procrastination paradox.)
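A minimal sketch (my own, purely illustrative) of the naive program just described: scan ω for the “1”. On the all-zero universe it loops forever.

```python
def procrastination_utility(omega):
    """Naive attempt: 10 if the button is ever pressed, -10 if it never is.

    `omega` maps a day index to a bit. On the all-zero universe this loop never
    terminates, so the -10 branch can never be reached by any finite computation.
    """
    t = 0
    while True:
        if omega(t) == 1:
            return 10
        t += 1   # no finite prefix of an all-zero omega lets us stop and return -10
```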

Hence, a prefer­ence struc­ture like this is not com­putable, and is not al­lowed ac­cord­ing to the re­duc­tive util­ity doc­trine.

The advocate of reductive utility might take this as a victory. The procrastination paradox has been avoided, along with other paradoxes of a similar structure. (The St. Petersburg Paradox is another example.)

On the other hand, if you think this is a le­gi­t­i­mate prefer­ence struc­ture, deal­ing with such ‘prob­le­matic’ prefer­ences mo­ti­vates aban­don­ment of re­duc­tive util­ity.

# Sub­jec­tive Utility: The Real Thing

We can strongly op­pose all three points with­out leav­ing or­tho­dox Bayesi­anism. Speci­fi­cally, I’ll sketch how the Jeffrey-Bolker ax­ioms en­able non-re­duc­tive util­ity. (The ti­tle of this sec­tion is a refer­ence to Jeffrey’s book Sub­jec­tive Prob­a­bil­ity: The Real Thing.)

How­ever, the real po­si­tion I’m ad­vo­cat­ing is more grounded in log­i­cal in­duc­tion rather than the Jeffrey-Bolker ax­ioms; I’ll sketch that ver­sion at the end.

## The View From Somewhere

The re­duc­tive-util­ity view ap­proached things from the start­ing-point of the uni­verse. Beliefs are for what is real, and what is real is ba­si­cally phys­i­cal.

The non-re­duc­tive view starts from the stand­point of the agent. Beliefs are for things you can think about. This doesn’t rule out a phys­i­cal­ist ap­proach. What it does do is give high-level ob­jects like ta­bles and chairs an equal foot­ing with low-level ob­jects like quarks: both are in­ferred from sen­sory ex­pe­rience by the agent.

Rather than assuming an underlying set of worlds, the Jeffrey-Bolker axioms assume only a set of events. For two events A and B, the conjunction A∧B exists, and the disjunction A∨B, and the negations ¬A and ¬B. However, unlike in the Kolmogorov axioms, these are not assumed to be intersection, union, and complement of an underlying set of worlds.

Let me em­pha­size that: we need not as­sume there are “wor­lds” at all.

In philos­o­phy, this is called situ­a­tion se­man­tics—an al­ter­na­tive to the more com­mon pos­si­ble-world se­man­tics. In math­e­mat­ics, it brings to mind pointless topol­ogy.

In the Jeffrey-Bolker treat­ment, a world is just a max­i­mally spe­cific event: an event which de­scribes ev­ery­thing com­pletely. But there is no re­quire­ment that max­i­mally-spe­cific events ex­ist. Per­haps any event, no mat­ter how de­tailed, can be fur­ther ex­tended by spec­i­fy­ing some yet-un­men­tioned stuff. (In­deed, the Jeffrey-Bolker ax­ioms as­sume this! Although, Jeffrey does not seem philo­soph­i­cally com­mit­ted to that as­sump­tion, from what I have read.)

Thus, there need not be any “view from nowhere”—no se­man­tic van­tage point from which we see the whole uni­verse.

This, of course, de­prives us of the ob­jects which util­ity was a func­tion of, in the re­duc­tive view.

## Utility Is a Func­tion of Events

The reductive-utility view makes a distinction between utility—the random variable itself—and expected utility, which is the subjective estimate of the random variable which we use for making decisions.

The Jeffrey-Bolker frame­work does not make a dis­tinc­tion. Every­thing is a sub­jec­tive prefer­ence eval­u­a­tion.

A reductive-utility advocate sees the expected utility of an event as derived from the utility of the worlds within the event. They start by defining U(ω); then, we define the expected utility of an event A as V(A) := Σ_{ω∈A} U(ω)·P(ω)/P(A)—or, more generally, the corresponding integral.

In the Jeffrey-Bolker framework, we instead define V directly on events. These preferences are required to be coherent with breaking things up into sums, so V(A) = V(A∧B)·P(B|A) + V(A∧¬B)·P(¬B|A)—but we do not define one from the other.

We don’t have to know how to eval­u­ate en­tire wor­lds in or­der to eval­u­ate events. All we have to know is how to eval­u­ate events!
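As a toy illustration (my own sketch with made-up numbers, not part of the Jeffrey-Bolker formalism itself), here is what the sum-coherence condition above looks like on a tiny finite event algebra. The values are built up from three cells only so that there are concrete numbers to check; the point is that the agent only ever uses the event-level values V and the coherence condition.

```python
# Toy event algebra over three disjoint cells; probabilities and values are
# made up purely for illustration.
P = {"a": 0.2, "b": 0.3, "c": 0.5}
V = {"a": 10.0, "b": 4.0, "c": -2.0}

def prob(event):
    return sum(P[x] for x in event)

def value(event):
    """Value of a non-empty union of cells (probability-weighted average)."""
    return sum(V[x] * P[x] for x in event) / prob(event)

# Sum-coherence: V(A) = V(A and B) * P(B|A) + V(A and not-B) * P(not-B|A).
A = {"a", "b", "c"}
B = {"a", "b"}
lhs = value(A)
rhs = value(A & B) * (prob(A & B) / prob(A)) + value(A - B) * (prob(A - B) / prob(A))
print(lhs, rhs)   # both 2.2: splitting A into pieces and recombining agrees
```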

## Up­dates Are Computable

Jeffrey-Bolker doesn’t say any­thing about com­putabil­ity. How­ever, if we do want to ad­dress this sort of is­sue, it leaves us in a differ­ent po­si­tion.

Be­cause sub­jec­tive ex­pec­ta­tion is pri­mary, it is now more nat­u­ral to re­quire that the agent can eval­u­ate events, with­out any re­quire­ment about a func­tion on wor­lds. (Of course, we could do that in the Kol­mogorov frame­work.)

Agents don’t need to be able to com­pute the util­ity of a whole world. All they need to know is how to up­date ex­pected util­ities as they go along.

Of course, the sub­jec­tive util­ity can’t be just any way of up­dat­ing as you go along. It needs to be co­her­ent, in the sense of the Jeffrey-Bolker ax­ioms. And, main­tain­ing co­her­ence can be very difficult. But it can be quite easy even in cases where the ran­dom-vari­able treat­ment of the util­ity func­tion is not com­putable.

Let’s go back to the pro­cras­ti­na­tion ex­am­ple. In this case, to eval­u­ate the ex­pected util­ity of each ac­tion at a given time-step, the agent does not need to figure out whether it ever pushes the but­ton. It just needs to have some prob­a­bil­ity, which it up­dates over time.

For example, an agent might initially assign some probability p_t to pressing the button at time t, and some remaining probability to never pressing the button. Its probability that it would ever press the button, and thus its utility estimate, would decrease with each observed time-step in which it didn’t press the button. (Of course, such an agent would press the button immediately.)
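A minimal sketch (with a hypothetical prior, not taken from the post) of this kind of updating: a prior over which day the button is pressed, some leftover mass on “never”, and an expected-utility estimate recomputed on each day the button has still not been pressed.

```python
# Hypothetical prior: probability 2**-(t+1) of pressing on day t (for t < 20),
# with the leftover mass assigned to never pressing.
horizon = 20
prior_press = [2.0 ** -(t + 1) for t in range(horizon)]
prior_never = 1.0 - sum(prior_press)

def expected_utility(day):
    """Subjective expected utility, given the button has not been pressed before `day`."""
    press_later = sum(prior_press[day:])          # mass on pressing at some t >= day
    remaining = press_later + prior_never
    return (10 * press_later + (-10) * prior_never) / remaining

for day in (0, 10, 15, 19, 20):
    print(day, round(expected_utility(day), 3))
# The estimate falls each day the button stays unpressed, reaching -10 once
# every "press on day t" hypothesis has been ruled out.
```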

Of course, this “solution” doesn’t touch on any of the tricky logical issues which the procrastination paradox was originally introduced to illustrate. This isn’t meant as a solution to the procrastination paradox—only as an illustration of how to coherently update discontinuous preferences. This simple U is uncomputable by the definition of the previous section.

It also doesn’t ad­dress com­pu­ta­tional tractabil­ity in a very real way, since if the prior is very com­pli­cated, com­put­ing the sub­jec­tive ex­pec­ta­tions can get ex­tremely difficult.

We can come closer to ad­dress­ing log­i­cal is­sues and com­pu­ta­tional tractabil­ity by con­sid­er­ing things in a log­i­cal in­duc­tion frame­work.

# Utility Is Not a Function

In a log­i­cal in­duc­tion (LI) frame­work, the cen­tral idea be­comes “up­date your sub­jec­tive ex­pec­ta­tions in any way you like, so long as those ex­pec­ta­tions aren’t (too eas­ily) ex­ploitable to Dutch-book.” This clar­ifies what it means for the up­dates to be “co­her­ent”—it is some­what more el­e­gant than say­ing ”… any way you like, so long as they fol­low the Jeffrey-Bolker ax­ioms.”

This re­places the idea of “util­ity func­tion” en­tirely—there isn’t any need for a func­tion any more, just a log­i­cally-un­cer­tain-vari­able (LUV, in the ter­minol­ogy from the LI pa­per).

Ac­tu­ally, there are differ­ent ways one might want to set things up. I hope to get more tech­ni­cal in a later post. For now, here’s some bul­let points:

• In the sim­ple pro­cras­ti­na­tion-para­dox ex­am­ple, you push the but­ton if you have any un­cer­tainty at all. So things are not that in­ter­est­ing.

• In more com­pli­cated ex­am­ples—where there is some real benefit to pro­cras­ti­nat­ing—a LI-based agent could to­tally pro­cras­ti­nate for­ever. This is be­cause LI doesn’t give any guaran­tee about con­verg­ing to cor­rect be­liefs for un­com­putable propo­si­tions like whether Tur­ing ma­chines halt or whether peo­ple stop pro­cras­ti­nat­ing.

• Believ­ing you’ll stop pro­cras­ti­nat­ing even though you won’t is perfectly co­her­ent—in the same way that be­liev­ing in non­stan­dard num­bers is perfectly log­i­cally con­sis­tent. Put­ting our­selves in the shoes of such an agent, this just means we’ve ex­am­ined our own de­ci­sion-mak­ing to the best of our abil­ity, and have put sig­nifi­cant prob­a­bil­ity on “we don’t pro­cras­ti­nate for­ever”. This kind of rea­son­ing is nec­es­sar­ily fal­lible.

• Yet, if a sys­tem we built were to do this, we might have strong ob­jec­tions. So, this can count as an al­ign­ment prob­lem. How can we give feed­back to a sys­tem to avoid this kind of mis­take? I hope to work on this ques­tion in fu­ture posts.

• IIUC, you ar­gue that for an em­bed­ded agent to have an ex­plicit util­ity func­tion, it needs to be a func­tion of the micro­scopic de­scrip­tion of the uni­verse. This is un­satis­fac­tory since the agent shouldn’t start out know­ing micro­scopic physics. The al­ter­na­tive you sug­gest is us­ing the more ex­otic Jeffrey-Bolker ap­proach. How­ever, this is not how I be­lieve em­bed­ded agency should work.

Instead, you should consider a utility function that depends on the universe as described in whatever ontology the utility function is defined in (which we may call “macroscopic”). Microscopic physics comes in when the agent learns a fine-grained model of the dynamics in the macroscopic ontology. In particular, this fine-grained model can involve a fine-grained state space.

The other is­sue dis­cussed is util­ity func­tions of the sort ex­em­plified by the pro­cras­ti­na­tion para­dox. I think that be­sides be­ing un­com­putable, this brings in other patholo­gies. For ex­am­ple, since the util­ity func­tions you con­sider are dis­con­tin­u­ous, it is no longer guaran­teed an op­ti­mal policy ex­ists at all. Per­son­ally, I think dis­con­tin­u­ous util­ity func­tions are strange and poorly mo­ti­vated.

• I don’t want to make a strong ar­gu­ment against your po­si­tion here. Your po­si­tion can be seen as one ex­am­ple of “don’t make util­ity a func­tion of the micro­scopic”.

But let’s pre­tend for a minute that I do want to make a case for my way of think­ing about it as op­posed to yours.

• Hu­mans are not clear on what macro­scopic physics we at­tach util­ity to. It is pos­si­ble that we can em­u­late hu­man judge­ment suffi­ciently well by learn­ing over macro­scopic-util­ity hy­pothe­ses (ie, par­tial hy­pothe­ses in your frame­work). But per­haps no in­di­vi­d­ual hy­poth­e­sis will suc­cess­fully cap­ture the way hu­man value judge­ments fluidly switch be­tween macro­scopic on­tolo­gies—per­haps hu­man rea­son­ing of this kind can only be ac­cu­rately cap­tured by a dy­namic LI-style “trader” who re­acts flex­ibly to an ob­served situ­a­tion, rather than a fixed par­tial hy­poth­e­sis. In other words, per­haps we need to cap­ture some­thing about how hu­mans rea­son, rather than any fixed on­tol­ogy (even of the flex­ible macro­scopic kind).

• Your way of han­dling macro­scopic on­tolo­gies en­tails knigh­tian un­cer­tainty over the micro­scopic pos­si­bil­ities. Isn’t that go­ing to lack a lot of op­ti­miza­tion power? EG, if hu­mans rea­soned this way us­ing in­tu­itive physics, we’d be afraid that any sci­ence ex­per­i­ment cre­at­ing weird con­di­tions might de­stroy the world, and try to min­i­mize chances of those situ­a­tions be­ing set up, or some­thing along those lines? I’m guess­ing you have some way to miti­gate this, but I don’t know how it works.

As for dis­con­tin­u­ous util­ity:

For ex­am­ple, since the util­ity func­tions you con­sider are dis­con­tin­u­ous, it is no longer guaran­teed an op­ti­mal policy ex­ists at all. Per­son­ally, I think dis­con­tin­u­ous util­ity func­tions are strange and poorly mo­ti­vated.

My main mo­ti­vat­ing force here is to cap­ture the max­i­mal breadth of what ra­tio­nal (ie co­her­ent, ie non-ex­ploitable) prefer­ences can be, in or­der to avoid rul­ing out some hu­man prefer­ences. I have an in­tu­ition that this can ul­ti­mately help get the right learn­ing-the­o­retic guaran­tees as op­posed to hurt, but, I have not done any­thing to val­i­date that in­tu­ition yet.

With re­spect to pro­cras­ti­na­tion-like prob­lems, op­ti­mal­ity has to be sub­jec­tive, since there is no foolproof way to tell when an agent will pro­cras­ti­nate for­ever. If hu­mans have any prefer­ences like this, then al­ign­ment means al­ign­ment with hu­man sub­jec­tive eval­u­a­tions of this mat­ter—if the hu­man (or some ex­trap­o­lated hu­man vo­li­tion, like HCH) looks at the sys­tem’s be­hav­ior and says “NO!! Push the but­ton now, you fool!!” then the sys­tem is mis­al­igned. The value-learn­ing should ac­count for this sort of feed­back in or­der to avoid this. But this does not at­tempt to min­i­mize loss in an ob­jec­tive sense—we ex­port that con­cern to the (ex­trap­o­lated?) hu­man eval­u­a­tion which we are bound­ing loss with re­spect to.

With re­spect to the prob­lem of no-op­ti­mal-policy, my in­tu­ition is that you try for bounded loss in­stead; so (as with log­i­cal in­duc­tion) you are never perfect but you have some kind of mis­take bound. Of course this is more difficult with util­ity than it is with pure epistemics.

• Hu­mans are not clear on what macro­scopic physics we at­tach util­ity to. It is pos­si­ble that we can em­u­late hu­man judge­ment suffi­ciently well by learn­ing over macro­scopic-util­ity hy­pothe­ses (ie, par­tial hy­pothe­ses in your frame­work). But per­haps no in­di­vi­d­ual hy­poth­e­sis will suc­cess­fully cap­ture the way hu­man value judge­ments fluidly switch be­tween macro­scopic on­tolo­gies...

First, it seems to me rather clear what macro­scopic physics I at­tach util­ity to. If I care about peo­ple, this means my util­ity func­tion comes with some model of what a “per­son” is (that has many free pa­ram­e­ters), and if some­thing falls within the pa­ram­e­ters of this model then it’s a per­son, and if it doesn’t then it isn’t a per­son (ofc we can also have a fuzzy bound­ary, which is sup­ported in quasi-Bayesi­anism).

Se­cond, what does it mean for a hy­poth­e­sis to be “in­di­vi­d­ual”? If we have a prior over a fam­ily of hy­pothe­ses, we can take their con­vex com­bi­na­tion and get a new in­di­vi­d­ual hy­poth­e­sis. So I’m not sure what sort of “fluidity” you imag­ine that is not sup­ported by this.

Your way of han­dling macro­scopic on­tolo­gies en­tails knigh­tian un­cer­tainty over the micro­scopic pos­si­bil­ities. Isn’t that go­ing to lack a lot of op­ti­miza­tion power? EG, if hu­mans rea­soned this way us­ing in­tu­itive physics, we’d be afraid that any sci­ence ex­per­i­ment cre­at­ing weird con­di­tions might de­stroy the world, and try to min­i­mize chances of those situ­a­tions be­ing set up, or some­thing along those lines?

The agent doesn’t have full Knigh­tian un­cer­tainty over all micro­scopic pos­si­bil­ities. The prior is com­posed of re­fine­ments of an “on­tolog­i­cal be­lief” that has this un­cer­tainty. You can even con­sider a ver­sion of this for­mal­ism that is en­tirely Bayesian (i.e. each re­fine­ment has to be max­i­mal), but then you lose the abil­ity to re­tain an “ob­jec­tive” macro­scopic re­al­ity in which the agent’s point of view is “un­spe­cial”, be­cause if the agent’s be­liefs about this re­al­ity have no Knigh­tian un­cer­tainty then it’s in­con­sis­tent with the agent’s free will (you could “avoid” this prob­lem us­ing an EDT or CDT agent but this would be bad for the usual rea­sons EDT and CDT are bad, and ofc you need Knigh­tian un­cer­tainty any­way be­cause of non-re­al­iz­abil­ity).

• First, it seems to me rather clear what macro­scopic physics I at­tach util­ity to. If I care about peo­ple, this means my util­ity func­tion comes with some model of what a “per­son” is (that has many free pa­ram­e­ters), and if some­thing falls within the pa­ram­e­ters of this model then it’s a per­son,

This does not strike me as the sort of thing which will be easy to write out. But there are other ex­am­ples. What if hu­mans value some­thing like ob­server-in­de­pen­dent beauty? EG, valu­ing beau­tiful things ex­ist­ing re­gard­less of whether any­one ob­serves their beauty. Then it seems pretty un­clear what on­tolog­i­cal ob­jects it gets pred­i­cated on.

Se­cond, what does it mean for a hy­poth­e­sis to be “in­di­vi­d­ual”? If we have a prior over a fam­ily of hy­pothe­ses, we can take their con­vex com­bi­na­tion and get a new in­di­vi­d­ual hy­poth­e­sis. So I’m not sure what sort of “fluidity” you imag­ine that is not sup­ported by this.

What I have in mind is com­pli­cated in­ter­ac­tions be­tween differ­ent on­tolo­gies. Sup­pose that we have one on­tol­ogy—the on­tol­ogy of clas­si­cal eco­nomics—in which:

• Utility is pred­i­cated on in­di­vi­d­u­als alone.

• In­di­vi­d­u­als always and only value their own he­dons; any ap­par­ent re­vealed prefer­ence for some­thing else is ac­tu­ally an in­di­ca­tion that ob­serv­ing that thing makes the per­son happy, or that be­hav­ing as if they value that other thing makes them happy. (I don’t know why this is part of clas­si­cal eco­nomics, but it seems at least highly cor­re­lated with clas­si­cal-econ views.)

• Ag­gre­gate util­ity (across many in­di­vi­d­u­als) can only be defined by giv­ing an ex­change rate, since util­ity func­tions of differ­ent in­di­vi­d­u­als are in­com­pa­rable. How­ever, an ex­change rate is im­plic­itly de­ter­mined by the mar­ket.

And we have an­other on­tol­ogy—the hip­pie on­tol­ogy—in which:

• En­ergy, aka vibra­tions, is an es­sen­tial part of so­cial in­ter­ac­tions and other things.

• Peo­ple and things can have good en­ergy and bad en­ergy.

• Peo­ple can be on the same wave­length.

• Etc.

And sup­pose what we want to do is try to rec­on­cile the value-con­tent of these two differ­ent per­spec­tives. This isn’t go­ing to be a mix­ture be­tween two par­tial hy­pothe­ses. It might ac­tu­ally be closer to an in­ter­sec­tion be­tween two par­tial hy­pothe­ses—since the differ­ent hy­pothe­ses largely talk about differ­ent en­tities. But that won’t be right ei­ther. Rather, there is philo­soph­i­cal work to be done, figur­ing out how to ap­pro­pri­ately mix the val­ues which are rep­re­sented in the two on­tolo­gies.

My in­tu­ition be­hind al­low­ing prefer­ence struc­tures which are “un­com­putable” as func­tions of fully speci­fied wor­lds is, in part, that one might con­tinue do­ing this kind of philo­soph­i­cal work in an un­bounded way—IE there is no rea­son to as­sume there’s a point at which this philo­soph­i­cal work is finished and you now have some­thing which can be con­ve­niently rep­re­sented as a func­tion of some spe­cific set of en­tities. Much like log­i­cal in­duc­tion never finishes and gives you a Bayesian prob­a­bil­ity func­tion, even if it gets closer over time.

The agent doesn’t have full Knigh­tian un­cer­tainty over all micro­scopic pos­si­bil­ities. The prior is com­posed of re­fine­ments of an “on­tolog­i­cal be­lief” that has this un­cer­tainty. You can even con­sider a ver­sion of this for­mal­ism that is en­tirely Bayesian (i.e. each re­fine­ment has to be max­i­mal),

OK, that makes sense!

but then you lose the abil­ity to re­tain an “ob­jec­tive” macro­scopic re­al­ity in which the agent’s point of view is “un­spe­cial”, be­cause if the agent’s be­liefs about this re­al­ity have no Knigh­tian un­cer­tainty then it’s in­con­sis­tent with the agent’s free will (you could “avoid” this prob­lem us­ing an EDT or CDT agent but this would be bad for the usual rea­sons EDT and CDT are bad, and ofc you need Knigh­tian un­cer­tainty any­way be­cause of non-re­al­iz­abil­ity).

Right.

• First, it seems to me rather clear what macro­scopic physics I at­tach util­ity to...

This does not strike me as the sort of thing which will be easy to write out.

Of course it is not easy to write out. Humanity’s preferences are highly complex. By “clear” I only meant that it’s clear something like this exists, not that I or anyone can write it out.

What if hu­mans value some­thing like ob­server-in­de­pen­dent beauty? EG, valu­ing beau­tiful things ex­ist­ing re­gard­less of whether any­one ob­serves their beauty.

This seems ill-defined. What is a “thing”? What does it mean for a thing to “ex­ist”? I can imag­ine valu­ing beau­tiful wild na­ture, by hav­ing “wild na­ture” be a part of the in­nate on­tol­ogy. I can even imag­ine prefer­ring cer­tain com­pu­ta­tions to have re­sults with cer­tain prop­er­ties. So, we can con­sider a prefer­ence that some kind of sim­plic­ity-prior-like com­pu­ta­tion out­puts bit se­quences with some com­plex­ity the­o­retic prop­erty we call “beauty”. But if you want to go even more ab­stract than that, I don’t know how to make sense of that (“make sense” not as “for­mal­ize” but just as “un­der­stand what you’re talk­ing about”).

It would be best if you had a sim­ple ex­am­ple, like a di­a­mond max­i­mizer, where it’s more or less clear that it makes sense to speak of agents with this prefer­ence.

What I have in mind is com­pli­cated in­ter­ac­tions be­tween differ­ent on­tolo­gies. Sup­pose that we have one on­tol­ogy—the on­tol­ogy of clas­si­cal eco­nomics—in which...

And we have an­other on­tol­ogy—the hip­pie on­tol­ogy—in which...

And sup­pose what we want to do is try to rec­on­cile the value-con­tent of these two differ­ent per­spec­tives.

Why do we want to rec­on­cile them? I think that you might be mix­ing two differ­ent ques­tions here. The first ques­tion is what kind of prefer­ences ideal “non-my­opic” agents can have. About this I main­tain that my frame­work pro­vides a good an­swer, or at least a good first ap­prox­i­ma­tion of the an­swer. The sec­ond ques­tion is what kind of prefer­ences hu­mans can have. But hu­mans are agents with only semi-co­her­ent prefer­ences, and I see no rea­son to be­lieve things like rec­on­cil­ing clas­si­cal eco­nomics with hip­pies should fol­low from any nat­u­ral math­e­mat­i­cal for­mal­ism. In­stead, I think we should model hu­mans as hav­ing prefer­ences that change over time, and the de­tailed dy­nam­ics of the change is just a func­tion the AI needs to learn, not some con­se­quence of math­e­mat­i­cal prin­ci­ples of ra­tio­nal­ity.

• Your way of han­dling macro­scopic on­tolo­gies en­tails knigh­tian un­cer­tainty over the micro­scopic pos­si­bil­ities.

Noth­ing can deal with quark-level pic­tures, so it’s the only op­tion.

EG, if hu­mans rea­soned this way us­ing in­tu­itive physics, we’d be afraid that any sci­ence ex­per­i­ment cre­at­ing weird con­di­tions might de­stroy the world

Using intuitive physics, there aren’t any microscopic conditions. It’s a recent discovery that macroscopic objects are made of invisibly tiny components. So there was a time when people didn’t worry that moving one electron would destroy the universe because they had not heard of electrons, followed by a time when people knew that moving one electron would not destroy the universe because they understood electrons. Where’s the problem?

• It seems to me that the Jeffrey-Bolker framework is a poor match for what’s going on in people’s heads when they make value judgements, compared to the VNM framework. If I think about how good the consequences of an action are, I try to think about what I expect to happen if I take that action (ie the outcome), and I think about how likely that outcome is to have various properties that I care about, since I don’t know exactly what the outcome will be with certainty. This isn’t to say that I literally consider probability distributions in my mind, since I typically use qualitative descriptions of probability rather than numbers in [0,1], and when I do use numbers, they are very rough, but this does seem like a sort of fuzzy, computationally limited version of a probability distribution. Similarly, my estimations of how good various outcomes are are often qualitative, rather than numerical, and again this seems like a fuzzy, computationally limited version of a utility function. In order to determine the utility of the event “I take action A”, I need to consider how good and how likely various consequences are, and take the expectation of the ‘how good’ with respect to the ‘how likely’. The Jeffrey-Bolker framework seems to be asking me to pretend none of that ever happened.

• If I think about how good the con­se­quences of an ac­tion are, I try to think about what I ex­pect to hap­pen if I take that ac­tion (ie the out­come), and I think about how likely that out­come is to have var­i­ous prop­er­ties that I care about, since I don’t know ex­actly what the out­come will be with cer­tainty… I need to con­sider how good and how likely var­i­ous con­se­quences are, and take the ex­pec­ta­tion of the ‘how good’ with re­spect to the ‘how likely’.

I don’t un­der­stand JB yet, but when I in­tro­spected just now, my ex­pe­rience of de­ci­sion-mak­ing doesn’t have any sep­a­ra­tion be­tween be­liefs and val­ues, so I think I dis­agree with the above. I’ll try to ex­plain why by de­scribing my ex­pe­rience. (Note: Long com­ment be­low is just say­ing one very sim­ple thing. Sorry for length. There’s a one-line tl;dr at the end.)

Right now I’m considering doing three different things. I can go and play a videogame that my friend suggested we play together, I can do some LW work with my colleague, or I can go play some guitar/piano. I feel like the videogame isn’t very fun right now because I think the one my friend suggested is not that interesting of a shared experience. I feel like the work is fun because I’m excited about publishing the results of the work, and the work itself involves a kind of cognition I enjoy. And playing piano is fun because I’ve been skilling up a lot lately and I’m going to accompany some of my housemates on some Hamilton songs.

Now, I know some likely ways that what seems valuable to me might change. There are other videogames I’ve played lately that have been really fascinating and rewarding to play together, that involve problem solving where 2 people can be creative together. I can imagine the work turning out to not actually be the fun part but the boring parts. I can imagine that I’ve found no traction (skill-up) in playing piano, or that we’re going to use a recorded soundtrack rather than my playing for the songs we’re learning.

All of these to me feel like up­dates in my un­der­stand­ing of what events are reach­able to me; this doesn’t feel like chang­ing my util­ity eval­u­a­tion of the events. The event of “play videogame while friend watches bored” could change to “play videogame while cre­atively prob­lem-solv­ing with friend”. The event of “gain skill in pi­ano and then later perform songs well with friends” could change to “strug­gle to do some­thing difficult and sound bad and that’s it”.

If I think about chang­ing my util­ity func­tion, I ex­pect that would feel more like… well, I’m not sure. My straw ver­sion is “I cre­atively solve prob­lems with my friend on a videogame, but some­how that’s ob­jec­tively bad so I will not do it”. That’s where some vari­able in the util­ity func­tion changed while all the rest of the facts about my psy­chol­ogy and re­al­ity stay the same. This doesn’t feel to me like my reg­u­lar ex­pe­rience of de­ci­sion-mak­ing.

But, maybe that’s not the idea. The idea is like if I had some neu­rolog­i­cal change, per­haps I be­come more of a so­ciopath and stop feel­ing em­pa­thy and ev­ery­one just feels like ob­jects to me rather than al­ive. Then a bunch of the so­cial ex­pe­riences above would change, they’d lose any ex­pe­rience of things like vi­car­i­ous en­joy­ment and plea­sure of bond­ing with friends. Per­haps that’s what VNM is talk­ing about in my ex­pe­rience.

I think that some of the standard “updates to my ethics / utility function” ideas that people discuss often don’t feel like this to me. Like, some people say that reflecting on population ethics leads them to change their utility function and start to care about the far future. That’s not my experience – for me it’s been things like the times in HPMOR when Harry thinks about civilizations of the future, what they’ll be like/think, and how awesome they can be. It feels real to me, like a reachable state, and this is what has changed a lot of my behaviour, in contrast with changing some variable in a function of world-states that’s independent from my understanding of what events are achievable.

To be clear, some­times I de­scribe my ex­pe­rience more like the so­ciopath ex­am­ple, where my fun­da­men­tal in­ter­ests/​val­ues change. I say things like “I don’t en­joy videogames as much as I used to” or “Th­ese days I value hon­esty and re­li­a­bil­ity a lot more than po­lite­ness”, and there is a sense there where I now ex­pe­rience the same events very differ­ently. “I had a pos­i­tive meet­ing with John” might now be “I feel like he was be­ing eva­sive about the topic we were dis­cussing”. The things that are salient to me change. And I think that the lan­guage of “my val­ues have changed” is of­ten an effec­tive one for com­mu­ni­cat­ing that – even if my ex­pe­rience does not match be­liefs|util­ity, any suffi­ciently co­her­ent agent can be de­scribed this way, and it is of­ten easy to help oth­ers model me by de­scribing my val­ues as hav­ing changed.

But I think my internal experience is more that I made substantial updates about what events I’m moving towards, and the event “We had a pleasant interaction which will lead to us working effectively together” has changed to “We were not able to say the possibly unwelcome facts of the matter, which will lead to a world where we don’t work effectively together”. So internally it feels like an update about what events are reachable, even though someone from the outside who doesn’t understand my internal experience might more naturally say “It seems like Ben is treating the same event differently now, so I’ll model him as having changed his values”.

tl;dr: While I often talk separately about what actions I/you/we could take and how valuable those actions are, internally when I’m ‘evaluating’ the actions, I’m just trying to visualise what they are, and there is no second step of running my utility function on those visualisations.

As I say, I’m not sure I un­der­stand JB, so per­haps this is also in­con­sis­tent with it. I just read your com­ment and no­ticed it didn’t match my own in­tro­spec­tive ex­pe­rience, so I thought I’d share my ex­pe­rience.

• I agree that the con­sid­er­a­tions you men­tioned in your ex­am­ple are not changes in val­ues, and didn’t mean to im­ply that that sort of thing is a change in val­ues. In­stead, I just meant that such shifts in ex­pec­ta­tions are changes in prob­a­bil­ity dis­tri­bu­tions, rather than changes in events, since I think of such things in terms of how likely each of the pos­si­ble out­comes are, rather than just which out­comes are pos­si­ble and which are ruled out.

• Per­haps it goes with­out say­ing, but ob­vi­ously, both frame­works are flex­ible enough to al­low for most phe­nom­ena—the ques­tion here is what is more nat­u­ral in one frame­work or an­other.

My main ar­gu­ment is that the pro­cras­ti­na­tion para­dox is not nat­u­ral at all in a Sav­age frame­work, as it sug­gests an un­com­putable util­ity func­tion. I think this plau­si­bly out­weighs the is­sue you’re point­ing at.

But with re­spect to the is­sue you are point­ing at:

I try to think about what I ex­pect to hap­pen if I take that ac­tion (ie the out­come), and I think about how likely that out­come is to have var­i­ous prop­er­ties that I care about,

In the Sav­age frame­work, an out­come already en­codes ev­ery­thing you care about. So the com­pu­ta­tion which seems to be sug­gested by Sav­age is to think of these max­i­mally-speci­fied out­comes, as­sign­ing them prob­a­bil­ity and util­ity, and then com­bin­ing those to get ex­pected util­ity. This seems to be very de­mand­ing: it re­quires imag­in­ing these very de­tailed sce­nar­ios.

Alternately, we might say (as Savage said) that the Savage axioms apply to “small worlds”—small scenarios which the agent abstracts from its experience, such as the decision of whether to break an egg for an omelette. These can be easily considered by the agent, if it can assign values “from outside the problem” in an appropriate way.

But then, to ac­count for the breadth of hu­man rea­son­ing, it seems to me we also want an ac­count of things like ex­tend­ing a small world when we find that it isn’t suffi­cient, and co­her­ence be­tween differ­ent small-world frames for re­lated de­ci­sions.

This gives a pic­ture very much like the Jeffrey-Bolker pic­ture, in that we don’t re­ally work with out­comes which com­pletely spec­ify ev­ery­thing we care about, but rather, work with a va­ri­ety of sim­plified out­comes with co­her­ence re­quire­ments be­tween sim­pler and more com­plex views.

So over­all I think it is bet­ter to have some pic­ture where you can break things up in a more tractable way, rather than hav­ing full out­comes which you need to pass through to get val­ues.

In the Jeffrey-Bolker frame­work, you can re-es­ti­mate the value of an event by break­ing it up into pieces, es­ti­mat­ing the value and prob­a­bil­ity of each piece, and com­bin­ing them back to­gether. This pro­cess could be iter­ated in a man­ner similar to dy­namic pro­gram­ming in RL, to im­prove value es­ti­mates for ac­tions—al­though one needs to set­tle on a story about where the in­for­ma­tion origi­nally comes from. I cur­rently like the log­i­cal-in­duc­tion-like pic­ture where you get in­for­ma­tion com­ing in “some­how” (a broad va­ri­ety of feed­back is pos­si­ble, in­clud­ing ab­stract judge­ments about util­ity which are hard to cash out in spe­cific cases) and you try to make ev­ery­thing as co­her­ent as pos­si­ble in the mean­while.

• In the Sav­age frame­work, an out­come already en­codes ev­ery­thing you care about.

Yes, but if you don’t know which out­come is the true one, so you’re con­sid­er­ing a prob­a­bil­ity dis­tri­bu­tion over out­comes in­stead of a sin­gle out­come, then it still makes sense to speak of the prob­a­bil­ity that the true out­come has some fea­ture. This is what I meant.

So the com­pu­ta­tion which seems to be sug­gested by Sav­age is to think of these max­i­mally-speci­fied out­comes, as­sign­ing them prob­a­bil­ity and util­ity, and then com­bin­ing those to get ex­pected util­ity. This seems to be very de­mand­ing: it re­quires imag­in­ing these very de­tailed sce­nar­ios.

You do not need to be able to imagine every possible outcome individually in order to think of functions on or probability distributions over the set of outcomes, any more than I need to be able to imagine each individual real number in order to understand a simple function on the reals or the standard normal distribution.

It seems that you’re go­ing by an anal­ogy like Jeffrey-Bolker : VNM :: events : out­comes, which is par­tially right, but leaves out an im­por­tant sense in which the cor­rect anal­ogy is Jeffrey-Bolker : VNM :: events : prob­a­bil­ity dis­tri­bu­tions, since al­though util­ity is defined on out­comes, the func­tion that is ac­tu­ally eval­u­ated is ex­pected util­ity, which is defined on prob­a­bil­ity dis­tri­bu­tions (this be­ing a dis­tinc­tion that does not ex­ist in Jeffrey-Bolker, but does ex­ist in my con­cep­tion of real-world hu­man de­ci­sion mak­ing).

• I’ve cu­rated this. This seems to me like an im­por­tant con­cep­tual step in un­der­stand­ing agency, the sub­jec­tive view is very in­ter­est­ing and sur­pris­ing to me. This has been writ­ten up very clearly and well, I ex­pect peo­ple to link back to this post quite a lot, and I’m re­ally ex­cited to read more posts on this. Thanks a lot Abram.

• First, I re­ally like this shift in think­ing, partly be­cause it moves the nee­dle to­ward an anti-re­al­ist po­si­tion, where you don’t even need to pos­tu­late an ex­ter­nal world (you prob­a­bly don’t see it that way, de­spite say­ing “Every­thing is a sub­jec­tive prefer­ence eval­u­a­tion”).

Se­cond, I won­der if you need an even stronger re­stric­tion, not just com­putable, but effi­ciently com­putable, given that it’s the agent that is do­ing the com­pu­ta­tion, not some the­o­ret­i­cal AIXI. This would prob­a­bly also change “too eas­ily” in “those ex­pec­ta­tions aren’t (too eas­ily) ex­ploitable to Dutch-book.” to effi­ciently. Maybe it should be even more re­stric­tive to avoid diminish­ing re­turns try­ing to squeeze ev­ery last bit of util­ity by spend­ing a lot of com­pute.

• First, I re­ally like this shift in think­ing, partly be­cause it moves the nee­dle to­ward an anti-re­al­ist po­si­tion, where you don’t even need to pos­tu­late an ex­ter­nal world (you prob­a­bly don’t see it that way, de­spite say­ing “Every­thing is a sub­jec­tive prefer­ence eval­u­a­tion”).

I definitely see it as a shift in that di­rec­tion, al­though I’m not ready to re­ally bite the bul­lets—I’m still feel­ing out what I per­son­ally see as the im­pli­ca­tions. Like, I want a re­al­ist-but-anti-re­al­ist view ;p

Se­cond, I won­der if you need an even stronger re­stric­tion, not just com­putable, but effi­ciently com­putable, given that it’s the agent that is do­ing the com­pu­ta­tion, not some the­o­ret­i­cal AIXI. This would prob­a­bly also change “too eas­ily” in “those ex­pec­ta­tions aren’t (too eas­ily) ex­ploitable to Dutch-book.” to effi­ciently.

Right, that’s very much what I’m think­ing.

• I definitely see it as a shift in that di­rec­tion, al­though I’m not ready to re­ally bite the bul­lets—I’m still feel­ing out what I per­son­ally see as the im­pli­ca­tions. Like, I want a re­al­ist-but-anti-re­al­ist view ;p

Well, we all ad­vance at our own pace. Ac­cept­ing that re­al­ity, truth and ex­is­tence are rel­a­tive and of­ten sub­jec­tive no­tions is not an easy step :) Or that there are var­i­ous de­grees of ex­is­tence.

• we need not as­sume there are “wor­lds” at all. … In math­e­mat­ics, it brings to mind pointless topol­ogy.

I don’t think the motivation for this is quite the same as the motivation for pointless topology, which is designed to mimic classical topology in a way that Jeffrey-Bolker-style decision theory does not mimic VNM-style decision theory. In pointless topology, a continuous function of locales X → Y is a function from the lattice of open sets of Y to the lattice of open sets of X. So a similar thing here would be to treat a utility function as a function from some lattice of subsets of ℝ (the Borel subsets, for instance) to the lattice of events.

My un­der­stand­ing of the Jeffrey-Bolker frame­work is that its pri­mary differ­ence from the VNM frame­work is not its pointless­ness, but the fact that it comes with a prior prob­a­bil­ity dis­tri­bu­tion over out­comes, which can only be up­dated by con­di­tion­ing on events (i.e. up­dat­ing on ev­i­dence that has prob­a­bil­ity 1 in some wor­lds and prob­a­bil­ity 0 in the rest). VNM does not start out with a prior, and al­lows any prob­a­bil­ity dis­tri­bu­tion over out­comes to be com­pared to any other, and Jeffrey-Bolker only al­lows com­par­i­son of prob­a­bil­ity dis­tri­bu­tions ob­tained by con­di­tion­ing the prior on an event. Of course, this in­ter­pre­ta­tion re­quires a fair amount of read­ing be­tween the lines, since the Jeffrey-Bolker ax­ioms make no ex­plicit men­tion of any prob­a­bil­ity dis­tri­bu­tion, but I don’t see any other rea­son­able way to in­ter­pret them, since if asked which of two events is bet­ter, I will of­ten be un­able to an­swer with­out fur­ther in­for­ma­tion, since the events may con­tain wor­lds of widely vary­ing util­ity. As­so­ci­at­ing an event with a fixed prior con­di­tioned on the event gives me this ad­di­tional in­for­ma­tion needed to an­swer the ques­tion, and I don’t see how any oth­ers could work. Start­ing with a prior that gets con­di­tioned on events that cor­re­spond to the agent’s ac­tions seems to build in ev­i­den­tial de­ci­sion the­ory as an as­sump­tion, which makes me sus­pi­cious of it.

In the Jeffrey-Bolker treat­ment, a world is just a max­i­mally spe­cific event: an event which de­scribes ev­ery­thing com­pletely. But there is no re­quire­ment that max­i­mally-spe­cific events ex­ist.

This can be re­solved by defin­ing wor­lds to be min­i­mal non-zero el­e­ments of the com­ple­tion of the Boolean alge­bra of events, rather than a min­i­mal non-zero event. This is what you seemed to be im­plic­itly do­ing later with the in­finite bit­strings ex­am­ple, where the events were clopen sub­sets of Can­tor space (i.e. sets of in­finite bit­strings such that mem­ber­ship in the set only de­pends on finitely many bits), and this Boolean alge­bra has no min­i­mal non-zero el­e­ments (max­i­mally-spe­cific events), but the min­i­mal non-zero el­e­ments of its com­ple­tion cor­re­spond to in­finite bit­strings, as de­sired.

• Of course, this in­ter­pre­ta­tion re­quires a fair amount of read­ing be­tween the lines, since the Jeffrey-Bolker ax­ioms make no ex­plicit men­tion of any prob­a­bil­ity dis­tri­bu­tion, but I don’t see any other rea­son­able way to in­ter­pret them,

Part of the point of the JB ax­ioms is that prob­a­bil­ity is con­structed to­gether with util­ity in the rep­re­sen­ta­tion the­o­rem, in con­trast to VNM, which con­structs util­ity via the rep­re­sen­ta­tion the­o­rem, but takes prob­a­bil­ity as ba­sic.

This makes Sav­age a bet­ter com­par­i­son point, since the Sav­age ax­ioms are more similar to the VNM frame­work while also try­ing to con­struct prob­a­bil­ity and util­ity to­gether with one rep­re­sen­ta­tion the­o­rem.

VNM does not start out with a prior, and al­lows any prob­a­bil­ity dis­tri­bu­tion over out­comes to be com­pared to any other, and Jeffrey-Bolker only al­lows com­par­i­son of prob­a­bil­ity dis­tri­bu­tions ob­tained by con­di­tion­ing the prior on an event.

As a rep­re­sen­ta­tion the­o­rem, this makes VNM weaker and JB stronger: VNM re­quires stronger as­sump­tions (it re­quires that the prefer­ence struc­ture in­clude in­for­ma­tion about all these prob­a­bil­ity-dis­tri­bu­tion com­par­i­sons), where JB only re­quires prefer­ence com­par­i­son of events which the agent sees as real pos­si­bil­ities. A similar re­mark can be made of Sav­age.

Start­ing with a prior that gets con­di­tioned on events that cor­re­spond to the agent’s ac­tions seems to build in ev­i­den­tial de­ci­sion the­ory as an as­sump­tion, which makes me sus­pi­cious of it.

Right, that’s fair. Although: James Joyce, the big CDT ad­vo­cate, is quite the Jeffrey-Bolker fan! See Why We Still Need the Logic of De­ci­sion for his rea­sons.

I don’t think the motivation for this is quite the same as the motivation for pointless topology, which is designed to mimic classical topology in a way that Jeffrey-Bolker-style decision theory does not mimic VNM-style decision theory. [...] So a similar thing here would be to treat a utility function as a function from some lattice of subsets of ℝ (the Borel subsets, for instance) to the lattice of events.

Doesn’t pointless topol­ogy al­low for some dis­tinc­tions which aren’t mean­ingful in point­ful topol­ogy, though? (I’m not re­ally very fa­mil­iar, I’m just go­ing off of some­thing I’ve heard.)

Isn’t the ap­proach you men­tion pretty close to JB? You’re not mod­el­ing the VNM/​Sav­age thing of ar­bi­trary gam­bles; you’re just as­sign­ing val­ues (and prob­a­bil­ities) to events, like in JB.

Set­ting aside VNM and Sav­age and JB, and con­sid­er­ing the most com­mon ap­proach in prac­tice—use the Kol­mogorov ax­ioms of prob­a­bil­ity, and treat util­ity as a ran­dom vari­able—it seems like the pointless analogue would be close to what you say.

This can be re­solved by defin­ing wor­lds to be min­i­mal non-zero el­e­ments of the com­ple­tion of the Boolean alge­bra of events, rather than a min­i­mal non-zero event.

Yeah. The ques­tion re­mains, though: should we think of util­ity as a func­tion of these min­i­mal el­e­ments of the com­ple­tion? Or not? The com­putabil­ity is­sue I raise is, to me, sug­ges­tive of the nega­tive.

• This makes Sav­age a bet­ter com­par­i­son point, since the Sav­age ax­ioms are more similar to the VNM frame­work while also try­ing to con­struct prob­a­bil­ity and util­ity to­gether with one rep­re­sen­ta­tion the­o­rem.

Sure, I guess I just always talk about VNM in­stead of Sav­age be­cause I never both­ered to learn how Sav­age’s ver­sion works. Per­haps I should.

As a rep­re­sen­ta­tion the­o­rem, this makes VNM weaker and JB stronger: VNM re­quires stronger as­sump­tions (it re­quires that the prefer­ence struc­ture in­clude in­for­ma­tion about all these prob­a­bil­ity-dis­tri­bu­tion com­par­i­sons), where JB only re­quires prefer­ence com­par­i­son of events which the agent sees as real pos­si­bil­ities.

This might be true if we were ideal­ized agents who do Bayesian up­dat­ing perfectly with­out any com­pu­ta­tional limi­ta­tions, but as it is, it seems to me that the as­sump­tion that there is a fixed prior is un­rea­son­ably de­mand­ing. Peo­ple some­times up­date prob­a­bil­ities based purely on fur­ther thought, rather than em­piri­cal ev­i­dence, and a frame­work in which there is a fixed prior which gets con­di­tioned on events, and ban­ishes dis­cus­sion of any other prob­a­bil­ity dis­tri­bu­tions, would seem to have some trou­ble han­dling this.

Doesn’t pointless topol­ogy al­low for some dis­tinc­tions which aren’t mean­ingful in point­ful topol­ogy, though?

Sure, for in­stance, there are many dis­tinct lo­cales that have no points (only one of which is the empty lo­cale), whereas there is only one or­di­nary topolog­i­cal space with no points.

Isn’t the ap­proach you men­tion pretty close to JB? You’re not mod­el­ing the VNM/​Sav­age thing of ar­bi­trary gam­bles; you’re just as­sign­ing val­ues (and prob­a­bil­ities) to events, like in JB.

Assuming you’re referring to “So a similar thing here would be to treat a utility function as a function from some lattice of subsets of ℝ (the Borel subsets, for instance) to the lattice of events”, no. In JB, the set of events is the domain of the utility function, and in what I said, it is the codomain.

• I think that com­putable is ob­vi­ously too strong a con­di­tion for clas­si­cal util­ity; enu­mer­able is bet­ter.

Imag­ine you’re about to see the source code of a ma­chine that’s run­ning, and if the ma­chine even­tu­ally halts then 2 utilons will be gen­er­ated. That’s a sim­pler prob­lem to rea­son about than the pro­cras­ti­na­tion para­dox, and your util­ity func­tion is enu­mer­able but not com­putable. (Like­wise, log­i­cal in­duc­tors ob­vi­ously don’t make PA ap­prox­i­mately com­putable, but their prop­er­ties are what you’d want the defi­ni­tion of ap­prox­i­mately enu­mer­able to be, if any such defi­ni­tion were stan­dard.)
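A minimal sketch (my own, assuming “enumerable” here means approximable from below) of that example: simulating the machine for longer can only raise the estimate from 0 toward the 2 utilons awarded for halting, but no finite amount of simulation settles the never-halting case.

```python
def utility_lower_bounds(halts_within, max_steps):
    """Yield nondecreasing lower bounds on "2 utilons if the machine halts, else 0".

    `halts_within(n)` is a stand-in for running the machine for n steps and
    reporting whether it has halted yet; more steps can only raise the bound.
    """
    for n in range(max_steps):
        yield 2 if halts_within(n) else 0

# A toy "machine" that halts after exactly 1000 steps:
print(max(utility_lower_bounds(lambda n: n >= 1000, 2000)))   # prints 2

# For a machine that never halts, every bound is 0, which is the true value,
# but no finite prefix of the bounds certifies that they will stay at 0.
```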

I sus­pect that the pro­cras­ti­na­tion para­dox leans heav­ily on the com­putabil­ity re­quire­ment as well.

• I’m not sure what it would mean for a real-valued function to be enumerable. You could call a function f enumerable if there’s a program that takes x as input and enumerates the rationals that are less than f(x), but I don’t think this is what you want, since presumably if a Turing machine halting can generate a positive amount of utility that doesn’t depend on the number of steps taken before halting, then it could generate a negative amount of utility by halting as well.

I think accepting the type of reasoning you give suggests that limit-computability is enough (ie there’s a program that takes x and produces a sequence of rationals that converges to f(x), with no guarantees on the rate of convergence). Though I don’t agree that it’s obvious we should accept such utility functions as valid.

• I mean the sort of “even­tu­ally ap­prox­i­mately con­sis­tent over com­putable pat­terns” thing ex­hibited by log­i­cal in­duc­tors, which is stronger than limit-com­putabil­ity.

• It’s not clear to me what this means in the con­text of a util­ity func­tion.

• Let’s talk first about non-em­bed­ded agents.

Say that I’m given the specification of a Turing machine, and I have a computable utility mapping U from output states (including “does not halt”) to [0,1]. We presumably agree that this is possible.

I agree that it’s im­pos­si­ble to make a com­putable map­ping from Tur­ing ma­chines to out­comes, so there­fore I can­not have a com­putable util­ity func­tion from TMs to the re­als which as­signs the same value to any two TMs with iden­ti­cal out­put.

But I can have a log­i­cal in­duc­tor which, for each TM, pro­duces a se­quence of pre­dic­tions about that TM’s out­put’s util­ity. Every TM that halts will even­tu­ally get the cor­rect util­ity, and ev­ery TM that doesn’t will con­verge to some util­ity in [0,1], with the usual prop­er­ties for log­i­cal in­duc­tors guaran­tee­ing that TMs eas­ily proven to have the same out­put will con­verge to the same num­ber, etc.

That’s a com­putable se­quence of util­ity func­tions over TMs with asymp­totic good prop­er­ties. At any stage, I could stop and tell you that I choose some par­tic­u­lar TM as the best one as it seems to me now.

I haven’t re­ally thought in a long while about ques­tions like “do log­i­cal in­duc­tors’ good prop­er­ties of self-pre­dic­tion mean that they could avoid the pro­cras­ti­na­tion para­dox”, so I could be talk­ing non­sense there.

• I think we’re going to have to back up a bit. Call the space of outcomes O and the space of Turing machines M. It sounds like you’re talking about two functions: eval, from M to O, and U, from O to the reals. I was thinking of the composite U∘eval as the utility function we were talking about, but it seems you were thinking of U.

You suggested U should be computable but eval should not be. It seems to me that eval should certainly be computable (with the caveat that it might be a partial function, rather than a total function), as computation is the only thing Turing machines do, and that if non-halting is included in the space of outcomes (so that eval is total), it should be represented as some sort of limit of partial information, rather than represented explicitly, so that eval is continuous.

In any case, a slight generalization of Rice’s theorem tells us that any computable function from Turing machines to reals that depends only on the machine’s semantics must be constant, so I suppose I’m forced to agree that, if we want a utility function that is defined on all Turing machines and depends only on their semantics, then at least one of U or eval should be uncomputable. But I guess I have to ask why we would want to assign utilities to Turing machines.

• I’ve been us­ing com­putable to mean a to­tal func­tion (each in­stance is com­putable in finite time).

I’m think­ing of an agent out­side a uni­verse about to take an ac­tion, and each ac­tion will cause that uni­verse to run a par­tic­u­lar TM. (You could maybe frame this as “the agent chooses the tape for the TM to run on”.) For me, this is analo­gous to act­ing in the world and caus­ing the world to shift to­ward some out­comes over oth­ers.

By as­sert­ing that U should be the com­putable one, I’m as­sert­ing that “how much do I like this out­come” is a more tractable ques­tion than “which ac­tions re­sult in this out­come”.

An in­tu­ition pump in a hu­man set­ting:

I can check whether given states of a Go board are vic­to­ries for one player or the other, or if the game is not yet finished (this is analo­gous to U be­ing a to­tal com­putable func­tion). But it’s much more difficult to choose, for an un­finished game where I’m told I have a win­ning strat­egy, a move such that I still have a win­ning strat­egy. The best I can re­ally do as a hu­man is calcu­late a bit and then guess at how the leaves will prob­a­bly re­solve if we go down them (this is analo­gous to eval be­ing an enu­mer­able but not nec­es­sar­ily com­putable func­tion).

In gen­eral, in­di­vi­d­ual hu­mans are much bet­ter at figur­ing out what out­comes we want than we are at figur­ing out ex­actly how to achieve those out­comes. (It would be quite weird if the op­po­site were the case.) We’re not good at ei­ther in an ab­solute sense, of course.
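A rough sketch of the check-versus-search asymmetry in the Go analogy above, using tic-tac-toe as a toy stand-in for Go (the code and positions are purely illustrative): the terminal check is a small total function, while the search for a winning line blows up with the size of the game.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Cheap, total check of who (if anyone) has won (analogous to U)."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def best_outcome(board, player):
    """Game value for `player` to move: 1 win, 0 draw, -1 loss.

    Exhaustive search (analogous to eval): it terminates for tic-tac-toe,
    but the same brute-force recursion is hopeless at Go scale, which is
    the point of the analogy.
    """
    opponent = 'o' if player == 'x' else 'x'
    w = winner(board)
    if w == player:
        return 1
    if w == opponent:
        return -1
    if ' ' not in board:
        return 0
    return max(-best_outcome(board[:i] + player + board[i + 1:], opponent)
               for i in range(9) if board[i] == ' ')

print(winner('xxx o  o '))               # 'x': checking the outcome is easy
print(best_outcome('x' + ' ' * 8, 'o'))  # 0: finding what play achieves it takes real search
```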

• Two points about how I think about this that differ significantly. (I just read up on Bolker and Jeffrey, as I was previously unfamiliar.) I had been thinking about writing this up more fully, but have been busy. (If people think it’s worthwhile, tell me and I will be more likely to do so.)

First, util­ity is only ever com­puted over mod­els of re­al­ity, not over re­al­ity it­self, be­cause it is a part of the de­ci­sion mak­ing pro­cess, not di­rectly about any self-mon­i­tor­ing or feed­back pro­cess. It is never re­ally eval­u­ated against re­al­ity, nor does it need to be. Ev­i­dence for this in hu­mans is that peo­ple suck at ac­tu­ally notic­ing how they feel, what they like, etc. The up­dat­ing of their world model is a pro­cess that hap­pens alongside plan­ning and de­ci­sion mak­ing, and is only some­times ac­tively a tar­get of max­i­miz­ing util­ity be­cause peo­ple’s model can in­clude cor­re­spon­dence with re­al­ity as a goal. Many peo­ple sim­ply don’t do this, or care about map/​re­al­ity cor­re­spon­dence. They are very un­likely to read or re­spond to posts here, but any model of hu­mans should ac­count for their ex­is­tence, and the likely claim that their brains work the same way other peo­ple’s brains do.

Se­cond, Jeffrey’s “News Value” is how he fits in a re­la­tion­ship be­tween util­ity and re­al­ity. As men­tioned, for many peo­ple their map barely cor­re­sponds to the ter­ri­tory, and they don’t seem to suffer much. (Well, un­less an ex­ter­nal event im­poses it­self on them in a way that af­fects them in the pre­sent. And even then, how of­ten do they up­date their model?) So I don’t think Jeffrey is right. In­stead, I don’t think an agent could be said to “have” util­ity at all—util­ity max­i­miza­tion is a pro­cess, never an eval­u­ated goal. The only rea­son re­al­ity mat­ters is be­cause it pro­vides feed­back to the model over which peo­ple eval­u­ate util­ity, not be­cause util­ity is lost or gained. I think this also partly ex­plains hap­piness set points—as a point of notic­ing re­al­ity, hu­mans are mo­ti­vated by an­ti­ci­pated re­ward more than re­ward. I think the model I pro­pose makes this ob­vi­ous, in­stead of sur­pris­ing.

• Thank you for this.

Your characterization of Reductive Utility matches my own experience in philosophical discussion about utilitarianism very well. Most of my interlocutors object that I am proposing a reductive utility notion which suffers from incomputability (which is essentially how Anscombe dismissed it all in one paragraph, setting generations of philosophers eternally against any form of consequentialism).

How­ever, I always thought it was ob­vi­ous that one need not be­lieve that ob­jects and moral think­ing must be de­rived from ever lower lev­els of world states.

What do you think are the down­stream effects of hold­ing Re­duc­tive Utility Func­tion the­ory?

I’m thinking the social effect of RUF is more compartmentalization of domains, because from an agent’s perspective their continuity is incomputable. Does that make sense?

• I do not think you are sel­l­ing a straw­man, but the no­tion that a util­ity func­tion should be com­putable seems to me to be com­pletely ab­surd. It seems like a con­fu­sion born from not un­der­stand­ing what com­putabil­ity means in prac­tice.

Say I have a com­puter that will simu­late an ar­bi­trary Tur­ing ma­chine T, and will award me one utilon when that ma­chine halts, and do noth­ing for me un­til that hap­pens. With some clever cryp­tocur­rency scheme, this is a sce­nario I could ac­tu­ally build to­day. My util­ity func­tion ought plau­si­bly to have a term in it that as­signs a pos­i­tive value to the com­puter simu­lat­ing a halt­ing Tur­ing ma­chine, and zero to the com­puter simu­lat­ing a non-halt­ing Tur­ing ma­chine. Yet the as­sump­tion of util­ity func­tion com­putabil­ity would rule out this very sen­si­ble de­sire struc­ture.

If I live in a Conway’s Game of Life universe, there may be some chunk of universe somewhere that will eventually end up destroying all life (in the biological sense, not the Game of Life sense) in my universe. I assign lower utility to universes where this is the case than to those where it is not. Is that computable? No.

More pro­saically, as far as I cur­rently un­der­stand, the uni­verse we ac­tu­ally live in seems to be con­tin­u­ous in na­ture, and its state may not be de­scrib­able even in prin­ci­ple with a finite num­ber of bits. And even if it is, I do not ac­tu­ally know this, which means my util­ity func­tion is also over po­ten­tial uni­verses (which, as far as I know, might be the one I live in) that re­quire an in­finite amount of state bits. Why in the world would one ex­pect a util­ity func­tion over an un­countable do­main to be com­putable?

As far as I can see, the motivation for requiring a utility function to be computable is that this would make optimization for said utility function a great deal easier. Certainly this is true; there are powerful optimization techniques that apply only to computable utility functions, which an optimizer with an uncomputable utility function does not have access to in their full form. But the utility function is not up for grabs; the fact that life will be easier for me if I want a certain thing should not be taken as an indication that that is what I want! This seems to me like the cart-before-horse error of trying to interpret the problem as one that is easier to solve, rather than as the problem one actually wants solved.

One ar­gu­ment is that U() should be com­putable be­cause the agent has to be able to use it in com­pu­ta­tions. If you can’t eval­u­ate U(), how are you sup­posed to use it? If U() ex­ists as an ac­tual mod­ule some­where in the brain, how is it sup­posed to be im­ple­mented?

This line of thought here illus­trates very well the (I claim) grossly mis­taken in­tu­ition for as­sum­ing com­putabil­ity. If you can’t eval­u­ate U() perfectly, then per­haps what your brain is do­ing is only an ap­prox­i­ma­tion of what you re­ally want, and per­haps the same con­straint will hold for any greater mind that you can de­vise. But that does not mean that what your brain is op­ti­miz­ing for is nec­es­sar­ily what it ac­tu­ally wants! There is no re­quire­ment at all that your brain is a perfect judge of the de­sir­a­bil­ity of the world it’s look­ing at, af­ter all (and we know for a fact that it does a far from perfect job at this).

• Say I have a com­puter that will simu­late an ar­bi­trary Tur­ing ma­chine T, and will award me one utilon when that ma­chine halts, and do noth­ing for me un­til that hap­pens. With some clever cryp­tocur­rency scheme, this is a sce­nario I could ac­tu­ally build to­day.

No, you can’t do that to­day. You could pro­duce a con­trap­tion that will de­posit 1 BTC into a cer­tain bit­coin wallet if and when some com­puter pro­gram halts, but this won’t do the wallet’s owner much good if they die be­fore the pro­gram halts. If you re­flect on what it means to award some­one a utilon, rather than a bit­coin, I main­tain that it isn’t ob­vi­ous that this is even pos­si­ble in the­ory.

Why in the world would one ex­pect a util­ity func­tion over an un­countable do­main to be com­putable?

There is a no­tion of com­putabil­ity in the con­tin­u­ous set­ting.
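For concreteness, the standard notion (computable analysis, or Type-2 computability) says a real number or real-valued function is computable when you can produce arbitrarily good rational approximations on demand. A toy sketch, my own illustration rather than anything from the comment:

```python
from fractions import Fraction

def sqrt2(precision):
    """Computable real: return a rational within 2**-precision of sqrt(2)."""
    lo, hi = Fraction(1), Fraction(2)          # invariant: lo**2 <= 2 <= hi**2
    while hi - lo > Fraction(1, 2 ** precision):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

# A computable function of such a real only ever consults finitely many
# digits of its argument to produce each digit of its output.
print(float(sqrt2(20)))  # 1.41421...
```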

As far as I can see, the mo­ti­va­tion for re­quiring a util­ity func­tion to be com­putable is that this would make op­ti­miza­tion for said util­ity func­tion to be a great deal eas­ier.

This seems like a straw­man to me. A bet­ter mo­ti­va­tion would be that agents that ac­tu­ally ex­ist are com­putable, and a util­ity func­tion is de­ter­mined by judge­ments ren­dered by the agent, which is in­ca­pable of think­ing un­com­putable thoughts.

• Clearly, there is a kind of utility function that is computable. Clearly the kind of UF that is defined in terms of preferences over fine-grained world-states isn’t computable. So, clearly, “utility function” is being used to mean different things.

• That seems to con­flate two differ­ent things: whether you can com­pute the oc­cur­rence of event E, as op­posed to whether you could com­pute your prefer­ence for E over not E.

• We can’t tell we’re in the all-zero uni­verse by ex­am­in­ing any finite num­ber of bits.

What does it mean for the all-zero universe to be infinite, as opposed to not being infinite? Finite universes have a finite number of bits of information describing them. (This doesn’t actually negate the point that uncomputable utility functions exist, merely that utility functions that care whether they are in a mostly-empty vs. perfectly empty universe are a weak example.)

These preferences are required to be coherent with breaking things up into sums, so U(E) = (U(E∧A)⋅P(E∧A) + U(E∧¬A)⋅P(E∧¬A)) / P(E), but we do not define one from the other.
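For concreteness, a small numeric check of this coherence condition, with made-up numbers for some event E and auxiliary proposition A: the utility of E has to come out as the probability-weighted average of the utilities of E∧A and E∧¬A.

```python
# Made-up subjective quantities, purely to exercise the identity.
P_E_and_A, U_E_and_A = 0.02, 10.0        # E happens together with A
P_E_and_notA, U_E_and_notA = 0.01, 4.0   # E happens without A
P_E = P_E_and_A + P_E_and_notA

U_E = (U_E_and_A * P_E_and_A + U_E_and_notA * P_E_and_notA) / P_E
print(U_E)  # 8.0: a weighted average, so it must lie between 4.0 and 10.0
```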

What happens if the author/definer of U(E) is wrong about the probabilities? If U(E) is not defined from, nor defined by, the value of its sums, what bad stuff happens if they aren’t equal? Consider the dyslexic telekinetic at a roulette table, who places a chip on 6 but thinks he placed the chip on 9. Proposition A is “I will win if the ball lands in the ‘9’ cup” (or “I have bet on 9”, or any such similar proposition), and event E is that agent exercising their telekinesis to cause the ball to land in the 9 cup. (Putting decisions and actions in the hypothetical avoids a passive agent.)

Is that agent merely *mis­taken* about the value of U(E), as a re­sult of their er­ror on P(A) and fol­low­ing the ap­pro­pri­ate math? Does their er­ror re­sult in a ma­jor change in their util­ity _func­tion_ _com­pu­ta­tion_ mea­sure­ment when they cor­rect their er­ror? Is it con­sid­ered safe for an agent to jus­tify cas­cad­ing ma­jor changes in util­ity mea­sure­ment over many (liter­ally all?) events af­ter up­dat­ing a prob­a­bil­ity?

An in­stan­ti­ated en­tity (one that ex­ists in a world) can only know of events E where such events are ei­ther ob­ser­va­tions that they make, or de­ci­sions that they make; I see flaws with an agent who sets forth ac­tions that it be­lieves suffi­cient to bring about a de­sired out­come and then feels satis­fied that it is done, and also with an agent that is seek­ing spoofable ob­ser­va­tions about that de­sired out­come (in par­tic­u­lar, the kind of dy­namic where agents will seek ev­i­dence that tends to con­firm de­sir­able event E, be­cause that ev­i­dence makes the agent happy, and ev­i­dence against E makes the agent sad, so they avoid such ev­i­dence).

• What hap­pens if the au­thor/​definer of U(E) is wrong about the prob­a­bil­ities? If U(E) is not defined from, nor defined by, the value of its sums, what bad stuff hap­pens if they aren’t equal?

Ul­ti­mately, I am ad­vo­cat­ing a log­i­cal-in­duc­tion like treat­ment of this kind of thing.

• Ini­tial val­ues are based on a kind of “prior”—a dis­tri­bu­tion of money across traders.

• Values are ini­tially in­con­sis­tent (in­deed, they’re always some­what in­con­sis­tent), but, be­come more con­sis­tent over time as a re­sult of traders cor­rect­ing in­con­sis­ten­cies. The traders who are bet­ter at this get more money, while the chron­i­cally in­con­sis­tent traders lose money and even­tu­ally don’t have in­fluence any more.

• Ev­i­dence of all sorts can come into the sys­tem, at any time. The sys­tem might sud­denly get in­for­ma­tion about the util­ity of some hy­po­thet­i­cal ex­am­ple, or a log­i­cal propo­si­tion about util­ity, what­ever. It can be ar­bi­trar­ily difficult to con­nect this ev­i­dence to prac­ti­cal cases. How­ever, the traders work to re­duce in­con­sis­ten­cies through­out the whole sys­tem, and there­fore, ev­i­dence gets prop­a­gated more or less as well as it can be.

• There is at least one ma­jor step that I did not know of, be­tween the things I think I un­der­stand and a mar­ket that has cur­rency and traders.

I un­der­stand how a mar­ket of traders can re­sult in a con­sen­sus eval­u­a­tion of prob­a­bil­ity, be­cause there is a *cor­rect* eval­u­a­tion of the prob­a­bil­ity of a propo­si­tion. How does a mar­ket of traders re­sult in a con­sen­sus eval­u­a­tion of the util­ity of an event? If two traders dis­agree about whether to pull the lever, how is it de­ter­mined which one gets the cur­rency?

• The mechanism is the same in both cases:

• Shares in the event are bought and sold on the market. The share will pay out $1 if the event is true. The share can also be shorted, in which case the shorter gets $1 if the event turns out false. The overall price equilibrates to a probability for the event.

• There are several ways to handle utility. One way is to make bets about whether the utility will fall in particular ranges. Another way is for the market to directly contain shares of utility which can be purchased (and shorted). These pay out $U, whatever the utility actually turns out to be; traders give it an actual price by speculating on what the eventual value will be. In either case, we would then assign expected utility to events via conditional betting.

If we want to do reward-learning in a setup like this, the (discounted) rewards can be incremental payouts of the U shares. But note that even if there is no feedback of any kind (i.e., the shares of U never actually pay out), the shares still equilibrate to a subjective value on the market, like collector’s items. The market still forces the changes in value over time to be increasingly coherent, and the conditional beliefs about it to be increasingly coherent. This corresponds to fully subjective utility with no outside feedback.

If two traders disagree about whether to pull the lever, how is it determined which one gets the currency?

They make bets about what happens if the lever is or isn’t pulled (including conditional buys/sells of shares of utility). These bets are evaluated as normal. In this setup we only get feedback on whichever action actually happens, but this may still be enough data to learn under certain assumptions (which I hope to discuss in a future post). We can also consider more exotic settings in which we do get feedback on both cases even though only one happens; this could be feasible through human feedback about counterfactuals. (I also hope to discuss this alternative in a future post.)

• Suppose the utility trading commission discovered that a trader used forbidden methods to short a utility bet (e.g. insider trading, coercing other traders, exploiting a flaw in the marketplace), and takes action to confiscate the illicit gains. What actions transfer utility from the target? (In systems that pay out money, their bank account is debited; in systems that use a blockchain, transactions are added or rolled back manually.) What does it mean to take utility from a trader directly?

• What does it mean for the all-zero universe to be infinite, as opposed to not being infinite? Finite universes have a finite number of bits of information describing them. (This doesn’t actually negate the point that uncomputable utility functions exist, merely that utility functions that care whether they are in a mostly-empty vs. perfectly empty universe are a weak example.)

What it means here is precisely that it is described by an infinite number of bits: specifically, an infinite number of zeros! Granted, we could try to reorganize the way we describe the universe so that we have a short code for that world, rather than an infinitely long one. This becomes a fairly subtle issue. I will say a couple of things:

First, it seems to me like the reductionist may want to object to such a reorganization. In the reductive view, it is important that there is a special description of the universe, in which we have isolated the actual basic facts of reality: things resembling particle position and momentum, or what-have-you.
Second, I challenge you to propose a description language which (a) makes the procrastination example computable, (b) maps all worlds onto a description, and (c) does not create any invalid input tapes.

For example, I can make a modified universe-description in which the first bit is ‘1’ if the button ever gets pressed. The rest of the description remains as before, placing a ‘1’ at time-steps when the button is pressed (but offset by one place, to allow for the extra initial bit). So seeing ‘0’ right away tells me I’m in the button-never-pressed world; it now has a 1-bit description, rather than an infinite-bit description. HOWEVER, this description language includes a description which does not correspond to any world, and is therefore invalid: the string which starts with ‘1’ but then contains only zeros forever.

This issue has a variety of potential replies/implications; I’m not saying the situation is clear. I didn’t get into this kind of thing in the post because it seems like there are just too many things to say about it, with no totally clear path.

• The universe that is described by an infinite string of zeroes differs from the universe that is described by the empty string in what manner?

• Planned summary for the Alignment Newsletter:

How might we theoretically ground utility functions? One approach could be to view the possible environments as a set of universe histories (e.g. a list of the positions of all quarks, etc. at all times), and a utility function as a function that maps these universe histories to real numbers. We might want this utility function to be computable, but this eliminates some plausible preferences we might want to represent. For example, in the procrastination paradox, the subject prefers to push the button as late as possible, but disprefers never pressing the button. If the history is infinitely long, no computable function can know for sure that the button was never pressed: it’s always possible that it was pressed at some later day.

Instead, we could use _subjective utility functions_, which are defined over _events_, which is basically anything you can think about (i.e. it could be chairs and tables, or quarks and strings). This allows us to have utility functions over high-level concepts. In the previous example, we can define an event “never presses the button” and reason about that event atomically, sidestepping the issues of computability. We could go further and view _probabilities_ as subjective (as in the Jeffrey-Bolker axioms), and only require that our beliefs are updated in such a way that we cannot be Dutch-booked. This is the perspective taken in logical induction.

• The requirement about computability:

But what about the all-zero universe, 0000000...? The program must loop forever. We can’t tell we’re in the all-zero universe by examining any finite number of bits. You don’t know whether you will eventually push the button.

An infinite loop may be a paradox. Perhaps the paradox exists only because of the infinity, or some confusion stemming from it or how it is used?*

What is the difference between 0.9999... that goes on forever, and 1? In the real numbers, 0. How do you determine this? If you know the process generating the numbers, you can tell. Practically?
1. If only a finite number of digits is relevant to your decision, it doesn’t matter. (Additionally, if a theory isn’t falsifiable: (a) should we consider the hypothesis, and (b) is there lower-hanging fruit we should pick before trying to solve a potentially unsolvable problem?)

2. Wait. Where did you get an infinite number of bits (which you are unable to analyze because they are infinite) from? (Computability sounds nice, but absent arbitrarily large computing resources (i.e. infinite), in this universe, past a certain point, computability doesn’t seem to exist in a practical sense.)

*It isn’t necessarily clear that the environment must be computable. (Even if there is some proof of this, an agent unaware of the proof (a) must function without it, and (b) must decide whether it is worth investing the time to try and find/create it.)

• One argument is that U() should be computable because the agent has to be able to use it in computations. This perspective is especially appealing if you think of U() as a black-box function which you can only optimize through search. If you can’t evaluate U(), how are you supposed to use it? If U() exists as an actual module somewhere in the brain, how is it supposed to be implemented?

This seems like a weak argument. If I think about a human trying to achieve some goal in practice, “think of U() as a black-box function which you can only optimize through search” doesn’t really describe how we typically reason. I would say that we optimize for things we can’t evaluate all the time; it’s our default mode of thought. We don’t need to evaluate U() in order to decide which of two options yields higher U().

Example: suppose I’m a general trying to maximize my side’s chance of winning a war. Can I evaluate the probability that we win, given all of the information available to me? No; fully accounting for every little piece of info I have is way beyond my computational capabilities. Even reasoning through an entire end-to-end plan for winning takes far more effort than I usually make for day-to-day decisions. Yet I can say that some actions are likely to increase our chances of victory, and I can prioritize actions which are more likely to increase our chances of victory by a larger amount.

Suppose I’m running a company, trying to maximize profits. I don’t make decisions by looking at the available options, and then estimating how profitable I expect the company to be under each choice. Rather, I reason locally: at a cost of X I can gain Y, I’ve cached an intuitive valuation of X and Y based on their first-order effects, and I make the choice based on that without reasoning through all the second-, third-, and higher-order effects of the choice. I don’t calculate all the way through to an expected utility or anything comparable to it. If I see a $100 bill on the ground, I don’t need to reason through exactly what I’ll spend it on in order to decide to pick it up.

In gen­eral, I think hu­mans usu­ally make de­ci­sions di­rec­tion­ally and lo­cally: we try to de­cide which of two ac­tions is more likely to bet­ter achieve our goals, based on lo­cal con­sid­er­a­tions, with­out ac­tu­ally simu­lat­ing all the way to the pos­si­ble out­comes.

Tak­ing a more the­o­ret­i­cal per­spec­tive… how would a hu­man or other agent work with an un­com­putable U()? Well, we’d con­sider spe­cific choices available to us, and then try to guess which of those is more likely to give higher U(). We might look for proofs that one spe­cific choice or the an­other is bet­ter; we might lev­er­age log­i­cal in­duc­tion; we might do some­thing else en­tirely. None of that nec­es­sar­ily re­quires eval­u­at­ing U().

• Yeah, a di­dac­tic prob­lem with this post is that when I write ev­ery­thing out, the “re­duc­tive util­ity” po­si­tion does not sound that tempt­ing.

I still think it’s a re­ally easy trap to fall into, though, be­cause be­fore think­ing too much the as­sump­tion of a com­putable util­ity func­tion sounds ex­tremely rea­son­able.

Sup­pose I’m run­ning a com­pany, try­ing to max­i­mize prof­its. I don’t make de­ci­sions by look­ing at the available op­tions, and then es­ti­mat­ing how prof­itable I ex­pect the com­pany to be un­der each choice. Rather, I rea­son lo­cally: at a cost of X I can gain Y, I’ve cached an in­tu­itive val­u­a­tion of X and Y based on their first-or­der effects, and I make the choice based on that with­out rea­son­ing through all the sec­ond-, third-, and higher-or­der effects of the choice. I don’t calcu­late all the way through to an ex­pected util­ity or any­thing com­pa­rable to it.

With dy­namic-pro­gram­ming in­spired al­gorithms such as AlphaGo, “cached an in­tu­itive val­u­a­tion of X and Y” is mod­eled as a kind of ap­prox­i­mate eval­u­a­tion which is learned based on feed­back—but feed­back re­quires the abil­ity to com­pute U() at some point. (So you don’t start out know­ing how to eval­u­ate un­cer­tain situ­a­tions, but you do start out know­ing how to eval­u­ate util­ity on com­pletely speci­fied wor­lds.)

So one might still rea­son­ably as­sume you need to be able to com­pute U() de­spite this.
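A minimal sketch of that bootstrapping structure (a generic value-iteration toy with invented states, transitions, and utilities, not AlphaGo itself): the learned evaluations of uncertain states are trained by backing values up, but the recursion bottoms out in a U() that actually gets evaluated on completely specified terminal outcomes.

```python
# Toy deterministic chain: states 0..4, the only action moves right, 4 is terminal.
TERMINAL = 4

def U(state):
    """Computable utility on completely specified (terminal) outcomes."""
    return 1.0 if state == TERMINAL else 0.0

values = {s: 0.0 for s in range(TERMINAL + 1)}
gamma = 0.9

for _ in range(50):                       # value iteration: learned evaluations bootstrap...
    for s in range(TERMINAL):
        values[s] = gamma * values[s + 1]
    values[TERMINAL] = U(TERMINAL)        # ...but the feedback bottoms out in U()

print(values)  # {0: 0.6561, 1: 0.729, 2: 0.81, 3: 0.9, 4: 1.0}
```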

• Yeah, a di­dac­tic prob­lem with this post is that when I write ev­ery­thing out, the “re­duc­tive util­ity” po­si­tion does not sound that tempt­ing.

I ac­tu­ally found the po­si­tion very tempt­ing un­til I got to the sub­jec­tive util­ity sec­tion.

• sup­pose I’m a gen­eral try­ing to max­i­mize my side’s chance of win­ning a war. Can I eval­u­ate the prob­a­bil­ity that we win, given all of the in­for­ma­tion available to me? No—fully ac­count­ing for ev­ery lit­tle piece of info I have is way be­yond my com­pu­ta­tional ca­pa­bil­ities. Even rea­son­ing through an en­tire end-to-end plan for win­ning takes far more effort than I usu­ally make for day-to-day de­ci­sions. Yet I can say that some ac­tions are likely to in­crease our chances of vic­tory, and I can pri­ori­tize ac­tions which are more likely to in­crease our chances of vic­tory by a larger amount.

So, when and why are we able to get away with do­ing that?

AFAICT, the formalisms of agents that I’m aware of (Bayesian inference, AIXI, etc.) set things up by supposing logical omniscience and that the true world generating our observations is in the set of hypotheses, and from there you can show that the agent will maximise expected utility, or not get Dutch-booked, or whatever. But humans, and ML algorithms for that matter, don’t do that: we’re able to get “good enough” results even when we know our models are wrong and don’t capture a good deal of the underlying process generating our observations. Furthermore, it seems that empirically, the more expressive the model class we use, and the more compute thrown at the problem, the better these bounded inference algorithms work. I haven’t found a good explanation of why this is the case beyond the hand-wavy “we approach logical omniscience as compute goes to infinity and our hypothesis space grows to encompass all computable hypotheses, so eventually our approximation should work like the ideal Bayesian one”.

• I think in part we can get away with it be­cause it’s pos­si­ble to op­ti­mize for things that are only usu­ally de­cid­able.

Take win­ning the war for ex­am­ple. There may be no com­puter pro­gram that could look at any state of the world and tell you who won the war—there are lots of weird edge cases that could cause a Tur­ing ma­chine to not re­turn a de­ci­sion. But if we ex­pect to be able to tell who won the war with very high prob­a­bil­ity (or have a model that we think matches who wins the war with high prob­a­bil­ity), then we can just sort of ig­nore the weird edge cases and model failures when calcu­lat­ing an ex­pected util­ity.
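A sketch of that move in expected-utility terms (all numbers invented): if the weird edge cases and model failures carry at most ε probability and utilities are bounded, dropping them changes the expected utility by at most ε times the utility range, so ranking actions on the modeled cases alone is usually safe.

```python
# Outcomes our "who won the war" model can actually decide, with made-up numbers.
modeled = [("we win", 0.69, 1.0), ("we lose", 0.30, 0.0)]
eps = 0.01                # probability mass of weird, undecidable edge cases
u_min, u_max = 0.0, 1.0   # utilities assumed bounded

eu_modeled = sum(p * u for _, p, u in modeled)
lower, upper = eu_modeled + eps * u_min, eu_modeled + eps * u_max
print(eu_modeled, (lower, upper))  # 0.69, and the interval (0.69, 0.7) brackets the true EU
```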

• Per­haps...

[wav­ing hand]

As the ap­prox­i­ma­tion gets closer to the ideal, the re­sults do as well. (The Less Wrong quote seems rele­vant.)

• The de­scrip­tion of a par­tic­u­lar ver­sion of ex­pected util­ity the­ory feels very par­tic­u­lar to me.

Utility is generally expressed as a function of a random variable, not as a function of an element of the sample space.

For in­stance: sup­pose that my util­ity is lin­ear in the profit or loss from the fol­low­ing game. We draw one bit from /​dev/​ran­dom. If it is true, I win a pound, else I lose one.

Utility is not here a func­tion of ‘the con­figu­ra­tion of the uni­verse’. It is a func­tion of a bool. The bool it­self may de­pend on (some sub­set of) ‘the con­figu­ra­tion of the uni­verse’ but re­al­ity maps uni­verse to bool for us, com­putabil­ity be damned.
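A sketch of the point (using os.urandom as a stand-in for /dev/random; the pound amounts are just the example above): the utility function only ever sees the bool, and reality, not the agent, does the work of mapping the universe’s configuration to that bool.

```python
import os

def utility(won: bool) -> float:
    """Utility is a function of the random variable (a bool), not of the
    underlying configuration of the universe that produced it."""
    return 1.0 if won else -1.0  # win a pound / lose a pound

bit = bool(os.urandom(1)[0] & 1)  # reality maps universe -> bool for us
print(bit, utility(bit))
```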

• I don’t think we should be sur­prised that any rea­son­able util­ity func­tion is un­com­putable. Con­sider a set of wor­lds with utopias that last only as long as a Tur­ing ma­chine in the world does not halt and are oth­er­wise iden­ti­cal. There is one such world for each Tur­ing ma­chine. All of these wor­lds are pos­si­ble. No com­putable util­ity func­tion can as­sign higher util­ity to ev­ery world with a never halt­ing Tur­ing ma­chine.