An Orthodox Case Against Utility Functions

This post has benefitted from discussion with Sam Eisenstat, Scott Garrabrant, Tsvi Benson-Tilsen, Daniel Demski, Daniel Kokotajlo, and Stuart Armstrong. It started out as a thought about Stuart Armstrong’s research agenda.

In this post, I hope to say something about what it means for a rational agent to have preferences. The view I am putting forward is relatively new to me, but it is not very radical. It is, dare I say, a conservative view—I hold close to Bayesian expected utility theory. However, my impression is that it differs greatly from common impressions of Bayesian expected utility theory.

I will argue against a particular view of expected utility theory—a view which I’ll call reductive utility. I do not recall seeing this view explicitly laid out and defended (except in in-person conversations). However, I expect at least a good chunk of the assumptions are commonly made.

Reductive Utility

The core tenets of reductive utility are as follows:

  • The sample space of a rational agent’s beliefs is, more or less, the set of possible ways the world could be—which is to say, the set of possible physical configurations of the universe. Hence, each world is one such configuration.

  • The preferences of a rational agent are represented by a utility function from worlds to real numbers.

  • Furthermore, the utility function should be a computable function of worlds.

Since I’m setting up the view which I’m knocking down, there is a risk I’m striking at a straw man. However, I think there are some good reasons to find the view appealing. The following subsections will expand on the three tenets, and attempt to provide some motivation for them.

If the three points seem obvious to you, you might just skip to the next section.

Worlds Are Basically Physical

What I mean here resembles the standard physical-reductionist view. However, my emphasis is on certain features of this view:

  • There is some “basic stuff”—like quarks or vibrating strings or what-have-you.

  • What there is to know about the world is some set of statements about this basic stuff—particle locations and momentums, or wave-form function values, or what-have-you.

  • These special atomic statements should be logically independent from each other (though they may of course be probabilistically related), and together, fully determine the world.

  • These should (more or less) be what beliefs are about, such that we can (more or less) talk about beliefs in terms of the sample space as being the set of worlds understood in this way.

This is the so-called “view from nowhere”, as Thomas Nagel puts it.

I don’t intend to construe this position as ruling out certain non-physical facts which we may have beliefs about. For example, we may believe indexical facts on top of the physical facts—there might be (1) beliefs about the universe, and (2) beliefs about where we are in the universe. Exceptions like this violate an extreme reductive view, but are still close enough to count as reductive thinking for my purposes.

Utility Is a Function of Worlds

So we’ve got the “basically physical” sample space $\Omega$. Now we write down a utility function $U : \Omega \to \mathbb{R}$. In other words, utility is a random variable on our event space.

What’s the big deal?

One thing this is saying is that preferences are a function of the world. Specifically, preferences need not only depend on what is observed. This is incompatible with standard RL in a way that matters.

But, in addition to saying that utility can depend on more than just observations, we are restricting utility to only depend on things that are in the world. After we consider all the information in a world $\omega$, there cannot be any extra uncertainty about utility—no extra “moral facts” which we may be uncertain of. If there are such moral facts, they have to be present somewhere in the universe (at least, derivable from facts about the universe).

One implication of this: if utility is about high-level entities, the utility function is responsible for deriving them from low-level stuff. For example, if the universe is made of quarks, but utility is a function of beauty, consciousness, and such, then $U$ needs to contain the beauty-detector and consciousness-detector and so on—otherwise how can it compute utility given all the information about the world?
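
To make that structural point concrete, here is a minimal Python sketch (my own toy illustration, not anything from the formalism above). The names beauty_detector and consciousness_detector are hypothetical stand-ins; the point is only that they have to live inside $U$ itself, since $U$ receives nothing but the low-level description as input.

```python
# Toy sketch: a reductive utility function over low-level world descriptions
# must itself contain the machinery that derives any high-level quantities
# it cares about. The detectors below are hypothetical stand-ins.

def beauty_detector(world_bits):
    # Pretend this derives a "beauty" score from low-level data.
    return sum(world_bits[:100]) / 100

def consciousness_detector(world_bits):
    # Pretend this derives a "consciousness" score from low-level data.
    return sum(world_bits[100:200]) / 100

def reductive_utility(world_bits):
    # U is a function of the low-level configuration alone; everything
    # high-level has to be recomputed inside U.
    return 2.0 * beauty_detector(world_bits) + 1.0 * consciousness_detector(world_bits)

print(reductive_utility([1, 0] * 200))  # evaluate U on a toy 400-bit "world"
```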

Utility Is Computable

Finally, and most critically for the discussion here, $U$ should be a computable function.

To clarify what I mean by this: a world $\omega$ should have some sort of representation which allows us to feed it into a Turing machine—let’s say it’s an infinite bit-string which assigns true or false to each of the “atomic sentences” which describe the world. $U$ should be a computable function; that is, there should be a Turing machine which takes a rational number $\epsilon > 0$ together with $\omega$, prints a rational number within $\epsilon$ of $U(\omega)$, and halts. (In other words, we can compute $U$ to any desired degree of approximation.)

Why should $U$ be computable?

One argument is that $U$ should be computable because the agent has to be able to use it in computations. This perspective is especially appealing if you think of $U$ as a black-box function which you can only optimize through search. If you can’t evaluate $U$, how are you supposed to use it? If $U$ exists as an actual module somewhere in the brain, how is it supposed to be implemented? (If you don’t think this sounds very convincing, great!)

Requiring $U$ to be computable may also seem easy. What is there to lose? Are there preference structures we really care about being able to represent, which are fundamentally not computable?

And what would it even mean for a computable agent to have non-computable preferences?

However, the computability requirement is more restrictive than it may seem.

There is a sort of continuity implied by computability: $U$ must not depend too much on “small” differences between worlds. The computation only accesses finitely many bits of $\omega$ before it halts. All the rest of the bits in $\omega$ must not make more than $\epsilon$ difference to the value of $U$.
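
Here is a minimal sketch of a utility function that is computable in exactly this sense (my own toy example; the discounted-sum form is an assumption, not anything the argument depends on). The world is represented lazily, as a function from bit-index to bit, and the approximation only ever reads finitely many bits.

```python
from math import ceil, log2

# Toy computable utility: U(w) = sum over t of w_t * 2^-(t+1).
# A "world" is given lazily as a function from bit-index to bit,
# so we never need the whole infinite string.

def computable_U(world, eps):
    """Approximate U(world) to within eps, reading only finitely many bits."""
    n = max(1, ceil(log2(1 / eps)))  # bits beyond index n contribute at most 2**-n <= eps
    return sum(world(t) * 2 ** -(t + 1) for t in range(n))

# Example world: the button is pressed on day 3 and every day after.
late_presser = lambda t: 1 if t >= 3 else 0
print(computable_U(late_presser, eps=1e-6))  # close to 2**-4 + 2**-5 + ... = 1/8
```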

This means some seemingly simple utility functions are not computable.

As an example, consider the procrastination paradox. Your task is to push a button. You get 10 utility for pushing the button. You can push it any time you like. However, if you never press the button, you get −10. On any day, you are fine with putting the button-pressing off for one more day. Yet, if you put it off forever, you lose!

We can think of $\omega$ as a string like 000000100..., where the “1” marks the day you push the button. To compute the utility, we might look for the “1”, outputting 10 if we find it.
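
Here is a sketch of that “look for the 1” program (again a toy of mine, with the world given lazily as a function from day-index to bit):

```python
# Naive utility program for the procrastination example: search for the day
# the button is pressed. On any world containing a 1 it halts and returns 10;
# on the all-zero world it loops forever, because no finite number of bits
# can rule out a later 1 (so it can never return -10).

def procrastination_U(world):
    t = 0
    while True:
        if world(t) == 1:
            return 10  # found the day the button was pressed
        t += 1         # keep looking

print(procrastination_U(lambda t: 1 if t == 6 else 0))  # halts: prints 10
# procrastination_U(lambda t: 0)  # the all-zero world: this call would never halt
```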

But what about the all-zero universe, 0000000...? The program must loop forever. We can’t tell we’re in the all-zero universe by examining any finite number of bits. You don’t know whether you will eventually push the button. (Even if the universe also gives you your source code, you can’t necessarily tell from that—the logical difficulty of determining this about yourself is, of course, the original point of the procrastination paradox.)

Hence, a preference structure like this is not computable, and is not allowed according to the reductive utility doctrine.

The advocate of reductive utility might take this as a victory: the procrastination paradox, along with other paradoxes of a similar structure, has been avoided. (The St. Petersburg Paradox is another example.)

On the other hand, if you think this is a legitimate preference structure, dealing with such ‘problematic’ preferences motivates abandonment of reductive utility.

Subjective Utility: The Real Thing

We can strongly oppose all three points without leaving orthodox Bayesianism. Specifically, I’ll sketch how the Jeffrey-Bolker axioms enable non-reductive utility. (The title of this section is a reference to Jeffrey’s book Subjective Probability: The Real Thing.)

However, the real position I’m advocating is grounded in logical induction rather than the Jeffrey-Bolker axioms; I’ll sketch that version at the end.

The View From Somewhere

The reductive-utility view approached things from the starting-point of the universe. Beliefs are for what is real, and what is real is basically physical.

The non-reductive view starts from the standpoint of the agent. Beliefs are for things you can think about. This doesn’t rule out a physicalist approach. What it does do is give high-level objects like tables and chairs an equal footing with low-level objects like quarks: both are inferred from sensory experience by the agent.

Rather than assuming an underlying set of worlds, the Jeffrey-Bolker axioms assume only a set of events. For two events $A$ and $B$, the conjunction $A \wedge B$ exists, as do the disjunction $A \vee B$ and the negations $\neg A$ and $\neg B$. However, unlike in the Kolmogorov axioms, these are not assumed to be the intersection, union, and complement of an underlying set of worlds.

Let me emphasize that: we need not assume there are “worlds” at all.

In philosophy, this is called situation semantics—an alternative to the more common possible-world semantics. In mathematics, it brings to mind pointless topology.

In the Jeffrey-Bolker treatment, a world is just a maximally specific event: an event which describes everything completely. But there is no requirement that maximally-specific events exist. Perhaps any event, no matter how detailed, can be further extended by specifying some yet-unmentioned stuff. (Indeed, the Jeffrey-Bolker axioms assume this! Although, Jeffrey does not seem philosophically committed to that assumption, from what I have read.)

Thus, there need not be any “view from nowhere”—no semantic vantage point from which we see the whole universe.
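
For a very rough picture of what it looks like to work only with events, here is a toy sketch (mine, and much cruder than the Jeffrey-Bolker axioms): an “event” is a finite partial assignment to countably many atomic propositions, conjunction merges constraints, and any consistent event can always be refined further, so nothing plays the role of a maximally specific world. (A fuller treatment would also close these events under disjunction and negation.)

```python
# Toy event structure with no underlying set of worlds. An event is a finite
# dict mapping proposition-indices to truth values. Conjunction merges the
# constraints; refinement pins down one more proposition. Since every event
# leaves infinitely many propositions unmentioned, no event is maximally specific.

def conjoin(event_a, event_b):
    """Conjunction of two events; returns None if they contradict each other."""
    merged = dict(event_a)
    for prop, value in event_b.items():
        if prop in merged and merged[prop] != value:
            return None  # contradictory constraints
        merged[prop] = value
    return merged

def refine(event):
    """Extend an event by specifying one proposition it says nothing about."""
    fresh = 0
    while fresh in event:
        fresh += 1
    return {**event, fresh: True}

e = {0: True, 2: False}        # "proposition 0 holds and proposition 2 fails"
print(conjoin(e, {1: True}))   # {0: True, 2: False, 1: True}
print(conjoin(e, {0: False}))  # None: contradictory
print(refine(e))               # strictly more specific; this process never bottoms out
```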

This, of course, deprives us of the objects which utility was a function of, in the reductive view.

Utility Is a Function of Events

The reductive-utility view makes a distinction between utility—the random variable itself—and expected utility, which is the subjective estimate of the random variable which we use for making decisions.

The Jeffrey-Bolker framework does not make this distinction. Everything is a subjective preference evaluation.

A reductive-utility advocate sees the expected utility of an event as derived from the utility of the worlds within the event. They start by defining $U(\omega)$; then, the expected utility of an event $E$ is defined as $\mathbb{E}[U \mid E] = \sum_{\omega \in E} U(\omega) \frac{P(\omega)}{P(E)}$—or, more generally, the corresponding integral.

In the Jeffrey-Bolker framework, we instead define expected utility directly on events. These preferences are required to be coherent with breaking events up into sums, so that for disjoint events $A$ and $B$, $\mathbb{E}[U \mid A \vee B] = \frac{\mathbb{E}[U \mid A] P(A) + \mathbb{E}[U \mid B] P(B)}{P(A) + P(B)}$—but we do not define one side from the other.
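
As a toy illustration of that coherence condition (the numbers are my own), here is a check that expected utilities assigned directly to two disjoint events combine by the weighted-average rule, with no worlds anywhere in sight:

```python
from fractions import Fraction as F

# Subjective probabilities and expected utilities assigned directly to two
# disjoint events A and B; no underlying set of worlds is ever mentioned.
P  = {"A": F(1, 4), "B": F(1, 2)}
EU = {"A": F(10),   "B": F(4)}

def combine(x, y):
    """Coherent expected utility of the disjunction of two disjoint events x and y."""
    return (EU[x] * P[x] + EU[y] * P[y]) / (P[x] + P[y])

print(combine("A", "B"))  # 6, the only value coherent with the assignments above
```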

We don’t have to know how to evaluate entire worlds in order to evaluate events. All we have to know is how to evaluate events!

Updates Are Computable

Jeffrey-Bolker doesn’t say anything about computability. However, if we do want to address this sort of issue, it leaves us in a different position.

Because subjective expectation is primary, it is now more natural to require that the agent can evaluate events, without any requirement about a function on worlds. (Of course, we could do that in the Kolmogorov framework.)

Agents don’t need to be able to compute the utility of a whole world. All they need to know is how to update expected utilities as they go along.

Of course, the subjective utility can’t be updated in just any way as you go along. It needs to be coherent, in the sense of the Jeffrey-Bolker axioms. And maintaining coherence can be very difficult. But it can also be quite easy, even in cases where the random-variable treatment of the utility function is not computable.

Let’s go back to the procrastination example. In this case, to evaluate the expected utility of each action at a given time-step, the agent does not need to figure out whether it ever pushes the button. It just needs to have some probability, which it updates over time.

For example, an agent might initially assign some probability to pressing the button at each time $t$, and some nonzero probability to never pressing the button. Its probability that it will ever press the button, and thus its utility estimate, would decrease with each observed time-step in which it didn’t press the button. (Of course, such an agent would press the button immediately.)
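
Here is a sketch of that kind of update, with a made-up prior (the specific numbers are mine, chosen only to keep the arithmetic simple): the agent tracks the probability that it ever presses the button, conditions on each button-free day it observes, and recomputes its expected utility, without ever evaluating a utility function on whole worlds.

```python
# Made-up prior: P(press exactly on day t) = 2^-(t+2), which sums to 1/2,
# and P(never press) = 1/2. Conditioning on "no press so far" shifts weight
# toward "never press", so the expected-utility estimate falls over time.

def prior_press_on_day(t):
    return 2.0 ** -(t + 2)

P_NEVER = 0.5

def expected_utility_after(days_without_press):
    """E[U] after observing this many days go by with no button press."""
    # Only "press later" and "never press" survive the observation.
    p_press_later = sum(prior_press_on_day(t) for t in range(days_without_press, 1000))
    normalizer = p_press_later + P_NEVER
    p_ever = p_press_later / normalizer
    return 10 * p_ever + (-10) * (1 - p_ever)

for d in range(5):
    print(d, round(expected_utility_after(d), 3))  # the estimate falls as days pass
```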

Of course, this “solution” doesn’t touch on any of the tricky logical issues which the procrastination paradox was originally introduced to illustrate. This isn’t meant as a solution to the procrastination paradox—only as an illustration of how to coherently update discontinuous preferences. This simple $U$ is uncomputable by the definition of the previous section.

It also doesn’t address computational tractability in a very real way, since if the prior is very complicated, computing the subjective expectations can get extremely difficult.

We can come closer to addressing logical issues and computational tractability by considering things in a logical induction framework.

Utility Is Not a Function

In a logical induction (LI) framework, the central idea becomes “update your subjective expectations in any way you like, so long as those expectations aren’t (too easily) exploitable to Dutch-book.” This clarifies what it means for the updates to be “coherent”—it is somewhat more elegant than saying “… any way you like, so long as they follow the Jeffrey-Bolker axioms.”

This replaces the idea of “utility function” entirely—there isn’t any need for a function any more, just a logically-uncertain-variable (LUV, in the terminology from the LI paper).

Actually, there are different ways one might want to set things up. I hope to get more technical in a later post. For now, here are some bullet points:

  • In the simple procrastination-paradox example, you push the button right away if you have any uncertainty at all about whether you will ever push it. So things are not that interesting.

  • In more complicated examples—where there is some real benefit to procrastinating—an LI-based agent could totally procrastinate forever. This is because LI doesn’t give any guarantee about converging to correct beliefs for uncomputable propositions like whether Turing machines halt or whether people stop procrastinating.

  • Believing you’ll stop procrastinating even though you won’t is perfectly coherent—in the same way that believing in nonstandard numbers is perfectly logically consistent. Putting ourselves in the shoes of such an agent, this just means we’ve examined our own decision-making to the best of our ability, and have put significant probability on “we don’t procrastinate forever”. This kind of reasoning is necessarily fallible.

  • Yet, if a system we built were to do this, we might have strong objections. So, this can count as an alignment problem. How can we give feedback to a system to avoid this kind of mistake? I hope to work on this question in future posts.