Terminal Values and Instrumental Values

On a purely instinctive level, any human planner behaves as if they distinguish between means and ends. Want chocolate? There’s chocolate at the Publix supermarket. You can get to the supermarket if you drive one mile south on Washington Ave. You can drive if you get into the car. You can get into the car if you open the door. You can open the door if you have your car keys. So you put your car keys into your pocket, and get ready to leave the house...

...when suddenly the word comes on the radio that an earthquake has destroyed all the chocolate at the local Publix. Well, there’s no point in driving to the Publix if there’s no chocolate there, and no point in getting into the car if you’re not driving anywhere, and no point in having car keys in your pocket if you’re not driving. So you take the car keys out of your pocket, and call the local pizza service and have them deliver a chocolate pizza. Mm, delicious.

I rarely notice people losing track of plans they devised themselves. People usually don’t drive to the supermarket if they know the chocolate is gone. But I’ve also noticed that when people begin explicitly talking about goal systems instead of just wanting things, mentioning “goals” instead of using them, they often become confused. Humans are experts at planning, not experts on planning, or there’d be a lot more AI developers in the world.

In particular, I’ve noticed people get confused when—in abstract philosophical discussions rather than everyday life—they consider the distinction between means and ends; more formally, between “instrumental values” and “terminal values”.

(Another long post needed as a reference.)

Part of the problem, it seems to me, is that the human mind uses a rather ad-hoc system to keep track of its goals—it works, but not cleanly. English doesn’t embody a sharp distinction between means and ends: “I want to save my sister’s life” and “I want to administer penicillin to my sister” use the same word “want”.

Can we describe, in mere English, the distinction that is getting lost?

As a first stab:

“Instrumental values” are desirable strictly conditional on their anticipated consequences. “I want to administer penicillin to my sister”, not because a penicillin-filled sister is an intrinsic good, but in anticipation of penicillin curing her flesh-eating pneumonia. If instead you anticipated that injecting penicillin would melt your sister into a puddle like the Wicked Witch of the West, you’d fight just as hard to keep her penicillin-free.

“Terminal values” are desirable without conditioning on other consequences: “I want to save my sister’s life” has nothing to do with your anticipating whether she’ll get injected with penicillin after that.

This first attempt suffers from obvious flaws. If saving my sister’s life would cause the Earth to be swallowed up by a black hole, then I would go off and cry for a while, but I wouldn’t administer penicillin. Does this mean that saving my sister’s life was not a “terminal” or “intrinsic” value, because it’s theoretically conditional on its consequences? Am I only trying to save her life because of my belief that a black hole won’t consume the Earth afterward? Common sense should say that’s not what’s happening.

So forget English. We can set up a mathematical description of a decision system in which terminal values and instrumental values are separate and incompatible types—like integers and floating-point numbers, in a programming language with no automatic conversion between them.

An ideal Bayesian decision system can be set up using only four elements:

  • Outcomes : type Outcome[]

    • list of possible outcomes

    • {sister lives, sister dies}

  • Actions : type Action[]

    • list of possible actions

    • {administer penicillin, don’t administer penicillin}

  • Utility_function : type Outcome → Utility

    • utility function that maps each outcome onto a utility

    • (a utility being representable as a real number between negative and positive infinity)

    • {sister lives: 1, sister dies: 0}

  • Conditional_probability_function : type Action → Outcome → Probability

    • conditional probability function that maps each action onto a probability distribution over outcomes

    • (a probability being representable as a real number between 0 and 1)

    • {administer penicillin: sister lives, 0.9; sister dies, 0.1 ;; don’t administer penicillin: sister lives, 0.3; sister dies, 0.7}

If you can’t read the type system directly, don’t worry, I’ll always translate into English. For programmers, seeing it described in distinct statements helps to set up distinct mental objects.

And the decision system itself?

  • Expected_Utility : Action A → (Sum O in Outcomes: Utility(O) * Probability(O|A))

    • The “expected utility” of an action equals the sum, over all outcomes, of the utility of that outcome times the conditional probability of that outcome given that action.

    • {EU(administer penicillin) = 0.9 ; EU(don’t administer penicillin) = 0.3}

  • Choose : → (Argmax A in Actions: Expected_Utility(A))

    • Pick an action whose “expected utility” is maximal

    • {return: administer penicillin}

For every action, calculate the conditional probability of all the consequences that might follow, then add up the utilities of those consequences times their conditional probability. Then pick the best action.

This is a mathematically simple sketch of a decision system. It is not an efficient way to compute decisions in the real world.
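For the programmers, here is one way the sketch above might look as an actual module: a minimal Haskell rendering of the penicillin example. The names (utilityOf, probOf, choose, and so on) are made up for this illustration; nothing about them is canonical, they just turn the distinct types into distinct objects in code.

    -- A minimal sketch of the formalism above, using the penicillin example.
    import Data.List (maximumBy)
    import Data.Ord  (comparing)

    data Outcome = SisterLives | SisterDies
      deriving (Eq, Show)

    data Action = GivePenicillin | WithholdPenicillin
      deriving (Eq, Show)

    -- Utilities, probabilities, and expected utilities are all real numbers,
    -- but they get distinct wrapped types with no automatic conversion.
    newtype Utility         = Utility Double         deriving (Eq, Ord, Show)
    newtype Probability     = Probability Double     deriving (Eq, Ord, Show)
    newtype ExpectedUtility = ExpectedUtility Double deriving (Eq, Ord, Show)

    outcomes :: [Outcome]
    outcomes = [SisterLives, SisterDies]

    actions :: [Action]
    actions = [GivePenicillin, WithholdPenicillin]

    -- Utility_function : Outcome -> Utility
    utilityOf :: Outcome -> Utility
    utilityOf SisterLives = Utility 1
    utilityOf SisterDies  = Utility 0

    -- Conditional_probability_function : Action -> Outcome -> Probability
    probOf :: Action -> Outcome -> Probability
    probOf GivePenicillin     SisterLives = Probability 0.9
    probOf GivePenicillin     SisterDies  = Probability 0.1
    probOf WithholdPenicillin SisterLives = Probability 0.3
    probOf WithholdPenicillin SisterDies  = Probability 0.7

    -- Expected_Utility : sum over outcomes of utility times conditional probability.
    expectedUtility :: Action -> ExpectedUtility
    expectedUtility a =
      ExpectedUtility (sum [ u * p | o <- outcomes
                                   , let Utility u     = utilityOf o
                                   , let Probability p = probOf a o ])

    -- Choose : an action whose expected utility is maximal.
    choose :: Action
    choose = maximumBy (comparing expectedUtility) actions

    main :: IO ()
    main = do
      mapM_ (\a -> print (a, expectedUtility a)) actions   -- 0.9 and 0.3
      print choose                                         -- GivePenicillin

Notice that in this rendering a Utility only ever attaches to an Outcome, and an ExpectedUtility only ever attaches to an Action; that asymmetry is what the next few paragraphs lean on.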

Suppose, for example, that you need a sequence of acts to carry out a plan. The formalism can easily represent this by letting each Action stand for a whole sequence. But this creates an exponentially large space, like the space of all sentences you can type in 100 letters. As a simple example, if one of the possible acts on the first turn is “Shoot my own foot off”, a human planner will decide that this is generally a bad idea and eliminate all sequences beginning with that action. But we’ve flattened this structure out of our representation. We don’t have sequences of acts, just flat “actions”.

So, yes, there are a few minor complications. Obviously so, or we’d just run out and build a real AI this way. In that sense, it’s much the same as Bayesian probability theory itself.

But this is one of those times when it’s a surprisingly good idea to consider the absurdly simple version before adding in any high-falutin’ complications.

Consider the philosopher who asserts, “All of us are ultimately selfish; we care only about our own states of mind. The mother who claims to care about her son’s welfare, really wants to believe that her son is doing well—this belief is what makes the mother happy. She helps him for the sake of her own happiness, not his.” You say, “Well, suppose the mother sacrifices her life to push her son out of the path of an oncoming truck. That’s not going to make her happy, just dead.” The philosopher stammers for a few moments, then replies, “But she still did it because she valued that choice above others—because of the feeling of importance she attached to that decision.”

So you say, “TYPE ERROR: No constructor found for Expected_Utility → Utility.”

Allow me to explain that reply.

Even our simple formalism illustrates a sharp distinction between expected utility, which is something that actions have; and utility, which is something that outcomes have. Sure, you can map both utilities and expected utilities onto real numbers. But that’s like observing that you can map wind speed and temperature onto real numbers. It doesn’t make them the same thing.
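In the Haskell sketch above, this reply becomes mechanical. Utility and ExpectedUtility are separate wrapped types, so the philosopher’s conversion, if you try to write it down, simply does not compile. The function names below are invented for illustration, and the error text is roughly what GHC would say.

    -- Continuing the module sketched earlier.  Scoring an outcome is fine:
    utilityOfSavedSister :: Utility
    utilityOfSavedSister = utilityOf SisterLives

    -- The philosopher's move would instead need something like
    --
    --   happinessOfChoosing :: ExpectedUtility -> Utility
    --   happinessOfChoosing eu = eu
    --
    -- which the compiler rejects with, roughly,
    --   Couldn't match expected type 'Utility'
    --     with actual type 'ExpectedUtility'
    -- There is no constructor anywhere in the module that turns the score of
    -- an action into the value of an outcome.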

The philosopher begins by arguing that all your Utilities must be over Outcomes consisting of your state of mind. If this were true, your intelligence would operate as an engine to steer the future into regions where you were happy. Future states would be distinguished only by your state of mind; you would be indifferent between any two futures in which you had the same state of mind.

And you would, indeed, be rather unlikely to sacrifice your own life to save another.

When we object that people sometimes do sacrifice their lives, the philosopher’s reply shifts to discussing Expected Utilities over Actions: “The feeling of importance she attached to that decision.” This is a drastic jump that should make us leap out of our chairs in indignation. Trying to convert an Expected_Utility into a Utility would cause an outright error in our programming language. But in English it all sounds the same.

The choices of our simple decision system are those with highest Expected Utility, but this doesn’t say anything whatsoever about where it steers the future. It doesn’t say anything about the utilities the decider assigns, or which real-world outcomes are likely to happen as a result. It doesn’t say anything about the mind’s function as an engine.

The physical cause of a physical action is a cognitive state, in our ideal decider an Expected_Utility, and this expected utility is calculated by evaluating a utility function over imagined consequences. To save your son’s life, you must imagine the event of your son’s life being saved, and this imagination is not the event itself. It’s a quotation, like the difference between “snow” and snow. But that doesn’t mean that what’s inside the quote marks must itself be a cognitive state. If you choose the action that leads to the future that you represent with “my son is still alive”, then you have functioned as an engine to steer the future into a region where your son is still alive. Not an engine that steers the future into a region where you represent the sentence “my son is still alive”. To steer the future there, your utility function would have to return a high utility when fed ““my son is still alive””, the quotation of the quotation, your imagination of yourself imagining. Recipes make poor cake when you grind them up and toss them in the batter.
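In the type-system spirit of the earlier sketch, one could mark the difference between the world and a representation of the world with a wrapper type. The names below (Belief, SonOutcome, sonUtility) are purely hypothetical, just a way of writing the quotation marks into the types.

    -- Continuing the earlier module.  An Outcome is the world; a Belief Outcome
    -- is a quotation of it inside someone's head.
    newtype Belief a = Belief a
      deriving Show

    data SonOutcome = SonAlive | SonDead
      deriving (Eq, Show)

    -- An engine that steers the world scores outcomes:
    sonUtility :: SonOutcome -> Utility
    sonUtility SonAlive = Utility 1
    sonUtility SonDead  = Utility 0

    -- The "ultimately selfish" engine the philosopher describes would need a
    -- utility function over Belief SonOutcome (or Belief (Belief SonOutcome)),
    -- which is a different type; sonUtility does not apply to it.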

And that’s why it’s helpful to consider the simple decision systems first. Mix enough complications into the system, and formerly clear distinctions become harder to see.

So now let’s look at some complications. Clearly the Utility function (mapping Outcomes onto Utilities) is meant to formalize what I earlier referred to as “terminal values”, values not contingent upon their consequences. What about the case where saving your sister’s life leads to Earth’s destruction by a black hole? In our formalism, we’ve flattened out this possibility. Outcomes don’t lead to Outcomes, only Actions lead to Outcomes. Your sister recovering from pneumonia followed by the Earth being devoured by a black hole would be flattened into a single “possible outcome”.

And where are the “instrumental values” in this simple formalism? Actually, they’ve vanished entirely! You see, in this formalism, actions lead directly to outcomes with no intervening events. There’s no notion of throwing a rock that flies through the air and knocks an apple off a branch so that it falls to the ground. Throwing the rock is the Action, and it leads straight to the Outcome of the apple lying on the ground—according to the conditional probability function that turns an Action directly into a Probability distribution over Outcomes.

In order to actually compute the conditional probability function, and in order to separately consider the utility of a sister’s pneumonia and a black hole swallowing Earth, we would have to represent the network structure of causality—the way that events lead to other events.

And then the instrumental values would start coming back. If the causal network was sufficiently regular, you could find a state B that tended to lead to C regardless of how you achieved B. Then if you wanted to achieve C for some reason, you could plan efficiently by first working out a B that led to C, and then an A that led to B. This would be the phenomenon of “instrumental value”—B would have “instrumental value” because it led to C. C itself might be terminally valued—a term in the utility function over the total outcome. Or C might just be an instrumental value, a node that was not directly valued by the utility function.

Instrumental value, in this formalism, is purely an aid to the efficient computation of plans. It can and should be discarded wherever this kind of regularity does not exist.
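Here is a toy version of that backward chaining, sticking with the rock-and-apple example. The event names and the planFor helper are invented for illustration; a real planner would search a probabilistic causal network, not a three-node chain.

    -- A toy causal chain.  Each event reliably leads to the next, so every
    -- earlier event picks up "instrumental value" only through that link.
    data Event = ThrowRock | AppleKnockedOffBranch | AppleOnGround
      deriving (Eq, Show)

    allEvents :: [Event]
    allEvents = [ThrowRock, AppleKnockedOffBranch, AppleOnGround]

    -- The regular part of the causal network: which event leads to which.
    causes :: Event -> Event -> Bool
    causes ThrowRock             AppleKnockedOffBranch = True
    causes AppleKnockedOffBranch AppleOnGround         = True
    causes _                     _                     = False

    -- Backward chaining: to reach the valued goal, find some B that causes it,
    -- then recursively find what causes B.  The utility function never scores
    -- the intermediate events at all.
    planFor :: Event -> [Event]
    planFor goal =
      case [ b | b <- allEvents, causes b goal ] of
        []      -> [goal]
        (b : _) -> planFor b ++ [goal]

    -- planFor AppleOnGround == [ThrowRock, AppleKnockedOffBranch, AppleOnGround]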

Suppose, for example, that there’s some particular value of B that doesn’t lead to C. Would you choose an A which led to that B? Or never mind the abstract philosophy: If you wanted to go to the supermarket to get chocolate, and you wanted to drive to the supermarket, and you needed to get into your car, would you gain entry by ripping off the car door with a steam shovel? (No.) Instrumental value is a “leaky abstraction”, as we programmers say; you sometimes have to toss away the cached value and recompute the actual expected utility. Part of being efficient without being suicidal is noticing when convenient shortcuts break down. Though this formalism does give rise to instrumental values, it does so only where the requisite regularity exists, and strictly as a convenient shortcut in computation.

But if you complicate the formalism before you understand the simple version, then you may start thinking that instrumental values have some strange life of their own, even in a normative sense. That, once you say B is usually good because it leads to C, you’ve committed yourself to always try for B even in the absence of C. People make this kind of mistake in abstract philosophy, even though they would never, in real life, rip open their car door with a steam shovel. You may start thinking that there’s no way to develop a consequentialist that maximizes only inclusive genetic fitness, because it will starve unless you include an explicit terminal value for “eating food”. People make this mistake even though they would never stand around opening car doors all day long, for fear of being stuck outside their cars if they didn’t have a terminal value for opening car doors.

Instrumental values live in (the network structure of) the conditional probability function. This makes instrumental value strictly dependent on beliefs-of-fact given a fixed utility function. If I believe that penicillin causes pneumonia, and that the absence of penicillin cures pneumonia, then my perceived instrumental value of penicillin will go from high to low. Change the beliefs of fact—change the conditional probability function that associates actions to believed consequences—and the instrumental values will change in unison.
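That dependence is easy to see in the earlier Haskell sketch: swap in a mistaken conditional probability function, leave utilityOf completely untouched, and the recommended action flips. The names probOfMistaken and expectedUtilityWith are invented for this illustration.

    -- Continuing the earlier module: the same utility function, different
    -- beliefs of fact.
    probOfMistaken :: Action -> Outcome -> Probability
    probOfMistaken GivePenicillin     SisterLives = Probability 0.1
    probOfMistaken GivePenicillin     SisterDies  = Probability 0.9
    probOfMistaken WithholdPenicillin SisterLives = Probability 0.7
    probOfMistaken WithholdPenicillin SisterDies  = Probability 0.3

    -- The same expected-utility calculation, parameterized by the belief function.
    expectedUtilityWith :: (Action -> Outcome -> Probability) -> Action -> ExpectedUtility
    expectedUtilityWith prob a =
      ExpectedUtility (sum [ u * p | o <- outcomes
                                   , let Utility u     = utilityOf o
                                   , let Probability p = prob a o ])

    -- expectedUtilityWith probOfMistaken GivePenicillin      ~ 0.1
    -- expectedUtilityWith probOfMistaken WithholdPenicillin  ~ 0.7
    -- The instrumental value of penicillin has flipped, with no change to
    -- the terminal values at all.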

In moral arguments, some disputes are about instrumental consequences, and some disputes are about terminal values. If your debating opponent says that banning guns will lead to lower crime, and you say that banning guns will lead to higher crime, then you agree about a superior instrumental value (crime is bad), but you disagree about which intermediate events lead to which consequences. But I do not think an argument about female circumcision is really a factual argument about how best to achieve a shared value of treating women fairly or making them happy.

This important distinction often gets flushed down the toilet in angry arguments. People with factual disagreements and shared values each decide that their debating opponents must be sociopaths. As if your hated enemy, the gun-control (or gun-rights) advocates, really wanted to kill people, which should be implausible as realistic psychology.

I fear the human brain does not strongly type the distinction between terminal moral beliefs and instrumental moral beliefs. “We should ban guns” and “We should save lives” don’t feel different, as moral beliefs, the way that sight feels different from sound. Despite all the other ways that the human goal system complicates everything in sight, this one distinction it manages to collapse into a mishmash of things-with-conditional-value.

To extract out the terminal values we have to inspect this mishmash of valuable things, trying to figure out which ones are getting their value from somewhere else. It’s a difficult project! If you say that you want to ban guns in order to reduce crime, it may take a moment to realize that “reducing crime” isn’t a terminal value, it’s a superior instrumental value with links to terminal values for human lives and human happinesses. And then the one who advocates gun rights may have links to the superior instrumental value of “reducing crime” plus a link to a value for “freedom”, which might be a terminal value unto them, or another instrumental value...

We can’t print out our complete network of values derived from other values. We probably don’t even store the whole history of how values got there. By considering the right moral dilemmas, “Would you do X if Y”, we can often figure out where our values came from. But even this project itself is full of pitfalls: misleading dilemmas and gappy philosophical arguments. We don’t know what our own values are, or where they came from, and can’t find out except by undertaking error-prone projects of cognitive archaeology. Just forming a conscious distinction between “terminal value” and “instrumental value”, and keeping track of what it means, and using it correctly, is hard work. Only by inspecting the simple formalism can we see how easy it ought to be, in principle.

And that’s to say nothing of all the other complications of the human reward system—the whole use of reinforcement architecture, and the way that eating chocolate is pleasurable, and anticipating eating chocolate is pleasurable, but they’re different kinds of pleasures...

But I don’t complain too much about the mess.

Being ignorant of your own values may not always be fun, but at least it’s not boring.