Comment on "Coherence arguments do not imply goal-directed behavior"

In "Coherence arguments do not imply goal-directed behavior", Rohin Shah argues that a system's merely being model-able at all as an EU maximizer does not imply that it has "goal-directed behavior". The argument as I understand it runs something like this:

1: Any behavior whatsoever maximizes some utility function.

2: Not all behaviors are goal-directed.

Conclusion: A system's behavior maximizing some utility function does not imply that its behavior is goal-directed.

I think this argument is technically sound, but misses an important connection between VNM coherence and goal-directed behavior.

Shah does not give a formal definition of "goal-directed behavior", but it is basically what you intuitively think it is: goal-directed behavior is the sort of behavior that seems aimed at accomplishing some goal. Shah correctly points out that what makes a system dangerous is that it is goal-directed and good at accomplishing its goal, not merely that it is good at maximizing some utility function. Every object in the universe perfectly maximizes the utility function that assigns 1 to all of the actual causal consequences of its behavior, and 0 to any other causal consequences its behavior might have had.
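
In symbols (my notation, not Shah's): write $o_{\text{actual}}$ for the causal consequences that a system's behavior actually produces. The utility function in question is just

$$U(o) = \begin{cases} 1 & \text{if } o = o_{\text{actual}} \\ 0 & \text{otherwise,} \end{cases}$$

which the system maximizes trivially, since by definition its behavior only ever brings about $o_{\text{actual}}$.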

Shah seems to suggest that being model-able as an EU maximizer is not very closely related to goal-directed behavior. Sure, having goal-directed behavior implies that you are model-able as an EU maximizer, but so does having any kind of behavior whatsoever.

The implication does not run the other way, according to Shah. Something being an EU maximizer for some utility function, even a perfect one, does not imply that its behavior is goal-directed. I think this is right, but I will argue that, nonetheless, if it is a good idea for you to model an agent as an EU maximizer, then its behavior will seem goal-directed (at least to you).

Shah gives the example of a twitching robot. This is not a robot that maximizes the probability of its twitching, or one that wants to twitch for as long as possible; Shah agrees that a robot that maximized those things would be dangerous. Rather, this is a robot that just twitches. Such a robot maximizes a utility function that assigns 1 to whatever the actual consequences of its actual twitching behaviors are, and 0 to anything else the consequences might have been.

This system is a perfect EU maximizer for that utility function, but it is not an optimization process for any utility function. For a system to be an optimization process, it must be more efficient to predict it by modeling it as an optimization process than by modeling it as a mechanical system. Another way to put it: it must be a good idea for you to model it as an EU maximizer.

This might be true in two different ways. First, it might be more efficient in terms of time or compute: my predictions of the system's behavior when I model it as an EU maximizer might not be as good as my predictions when I model it as a mechanical system, but the reduced accuracy is worth it, because modeling the system mechanically would take me much longer or be otherwise costly. Think of predicting a chess-playing program. Even though I could predict its next move by learning its source code and computing it by hand on paper, in most contexts I would be better off just thinking about what I would do in its circumstances if I were trying to win at chess.
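
To make that concrete, here is a toy stand-in (my own example, not Shah's, and a Nim-like game rather than chess so that it fits in a few lines; the function names are mine): one predictor re-runs the program's actual decision procedure, the other just asks what a player who wanted to win would do.

```python
# Toy sketch (my example, not Shah's): two ways to predict a game-playing
# program's next move. "Mechanical" prediction re-runs the program's own
# exhaustive search; "intentional" prediction asks what a player who wanted
# to win would do. The game: players alternately take 1-3 stones, and
# taking the last stone wins.

def engine_move(stones):
    # The program's actual decision procedure: brute-force search over the
    # whole game tree (expensive, but exactly what the system computes).
    def wins(n):
        # True if the player to move can force a win with n stones left.
        return any(not wins(n - take) for take in (1, 2, 3) if take <= n)

    for take in (1, 2, 3):
        if take <= stones and not wins(stones - take):
            return take
    return 1  # no winning move exists; take one stone arbitrarily


def intentional_prediction(stones):
    # Cheap goal-level prediction: "if I wanted to win, I'd leave the
    # opponent a multiple of four stones."
    return stones % 4 or 1


# The intentional stance recovers the engine's winning moves at a tiny
# fraction of the cost of re-running its search.
for n in range(1, 20):
    if n % 4:  # positions where a winning move exists
        assert engine_move(n) == intentional_prediction(n)
```

The two strategies make the same predictions on winning positions, but the second skips re-running the engine's exhaustive search, which is the sense in which the intentional stance is cheaper.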

Another related but distinct sense in which it might be more efficient is that modeling the system as an EU maximizer might allow me to compress its behavior more than modeling it as a mechanical system does. Imagine I had to send someone a Python program that makes predictions about the behavior of the twitching robot. I could write a program that just prints "twitch" over and over again, or I could write a program that models the whole world and picks the behavior that maximizes the expected value of a utility function that assigns 1 to whatever the actual consequences of the twitching are, and 0 to whatever else they might have been. I claim that the second program would be longer. It would not, however, allow the receiver of my message to predict the robot's behavior any more accurately than a program that just prints "it twitches again" over and over.
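
Here is a toy version of those two programs (my sketch, not anything from Shah's post; the names are mine). The point is just that the "EU maximizer" predictor still has to carry the actual twitch trajectory around inside its utility function, so it cannot come out shorter than the program that simply says the robot twitches again.

```python
# Toy comparison of the two predictors of the twitching robot.

ACTUAL_TRAJECTORY = ["twitch"] * 100  # assumed ground truth for the toy


def mechanical_predictor(step):
    # Exploits the mechanical regularity directly: it just twitches.
    return "twitch"


def eu_predictor(step):
    # Picks the behavior that maximizes the utility function assigning 1 to
    # whatever actually happens and 0 to everything else. Note that the
    # actual trajectory is baked into the utility function itself.
    candidates = ("twitch", "sit still", "seize resources")

    def utility(behavior):
        return 1 if behavior == ACTUAL_TRAJECTORY[step] else 0

    return max(candidates, key=utility)


# Both predictors agree on every step; the second buys no extra accuracy.
assert all(mechanical_predictor(t) == eu_predictor(t)
           for t in range(len(ACTUAL_TRAJECTORY)))
```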

Maybe the exact twitching pattern is complicated, or maybe it stops at some particular time, in which case the first program would have to be more complicated. But as long as the twitching does not seem goal-directed, I claim that a Python program that predicts the robot's behavior by modeling the universe and the counterfactual consequences of different kinds of possible twitching will always be longer than one that predicts the twitching by exploiting regularities that follow from the robot's mechanical design. I think this might be what it means for a system to be goal-directed.

(It might also be worth pointing out that knowing there is a utility function relative to which the twitching robot is a perfect optimizer does not allow us to predict its behavior in advance. "It optimizes the utility function that assigns 1 to the consequences of its behavior and 0 to everything else" is a bad theory of the twitching robot in the same way that "the lady down the street is a witch; she did it" is a bad theory of anything.)

A system seems goal-directed to you if the best way you have of predicting it is by modeling it as an EU maximizer with some particular utility function and credence function. (Actually, the particulars of the EU formalism might not be very relevant to what makes humans think of a system's behavior as goal-directed. It being a good idea to model the system as having something like preferences, and some sort of reasonably accurate model of the world that supports counterfactual reasoning, is probably good enough.) This conception of goal-directedness is somewhat awkward because the first notion of "efficiently model" is relative to your capacities and goals, and the second notion is relative to the programming language we choose, but I think it is basically right nonetheless. Luckily, we humans have relatively similar capacities and goals, and it can be shown that, using the second notion of "efficiently model", we will only disagree about how agenty different systems are by at most some additive constant, regardless of what programming languages we choose.
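
The additive-constant claim is, I take it, just the usual invariance property of minimal description length. Writing $K_L(x)$ for the length of the shortest program in language $L$ that predicts the behavior sequence $x$, for any two general-purpose languages $L_1$ and $L_2$ there is a constant $c_{L_1,L_2}$, depending only on the two languages (roughly, the length of an interpreter for one written in the other) and not on $x$, such that

$$\left| K_{L_1}(x) - K_{L_2}(x) \right| \le c_{L_1,L_2} \quad \text{for all } x.$$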

One argument that a system's behavior seeming goal-directed to you just amounts to its being best for you to model it as an EU maximizer is this: if it were a better idea for you to model the system some other way, that is probably how you would model it instead. This is why we do not model bottle caps as EU maximizers, but do model chess programs as (something at least a lot like) EU maximizers. It is also why the twitching robot does not seem intelligent to us, absent other subsystems that we should model as EU maximizers, but that's a story for a different post.

I think we should expect most systems that it is a good idea for us to model as EU maximizers to pursue convergent instrumental goals like acquiring computational power, ensuring their own survival, and so on. If I know an EU maximizer's utility function better than I know its specific behavior, often the best way for me to predict its behavior is to imagine what I would do in its circumstances if I had the same goal. Take a complicated utility function like the one that assigns 1 to whatever the actual consequences of the twitching robot's twitches are and 0 to anything else. Imagine that I did not have the utility function specified that way, which hides all of the complexity in "whatever the actual consequences are". Rather, imagine I had it specified as an extremely specific description of the world that gets scored above all else, without reference to the actual twitching pattern of the robot. If maximizing that utility function were my goal, it would seem like a good idea to get more computational power for predicting the outcomes of my available actions, to make sure that I am not turned off prematurely, and to try to get as accurate a model of my environment as possible.

In conclusion, I agree with Shah that being able to model a system as an EU maximizer at all does not imply that its behavior is goal-directed, but I think that sort of misses the point. If the best way for you to model a system is as an EU maximizer, then its behavior will seem goal-directed to you; and if the shortest program that predicts a system's behavior does so by modeling it as an EU maximizer, then its behavior will be goal-directed (at least up to an additive constant). I think the best way for you to model most systems that are more intelligent than you will be to model them as EU maximizers, or something close, but again, that's a story for a different post.