Superintelligence 10: Instrumentally convergent goals

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.


Welcome. This week we discuss the tenth section in the reading guide: Instrumentally convergent goals. This corresponds to the second part of Chapter 7.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. And if you are behind on the book, don’t let it put you off discussing. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Instrumental convergence from Chapter 7 (p109-114)


Summary

  1. The instrumental convergence thesis: we can identify ‘convergent instrumental values’ (henceforth CIVs). That is, subgoals that are useful for a wide range of more fundamental goals, and in a wide range of situations. (p109)

  2. Even if we know nothing about an agent’s goals, CIVs let us predict some of the agent’s behavior. (p109)

  3. Some CIVs:

    1. Self-preservation: because you are an excellent person to ensure your own goals are pursued in future.

    2. Goal-content integrity (i.e. not changing your own goals): because if you don’t have your goals any more, you can’t pursue them.

    3. Cognitive enhancement: because making better decisions helps with any goals.

    4. Technological perfection: because technology lets you have more useful resources.

    5. Resource acquisition: because a broad range of resources can support a broad range of goals.

  4. For each CIV, there are plausible combinations of final goals and scenarios under which an agent would not pursue that CIV. (p109-114) (A toy sketch of this, using resource acquisition as the example, follows below.)
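
To make the thesis more concrete, here is a minimal toy sketch of my own (not from the book): final goals are sampled at random as projects with varying resource costs, and we check for how many of them an upfront resource-acquisition step leaves the agent strictly better off. Every number and name below is a made-up assumption for illustration.

```python
import random

random.seed(0)

def achievable(resources, goal_cost):
    """Fraction of a final goal the agent can realize with the resources it holds."""
    return min(1.0, resources / goal_cost)

N = 10_000
initial_resources = 1.0
acquisition_cost = 0.3    # resources spent pursuing the instrumental goal
acquisition_gain = 3.0    # resources gained from pursuing it

helped = 0
for _ in range(N):
    goal_cost = random.uniform(0.1, 10.0)  # a randomly sampled final goal
    direct = achievable(initial_resources, goal_cost)
    with_acquisition = achievable(initial_resources - acquisition_cost + acquisition_gain, goal_cost)
    if with_acquisition > direct:
        helped += 1

print(f"Resource acquisition strictly helped for {helped / N:.0%} of sampled goals")
```

In this toy setting the acquisition step strictly helps for roughly nine in ten sampled goals, and is merely neutral for the cheap goals the agent could already complete, which mirrors point 4: for each CIV there are combinations of final goals and scenarios under which it is not worth pursuing.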

Notes

1. Why do we care about CIVs?
CIVs to acquire resources and to preserve oneself and one’s values play important roles in the argument for AI risk. The desired conclusions are that we can already predict that an AI would compete strongly with humans for resources, and also that an AI, once turned on, will go to great lengths to stay on and intact.

2. Related work
Steve Omohundro wrote the seminal paper on this topic. The LessWrong wiki links to all of the related papers I know of. Omohundro’s list of CIVs (or as he calls them, ‘basic AI drives’) is a bit different from Bostrom’s:

  1. Self-improvement

  2. Rationality

  3. Preservation of utility functions

  4. Avoiding counterfeit utility

  5. Self-protection

  6. Acquisition and efficient use of resources

3. Convergence for values and situations
It seems potentially helpful to distinguish convergence over situations and convergence over values. That is, to think of instrumental goals on two axes: one for how universally agents with different values would want the thing, and one for how large a range of situations it is useful in. A warehouse full of corn is useful for almost any goal, but only in the narrow range of situations where you are a corn-eating organism who fears an apocalypse (or can trade the corn). A world of resources converted into computing hardware is extremely valuable in a wide range of scenarios, but much more so if you don’t especially value preserving the natural environment. Many things that are CIVs for humans don’t make it onto Bostrom’s list, I presume because he expects the scenario for AI to be different enough. For instance, procuring social status is useful for all kinds of human goals. For an AI in the situation of a human, it would appear to also be useful. For an AI more powerful than the rest of the world combined, social status is less helpful.
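
One way to picture these two axes is to give each candidate instrumental good a rough score for breadth over values and breadth over situations. The goods and numbers below are illustrative guesses of mine, not anything from the book or from data:

```python
# Made-up scores on the two axes discussed above:
# (fraction of value-systems the good serves, fraction of situations it helps in)
goods = {
    "warehouse of corn": (0.90, 0.05),  # serves most goals, but only for corn-eaters fearing apocalypse (or traders)
    "compute hardware":  (0.80, 0.80),  # broadly useful, unless you value the unconverted environment
    "social status":     (0.90, 0.30),  # broadly useful for humans, much less so for a dominant singleton
    "self-preservation": (0.95, 0.90),
}

for name, (over_values, over_situations) in goods.items():
    # A good only counts as strongly convergent if it is broad on both axes.
    print(f"{name:18}  values={over_values:.2f}  situations={over_situations:.2f}  "
          f"both={over_values * over_situations:.2f}")
```

Keeping the axes separate makes it easier to say which goods would stop looking convergent if the situation changed, as with social status for a singleton.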

4. What sort of things are CIVs?
Arguably all CIVs mentioned above could be clustered under ‘cause your goals to control more resources’. This implies causing more agents to have your values (e.g. protecting your values in yourself), causing those agents to have resources (e.g. getting resources and transforming them into better resources), and getting the agents to control the resources effectively as well as nominally (e.g. cognitive enhancement, rationality). It also suggests convergent values we haven’t mentioned. To cause more agents to have your values, you might create or protect other agents with your values, or spread your values to existing agents. To improve the resources held by those with your values, a very convergent goal in human society is to trade. This leads to a convergent goal of creating or acquiring resources which are highly valued by others, even if not by you. Money and social influence are particularly widely redeemable ‘resources’. Trade also causes others to act as though they have your values when they don’t, which is a way of spreading your values.

As I mentioned above, my guess is that these are left out of Superintelligence because they involve social interactions. I think Bostrom expects a powerful singleton, to whom other agents will be irrelevant. If you are not confident of the singleton scenario, these CIVs might be more interesting.
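
The clustering above can also be read as a rough decomposition, sketched below with hypothetical factor names and toy numbers of my own: the resources effectively controlled by your goals are roughly the product of how many agents share your values, how many resources those agents hold, and how effectively they deploy them, with each CIV pushing on one factor.

```python
def resources_controlled_by_your_goals(agents_with_your_values, resources_per_agent, effectiveness):
    """Toy decomposition of 'cause your goals to control more resources'.

    Each CIV can be read as pushing on one factor:
      - self-preservation, goal-content integrity, spreading values -> agents_with_your_values
      - resource acquisition, technological perfection, trade       -> resources_per_agent
      - cognitive enhancement, rationality                          -> effectiveness
    """
    return agents_with_your_values * resources_per_agent * effectiveness

# Doubling any one factor doubles the total, which is one way of seeing why
# each factor shows up as a separate convergent instrumental value.
print(resources_controlled_by_your_goals(1, 100.0, 0.5))  # 50.0
print(resources_controlled_by_your_goals(2, 100.0, 0.5))  # 100.0
print(resources_controlled_by_your_goals(1, 100.0, 1.0))  # 100.0
```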

5. Another discussion
John Danaher discusses this section of Superintelligence, but not disagreeably enough to read as ‘another view’.

Another view

I don’t know of any strong criticism of the instrumental convergence thesis, so I will play devil’s advocate.

The concept of a subgoal that is useful for many final goals is unobjectionable. However, the instrumental convergence thesis claims more than this, and the stronger claims are important for the desired argument for AI doom. These further claims are also on less solid ground, as we shall see.

According to the instrumental convergence thesis, convergent instrumental goals not only exist, but can at least sometimes be identified by us. This is needed for arguing that we can foresee that AI will prioritize grabbing resources, and that it will be very hard to control. That we can identify convergent instrumental goals may seem clear; after all, we just did: self-preservation, intelligence enhancement and the like. However, to say anything interesting, our claim must not only be that these values are better to have than not, but that they will be prioritized by the kinds of AI that will exist, in a substantial range of circumstances that will arise. This is far from clear, for several reasons.

Firstly, to know what the AI would prioritize, we need to know something about its alternatives, and we can be much less confident that we have thought of all of the alternative instrumental values an AI might have. For instance, in the abstract, intelligence enhancement may seem convergently valuable, but in practice adult humans devote little effort to it. This is because investments in intelligence are rarely competitive with other endeavors.

Secondly, we haven’t said anything quantitative about how general or strong our proposed convergent instrumental values are likely to be, or how we are weighting the space of possible AI values. Without even any guesses, it is hard to know what to make of the resulting predictions. The qualitativeness of the discussion also raises the concern that thinking on the problem has not been very concrete, and so may not be engaged with what is likely in practice.

Thirdly, we have arrived at these convergent instrumental goals by theoretical arguments about what we think of as default rational agents and ‘normal’ circumstances. These may be very different distributions of agents and scenarios from those produced by our engineering efforts. For instance, perhaps almost all conceivable sets of values, in whatever sense, would favor accruing resources ruthlessly. It would still not be that surprising if an agent somehow created noisily from human values cared only about acquiring resources by certain means, or had blanket ill-feelings about greed.

In sum, it is unclear that we can identify important convergent instrumental values, and consequently unclear that such considerations can strongly help predict the behavior of real future AI agents.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Do approximately all final goals make an optimizer want to expand beyond the cosmological horizon?

  2. Can we say anything more quantitative about the strength or prevalence of these convergent instrumental values?

  3. Can we say more about values that are likely to be convergently instrumental just across AIs that are likely to be developed, and situations they are likely to find themselves in?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group, though, is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the treacherous turn. To prepare, read “Existential catastrophe…” and “The treacherous turn” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday 24th November. Sign up to be notified here.