Superintelligence 14: Motivation selection methods

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.

Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable, and where I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.

Summary

  1. One way to control an AI is to design its motives: that is, to choose what it wants to do (p138)

  2. Some varieties of ‘motivation selection’ for AI safety:

    1. Direct specification: figure out what we value, and code it into the AI (p139-40)

      1. Isaac Asimov’s ‘three laws of robotics’ are a famous example

      2. Direct specification might be fairly hard: both figuring out what we want and coding it precisely seem difficult

      3. This could be based on rules, or on something like consequentialism

    2. Domesticity: the AI’s goals limit the range of things it wants to interfere with (p140-1)

      1. This might make direct specification easier, as the world the AI interacts with (and thus the world we must consider when specifying its behavior) is simpler.

      2. Oracles are an example

      3. This might combine well with physical containment: the AI could be trapped, and also not want to escape.

    3. Indirect normativity: instead of specifying what we value, specify a way to specify what we value (p141-2)

      1. e.g. extrapolate our volition

      2. This means outsourcing the hard intellectual work to the AI

      3. This will mostly be discussed in Chapter 13 (weeks 23-5 here)

    4. Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch (p142)

      1. e.g. brain emulations are likely to have human desires (at least at the start)

      2. Whether we use this method depends on the kind of AI that is developed, so usually we won’t have a choice about whether to use it (except inasmuch as we have a choice about e.g. whether to develop uploads or synthetic AI first).

  3. Bostrom provides a summary of the chapter:

  4. The question is not which control method is best, but rather which set of control methods is best given the situation (p143-4)
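
The contrast between direct specification and indirect normativity can be sketched as a toy in Python. This is not Bostrom’s proposal, nor an implementation of extrapolated volition: the outcomes, the three candidate value hypotheses, the noisy-rationality choice model and its `rationality` parameter are all hypothetical assumptions made for illustration, and the hand-written values are an arbitrary stand-in. With direct specification, the values themselves are written in by hand; in the indirect version, we write only a procedure (here, Bayesian updating on observed human choices) and let the AI work out the values.

```python
import math

# Hypothetical outcomes an AI could bring about.
OUTCOMES = ("paperclips", "smiles", "flourishing")

# Hypothetical candidate accounts of what humans value (value of each outcome).
HYPOTHESES = {
    "likes_paperclips": {"paperclips": 1.0, "smiles": 0.0, "flourishing": 0.0},
    "likes_smiles": {"paperclips": 0.0, "smiles": 1.0, "flourishing": 0.5},
    "likes_flourishing": {"paperclips": 0.0, "smiles": 0.3, "flourishing": 1.0},
}


def direct_specification():
    # Direct specification: a programmer writes down the values by hand.
    return {"paperclips": 0.0, "smiles": 1.0, "flourishing": 0.0}


def choice_likelihood(values, chosen, rejected, rationality=3.0):
    # Assumed noisy-rational model: a person picks `chosen` over `rejected` with
    # a probability that rises logistically with how much more they value it.
    return 1.0 / (1.0 + math.exp(-rationality * (values[chosen] - values[rejected])))


def infer_values(observed_choices):
    # Indirect normativity, toy version: start from a uniform prior over the
    # hypotheses and update on each observed (chosen, rejected) pair.
    posterior = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
    for chosen, rejected in observed_choices:
        for name, values in HYPOTHESES.items():
            posterior[name] *= choice_likelihood(values, chosen, rejected)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}


def best_outcome(posterior):
    # Choose the outcome with the highest expected value under the posterior.
    def expected_value(outcome):
        return sum(p * HYPOTHESES[name][outcome] for name, p in posterior.items())
    return max(OUTCOMES, key=expected_value)


hand_written = direct_specification()
print(max(OUTCOMES, key=hand_written.get))  # smiles

observed = [("flourishing", "smiles"), ("flourishing", "paperclips"), ("smiles", "paperclips")]
posterior = infer_values(observed)
print(max(posterior, key=posterior.get))  # likes_flourishing
print(best_outcome(posterior))            # flourishing
```

Even the indirect version hard-codes a good deal: the space of hypotheses, the prior, and the model of how human choices relate to human values. Choosing these well is part of the question of what an AI needs to be given in order to work out what we value.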

Another view


Would you say there’s any ethical issue involved with imposing limits or constraints on a superintelligence’s drives/motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human’s inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?

Notes

1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these (and Isaac Asimov’s stories) seem to tell us only that a few people spending a small fraction of their time thinking does not produce any watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that ‘value is fragile’, i.e. that if you specify what you want very nearly but get it a tiny bit wrong, the result is likely to be almost worthless, much like getting one digit wrong in a phone number. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with (to see how hard it is, or to produce something of value if it isn’t that hard).
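
To make the ‘value is fragile’ worry concrete, here is a toy model in Python. Everything in it is a hypothetical assumption made for illustration, not evidence for the thesis: the three features, the budget, and above all the choice to combine features multiplicatively, so that a world with none of one feature is worth nothing. An optimizer is handed a specification that matches the ‘true’ values except for one omitted feature.

```python
from itertools import product

# Hypothetical toy world: an AI splits BUDGET units of resources across three
# features of the world that, by assumption, we care about.
FEATURES = ("health", "knowledge", "freedom")
BUDGET = 10


def true_value(alloc):
    # Assumed "true" values: features combine multiplicatively, so a world
    # with none of any one feature is worth nothing.
    value = 1
    for feature in FEATURES:
        value *= alloc[feature]
    return value


def nearly_right_spec(alloc):
    # A specification that is identical except that "freedom" was left out.
    value = 1
    for feature in FEATURES[:-1]:
        value *= alloc[feature]
    return value


def allocations(budget):
    # Every way to split the budget into non-negative whole units.
    for a, b in product(range(budget + 1), repeat=2):
        if a + b <= budget:
            yield dict(zip(FEATURES, (a, b, budget - a - b)))


def optimize(objective):
    # The AI picks whichever allocation scores highest on its objective.
    return max(allocations(BUDGET), key=objective)


chosen = optimize(nearly_right_spec)
ideal = optimize(true_value)
print(chosen, true_value(chosen))  # {'health': 5, 'knowledge': 5, 'freedom': 0} 0
print(ideal, true_value(ideal))    # {'health': 3, 'knowledge': 3, 'freedom': 4} 36
```

Under these assumptions the nearly-right specification scores 0 by the true values, while the best allocation scores 36. If the features instead added together, the same omission would cost nothing here (every allocation spends the whole budget), which is one way of seeing why the degree of fragility is contested: it depends on how our values actually combine, which the toy model simply assumes.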

2. If you’d like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.

3. The idea of ‘indirect normativity’ (i.e. outsourcing the problem of specifying what an AI should do, by giving it some good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values, though some people disagree with this; these people don’t buy the orthogonality thesis. Other issues sometimes suggested to need working out before outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal’s mugging, and attitudes to infinity. MIRI’s technical work often fits into this category.

4. Danaher’s last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so it is mostly useful if you’d like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which discuss the paper Intelligence Explosion and Machine Ethics.

5. Brian Clegg thinks Bostrom should have discussed Asimov’s stories at greater length:

I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.

If you haven’t already, you might consider (sort of) following his advice, and reading some science fiction.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Can you think of novel methods of specifying the values of one or many humans?

  2. What are the most promising methods for ‘domesticating’ an AI (i.e. constraining it to care only about a small part of the world, and not to want to interfere with the larger world to optimize that smaller part)?

  3. Think more carefully about the likely motivations of drastically augmented brain emulations

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group, though, is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start to talk about a variety of more and less agent-like AIs: ‘oracles’, ‘genies’ and ‘sovereigns’. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday 22nd December. Sign up to be notified here.