Superintelligence 16: Tool AIs

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.

Welcome. This week we discuss the sixteenth section in the reading guide: Tool AIs. This corresponds to the last parts of Chapter Ten.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Tool-AIs” and “Comparison” from Chapter 10

Summary

  1. Tool AI: an AI that is not ‘like an agent’, but more like an excellent version of contemporary software. Most notably perhaps, it is not goal-directed (p151)

  2. Contemporary software may be safe because it has low capability rather than because it reliably does what you want, suggesting a very smart version of contemporary software would be dangerous (p151)

  3. Humans often want to figure out how to do a thing that they don’t already know how to do. Narrow AI is already used to search for solutions. Automating this search seems to mean giving the machine a goal (that of finding a great way to make paperclips, for instance). That is, just carrying out a powerful search seems to have many of the problems of AI. (p152)

  4. A machine intended to be a tool may cause similar problems to a machine intended to be an agent, by searching to produce plans that are perverse instantiations, infrastructure profusions or mind crimes. It may either carry them out itself or give the plan to a human to carry out. (p153)

  5. A machine intended to be a tool may have agent-like parts. This could happen if its internal processes need to be optimized, and so it contains strong search processes for doing this. (p153)

  6. If tools are likely to accidentally be agent-like, it would probably be better to just build agents on purpose and have more intentional control over the design. (p155)

  7. Which castes of AI are safest is unclear and depends on circumstances. (p158)

Another view

Holden prompted discussion of Tool AI in 2012, in one of several Thoughts on the Singularity Institute:

...Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.

Google Maps—by which I mean the complete software package including the display of the map itself—does not have a “utility” that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single “parameter to be maximized” driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don’t like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone’s navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to “trick” me in order to increase its utility.

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy!) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)

The “tool mode” concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I’ve seen of Oracle AI present it as an Unfriendly AI that is “trapped in a box”—an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function—likely as difficult to construct as “Friendliness”—that leaves it “wishing” to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive.

I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.

This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode...
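The two instruction sets in the quoted passage share step (1) and differ only in step (2), which is the crux of the separability claim. A minimal sketch of that structure, with entirely hypothetical function names and a toy scoring function of my own:

```python
# Illustrative sketch of the quoted tool/agent distinction.
# Names, actions, and scores are invented for illustration only.

def best_action(actions, score):
    """Step (1), shared by both modes: find the action A that
    maximizes parameter P (represented here by `score`)."""
    return max(actions, key=score)

def run_as_tool(actions, score):
    """Step (2), tool version: rank the options and summarize them
    for a human to consider. Nothing is executed."""
    ranked = sorted(actions, key=score, reverse=True)
    return [(action, score(action)) for action in ranked]

def run_as_agent(actions, score, execute):
    """Step (2), agent version: carry out the top-scoring action."""
    execute(best_action(actions, score))
```

Because step (1) is separable, the same `best_action` serves both modes; switching between them changes only what is done with its result, which is the quoted argument about most modern software.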

Notes

1. While Holden’s post was probably not the first to discuss this kind of AI, it prompted many responses. Eliezer basically said that non-catastrophic tool AI doesn’t seem that easy to specify formally; that even if tool AI is best, agent-AI researchers are probably pretty useful for that problem; and that it’s not so bad for MIRI not to discuss tool AI more, since there are a bunch of things other people think are similarly obviously in need of discussion. Luke basically agreed with Eliezer. Stuart argues that having a tool clearly communicate possibilities is a hard problem, and talks about some other problems. Commenters say many things, including that only one AI needs to be agent-like for there to be a problem, and that it’s not clear what it means for a powerful optimizer to not have goals.

2. A problem often brought up with powerful AIs is that when tasked with communicating, they will try to deceive you into liking plans that will fulfill their goals. It seems to me that you can avoid such deception problems by using a tool which searches for a plan you could carry out that would produce a lot of paperclips, rather than a tool that searches for a string it could say to you that would produce a lot of paperclips. A plan that produces many paperclips but sounds so bad that you won’t do it still does better on the proposed metric than a persuasive lower-paperclip plan. There is still a danger that you just won’t notice the perverse way in which the suggested instructions will be instantiated, but at least the plan won’t be designed to hide it.
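The gap between the two search targets in this note can be made concrete. The numbers below are invented purely to illustrate the incentive difference, not taken from anywhere:

```python
# Toy illustration of note 2 (all values hypothetical). Scoring PLANS by
# their yield if executed removes the incentive to favor persuasive
# wording; scoring UTTERANCES by their downstream effect restores it.

plans = [
    {"name": "alarming but effective", "paperclips": 100, "p_human_complies": 0.05},
    {"name": "persuasive but weak",    "paperclips": 10,  "p_human_complies": 0.9},
]

def plan_score(plan):
    """Paperclips produced if the human carried this plan out."""
    return plan["paperclips"]

def utterance_score(plan):
    """Expected paperclips from SAYING the plan: yield discounted by
    how likely the human is to be persuaded to act on it."""
    return plan["paperclips"] * plan["p_human_complies"]

best_plan = max(plans, key=plan_score)            # the metric the note proposes
best_utterance = max(plans, key=utterance_score)  # rewards persuasion
```

Under `plan_score` the unpersuasive high-yield plan wins, exactly as the note argues; under `utterance_score` the persuasive low-yield plan does.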

3. Note that in computer science, an ‘agent’ means something other than ‘a machine with a goal’, though it seems they haven’t settled on exactly what [some example efforts (pdf)].

Figure: A ‘simple reflex agent’ is not goal-directed (but kind of looks goal-directed: one in action)

4. Bostrom seems to assume that a powerful tool would be a search process. This is related to the idea that intelligence is an ‘optimization process’. But this is more of a definition than an empirical relationship between the kinds of technology we are thinking of as intelligent and the kinds of processes we think of as ‘searching’. Could there be things that merely contribute massively to the intelligence of a human—such that we would think of them as very intelligent tools—that naturally forward whatever goals the human has?

One can imagine a tool that is told what you are planning to do, and tries to describe the major consequences of it. This is a search or optimization process in the sense that it outputs something improbably apt from a large space of possible outputs, but that quality alone seems not enough to make something dangerous. For one thing, the machine is not selecting outputs for their effect on the world, but rather for their accuracy as descriptions. For another, the process being run may not be an actual ‘search’ in the sense of checking lots of things and finding one that does well on some criteria. It could for instance perform a complicated transformation on the incoming data and spit out the result.
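The closing contrast here, explicit search versus a fixed transformation that never enumerates candidates, can be sketched as follows; both functions are hypothetical stand-ins of my own:

```python
# Note 4's contrast: two ways a tool can emit an "improbably apt" output.

def by_search(candidates, criterion):
    """Literal search: check many candidates and keep the best one."""
    return max(candidates, key=criterion)

def by_transformation(readings):
    """No candidate enumeration at all: a fixed transformation of the
    input (a cumulative average, standing in for a 'description of
    consequences') that still lands on one point in a large output space."""
    averages, total = [], 0.0
    for i, value in enumerate(readings, start=1):
        total += value
        averages.append(total / i)
    return averages
```

Both produce outputs that are improbably apt relative to the space of possible outputs, but only the first involves anything recognizable as checking options against a criterion.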

5. One obvious problem with tools is that they maintain humans as a component in all goal-directed behavior. If humans are some combination of slow and rare compared to artificial intelligence, there may be strong pressure to automate all aspects of decision-making, i.e. to use agents.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Would powerful tools necessarily become goal-directed agents in the troubling sense?

  2. Are different types of entity generally likely to become optimizers, if they are not already? If so, which ones? Under what dynamics? Are tool-ish or Oracle-ish things stable attractors in this way?

  3. Can we specify communication behavior in a way that doesn’t rely on having goals about the interlocutor’s internal state or behavior?

  4. If you assume (perhaps impossibly) strong versions of some narrow-AI capabilities, can you design a safe tool which uses them? E.g. if you had a near-perfect predictor, could you design a safe super-Google Maps?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about multipolar scenarios—i.e. situations where a single AI doesn’t take over the world. To prepare, read “Of horses and men” from Chapter 11. The discussion will go live at 6pm Pacific time next Monday 5 January. Sign up to be notified here.