Superintelligence 15: Oracles, genies and sovereigns

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.

Wel­come. This week we dis­cuss the fif­teenth sec­tion in the read­ing guide: Or­a­cles, ge­nies, and sovereigns. This cor­re­sponds to the first part of Chap­ter ten.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related to a point; they do not necessarily mean that the chapter is being cited for the specific claim.

Read­ing: “Or­a­cles” and “Ge­nies and Sovereigns” from Chap­ter 10

Summary


  1. Strong AIs might come in differ­ent forms or ‘castes’, such as or­a­cles, ge­nies, sovereigns and tools. (p145)

  2. Or­a­cle: an AI that does noth­ing but an­swer ques­tions. (p145)

    1. The abil­ity to make a good or­a­cle prob­a­bly al­lows you to make a gen­er­ally ca­pa­ble AI. (p145)

    2. Nar­row su­per­in­tel­li­gent or­a­cles ex­ist: e.g. calcu­la­tors. (p145-6)

    3. An or­a­cle could be a non-agentlike ‘tool’ (more next week) or it could be a ra­tio­nal agent con­strained to only act through an­swer­ing ques­tions (p146)

    4. There are var­i­ous ways to try to con­strain an or­a­cle, through mo­ti­va­tion se­lec­tion (see last week) and ca­pa­bil­ity con­trol (see the pre­vi­ous week) (p146-7)

    5. An or­a­cle whose goals are not al­igned with yours might still be use­ful (p147-8)

    6. An or­a­cle might be mi­sused, even if it works as in­tended (p148)

  3. Ge­nie: an AI that car­ries out a high level com­mand, then waits for an­other. (p148)

    1. It would be nice if a ge­nie sought to un­der­stand and obey your in­ten­tions, rather than your ex­act words. (p149)

  4. Sovereign: an AI that acts au­tonomously in the world, in pur­suit of po­ten­tially long range ob­jec­tives (p148)

  5. A ge­nie or a sovereign might have pre­view func­tion­al­ity, where it de­scribes what it will do be­fore do­ing it. (p149)

  6. A genie seems more dangerous than an oracle: if you are going to strongly physically contain the genie, you might have done better to simply deny it so much access to the world and ask for blueprints instead of actions. (p148)

  7. The line be­tween ge­nies and sovereigns is fine. (p149)

  8. All of the castes could em­u­late all of the other castes more or less, so they do not differ in their ul­ti­mate ca­pa­bil­ities. How­ever they rep­re­sent differ­ent ap­proaches to the con­trol prob­lem. (p150)

  9. The or­der­ing of safety of these castes is not as ob­vi­ous as it may seem, once we con­sider fac­tors such as de­pen­dence on a sin­gle hu­man, and added dan­gers of cre­at­ing strong agents whose goals don’t match our own (even if they are tame ‘do­mes­ti­cated’ goals). (p150)

Another view

An old re­sponse to sug­ges­tions of or­a­cle AI, from Eliezer Yud­kowsky (I don’t know how closely this matches his cur­rent view):

When some­one rein­vents the Or­a­cle AI, the most com­mon open­ing re­mark runs like this:

“Why not just have the AI an­swer ques­tions, in­stead of try­ing to do any­thing? Then it wouldn’t need to be Friendly. It wouldn’t need any goals at all. It would just an­swer ques­tions.”

To which the re­ply is that the AI needs goals in or­der to de­cide how to think: that is, the AI has to act as a pow­er­ful op­ti­miza­tion pro­cess in or­der to plan its ac­qui­si­tion of knowl­edge, effec­tively dis­till sen­sory in­for­ma­tion, pluck “an­swers” to par­tic­u­lar ques­tions out of the space of all pos­si­ble re­sponses, and of course, to im­prove its own source code up to the level where the AI is a pow­er­ful in­tel­li­gence. All these events are “im­prob­a­ble” rel­a­tive to ran­dom or­ga­ni­za­tions of the AI’s RAM, so the AI has to hit a nar­row tar­get in the space of pos­si­bil­ities to make su­per­in­tel­li­gent an­swers come out.

Now, why might one think that an Or­a­cle didn’t need goals? Be­cause on a hu­man level, the term “goal” seems to re­fer to those times when you said, “I want to be pro­moted”, or “I want a cookie”, and when some­one asked you “Hey, what time is it?” and you said “7:30” that didn’t seem to in­volve any goals. Im­plic­itly, you wanted to an­swer the ques­tion; and im­plic­itly, you had a whole, com­pli­cated, func­tion­ally op­ti­mized brain that let you an­swer the ques­tion; and im­plic­itly, you were able to do so be­cause you looked down at your highly op­ti­mized watch, that you bought with money, us­ing your skill of turn­ing your head, that you ac­quired by virtue of cu­ri­ous crawl­ing as an in­fant. But that all takes place in the in­visi­ble back­ground; it didn’t feel like you wanted any­thing.

Thanks to em­pathic in­fer­ence, which uses your own brain as an un­opened black box to pre­dict other black boxes, it can feel like “ques­tion-an­swer­ing” is a de­tach­able thing that comes loose of all the op­ti­miza­tion pres­sures be­hind it—even the ex­is­tence of a pres­sure to an­swer ques­tions!


Notes

1. What are the axes we are talking about?

This chap­ter talks about differ­ent types or ‘castes’ of AI. But there are lots of differ­ent ways you could di­vide up kinds of AI (e.g. ear­lier we saw brain em­u­la­tions vs. syn­thetic AI). So in what ways are we di­vid­ing them here? They are re­lated to differ­ent ap­proaches to the con­trol prob­lem, but don’t ap­pear to be straight­for­wardly defined by them.

It seems to me we are look­ing at some­thing close to these two axes:

  • Goal-di­rect­ed­ness: the ex­tent to which the AI acts in ac­cor­dance with a set of prefer­ences (in­stead of for in­stance re­act­ing di­rectly to stim­uli, or fol­low­ing rules with­out re­gard to con­se­quence)

  • Over­sight: the de­gree to which hu­mans have an on­go­ing say in what the AI does (in­stead of the AI mak­ing all de­ci­sions it­self)

The castes fit on these axes some­thing like this:

They don’t quite neatly fit—tools are spread be­tween two places, and or­a­cles are a kind of tool (or a kind of ge­nie if they are of the highly con­strained agent va­ri­ety). But I find this a use­ful way to think about these kinds of AI.

Note that when we think of ‘tools’, we usually think of them having a lot of oversight: that is, being used by a human who is making decisions all the time. However, you might also imagine what I have called ‘autonomous tools’, which run on their own but aren’t goal-directed. An example would be an AI that continually reads scientific papers and turns out accurate and engaging science books, without particularly optimizing for doing this more efficiently or trying to get any particular outcome.
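To make the two axes concrete, here is a minimal Python sketch of how I read the placement of the castes. It is only an illustration under my own assumptions: the `Caste` type, the `castes_at` helper and the binary high/low values are hypothetical simplifications, and the placements are read off the descriptions in this post rather than taken from Bostrom. Oracles are left out because, as noted above, they count as tools or as genies depending on the variety.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Caste:
    name: str
    goal_directed: bool  # does the AI act on a set of preferences?
    overseen: bool       # do humans have an ongoing say in what it does?


# Assumed placements, simplified to two values per axis.
CASTES = [
    Caste("sovereign", goal_directed=True, overseen=False),
    Caste("genie", goal_directed=True, overseen=True),
    Caste("tool", goal_directed=False, overseen=True),
    Caste("autonomous tool", goal_directed=False, overseen=False),
]


def castes_at(goal_directed: bool, overseen: bool) -> List[str]:
    """Return the names of the castes sitting at the given corner of the two axes."""
    return [
        c.name
        for c in CASTES
        if c.goal_directed == goal_directed and c.overseen == overseen
    ]


if __name__ == "__main__":
    for caste in CASTES:
        print(f"{caste.name}: goal-directed={caste.goal_directed}, overseen={caste.overseen}")
```

In reality both axes are matters of degree, so the boolean fields are a deliberate coarsening.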

We have two weeks on this chap­ter, so I think it will be good to fo­cus a bit on goal di­rect­ed­ness one week and over­sight the other, alongside the ad­ver­tised top­ics of spe­cific castes. So this week let’s fo­cus on over­sight, since tools (next week) pri­mar­ily differ from the other castes men­tioned in not be­ing goal-di­rected.

2. What do goal-di­rect­ed­ness and over­sight have to do with each other?

Why con­sider goal-di­rect­ed­ness and over­sight to­gether? It seems to me there are a cou­ple of rea­sons.

Goal-di­rect­ed­ness and over­sight are sub­sti­tutes, broadly. The more you di­rect a ma­chine, the less it needs to di­rect it­self. Some­how the ma­chine has to as­sist with some goals, so ei­ther you or the ma­chine needs to care about those goals and di­rect the ma­chine ac­cord­ing to them. The ‘au­tonomous tools’ I men­tioned ap­pear to be ex­cep­tions, but they only seem plau­si­ble for a limited range of tasks where min­i­mal goal di­rec­tion is needed be­yond what a de­signer can do ahead of time.

Another way goal-di­rect­ed­ness and over­sight are con­nected is that we might ex­pect both to change as we be­come bet­ter able to al­ign an AI’s goals with our own. In or­der for an AI to be al­igned with our goals, the AI must nat­u­rally be goal-di­rected. Also, bet­ter al­ign­ment should make over­sight less nec­es­sary.

3. A note on names

‘Sovereign AI’ sounds pow­er­ful and far reach­ing. Note that more mun­dane AIs would also fit un­der this cat­e­gory. For in­stance, an AI who works at an office and doesn’t take over the world would also be a sovereign AI. You would be a sovereign AI if you were ar­tifi­cial.

4. Costs of oversight

Bostrom dis­cussed some prob­lems with ge­nies. I’ll men­tion a few oth­ers.

One clear down­side of a ma­chine which fol­lows your in­struc­tions and awaits your con­sent is that you have to be there giv­ing in­struc­tions and con­sent­ing to things. In a world full of pow­er­ful AIs which needed such over­sight, there might be plenty of spare hu­man la­bor around to do this at the start, if each AI doesn’t need too much over­sight. How­ever a need for hu­man over­sight might bot­tle­neck the pro­lifer­a­tion of such AIs.

Another downside of using human labor, beyond the cost to the human, is that it might be prohibitively slow, depending on the oversight required. If you only had to check in with the AI daily, and it did unimaginably many tasks the rest of the time, oversight probably wouldn’t be a great cost. However, if you had to be in the loop and fully understand the decision every time the AI chose how to allocate its internal resources, things could get very slow.

Even if these costs are minor compared to the value of avoiding catastrophe, they may be too large to allow well-overseen AIs to compete with more autonomous AIs, especially if the oversight is mostly there to avoid low-probability terrible outcomes.

5. How use­ful is over­sight?

Sup­pose you have a ge­nie that doesn’t to­tally un­der­stand hu­man val­ues, but tries hard to listen to you and ex­plain things and do what it thinks you want. How use­ful is it that you can in­ter­act with this ge­nie and have a say in what it does rather than it just be­ing a sovereign?

If the ge­nie’s un­der­stand­ing of your val­ues is wrong such that its in­tended ac­tions will bring about a catas­tro­phe, it’s not clear that the ge­nie can de­scribe the out­come to you such that you will no­tice this. The fu­ture is po­ten­tially pretty big and com­pli­cated, es­pe­cially com­pared to your brain, or a short con­ver­sa­tion be­tween you and a ge­nie. So the ge­nie would need to sum­ma­rize a lot. For you to no­tice the sub­tle de­tails that would make the fu­ture worth­less (re­mem­ber that the ge­nie ba­si­cally un­der­stands your val­ues, so they are prob­a­bly not re­ally blatant de­tails) the ge­nie will need to di­rect your at­ten­tion to them. So your situ­a­tion would need to be in a mid­dle ground where the AI knew about some fea­tures of a po­ten­tial fu­ture that might bother you (so that it could point them out), but wasn’t sure if you re­ally would hate them. It seems hard for the AI giv­ing you a ‘pre­view’ to help if the AI is just wrong about your val­ues and doesn’t know how it is wrong.

6. More on oracles

“Thinking inside the box” seems to be the main paper on the topic. Christiano’s post on how to use an unfriendly AI is again relevant to how you might use an oracle.

In-depth investigations

If you are par­tic­u­larly in­ter­ested in these top­ics, and want to do fur­ther re­search, these are a few plau­si­ble di­rec­tions, some in­spired by Luke Muehlhauser’s list, which con­tains many sug­ges­tions re­lated to parts of Su­per­in­tel­li­gence. Th­ese pro­jects could be at­tempted at var­i­ous lev­els of depth.

  1. How stable are the castes? Bostrom mentioned that these castes mostly have equivalent long-run capabilities, because they can be used to make one another. A related question is how likely they are to turn into one another. Another is how likely an attempt to create one is to lead to a different one (e.g. Yudkowsky’s view above suggests that if you try to make an oracle, it might end up being a sovereign). A further question is which ones would be likely to win out if they were developed in parallel and were available for similar applications (e.g. how well would genies prosper in a world with many sovereigns?).

  2. How use­ful is over­sight likely to be? (e.g. At what scale might it be nec­es­sary? Could an AI use­fully com­mu­ni­cate its pre­dic­tions to you such that you can eval­u­ate the out­comes of de­ci­sions? Is there likely to be di­rect com­pe­ti­tion be­tween AIs which are over­seen by peo­ple and those that are not?)

  3. Are non-goal-di­rected or­a­cles likely to be fea­si­ble?

If you are in­ter­ested in any­thing like this, you might want to men­tion it in the com­ments, and see whether other peo­ple have use­ful thoughts.

How to proceed

This has been a col­lec­tion of notes on the chap­ter. The most im­por­tant part of the read­ing group though is dis­cus­sion, which is in the com­ments sec­tion. I pose some ques­tions for you there, and I in­vite you to add your own. Please re­mem­ber that this group con­tains a va­ri­ety of lev­els of ex­per­tise: if a line of dis­cus­sion seems too ba­sic or too in­com­pre­hen­si­ble, look around for one that suits you bet­ter!

Next week, we will talk about the last caste of this chapter: the tool AI. To prepare, read “Tool-AIs” and “Comparison” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday, December 29. Sign up to be notified here.