Superintelligence 15: Oracles, genies and sovereigns

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.


Welcome. This week we discuss the fifteenth section in the reading guide: Oracles, genies, and sovereigns. This corresponds to the first part of Chapter 10.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Oracles” and “Genies and Sovereigns” from Chapter 10


Summary

  1. Strong AIs might come in different forms or ‘castes’, such as oracles, genies, sovereigns, and tools (a toy code sketch of these castes follows this summary). (p145)

  2. Oracle: an AI that does nothing but answer questions. (p145)

    1. The ability to make a good oracle probably allows you to make a generally capable AI. (p145)

    2. Narrow superintelligent oracles exist: e.g. calculators. (p145-6)

    3. An oracle could be a non-agentlike ‘tool’ (more next week), or it could be a rational agent constrained to only act through answering questions. (p146)

    4. There are various ways to try to constrain an oracle, through motivation selection (see last week) and capability control (see the previous week). (p146-7)

    5. An oracle whose goals are not aligned with yours might still be useful. (p147-8)

    6. An oracle might be misused, even if it works as intended. (p148)

  3. Genie: an AI that carries out a high level command, then waits for another. (p148)

    1. It would be nice if a genie sought to understand and obey your intentions, rather than your exact words. (p149)

  4. Sovereign: an AI that acts autonomously in the world, in pursuit of potentially long-range objectives. (p148)

  5. A genie or a sovereign might have preview functionality, where it describes what it will do before doing it. (p149)

  6. A genie seems more dangerous than an oracle: if you are going to strongly physically contain the genie anyway, you may have been better off just denying it that much access to the world and asking it for blueprints instead of actions. (p148)

  7. The line between genies and sovereigns is a fine one. (p149)

  8. All of the castes could more or less emulate all of the other castes, so they do not differ in their ultimate capabilities. However, they represent different approaches to the control problem. (p150)

  9. The ordering of these castes by safety is not as obvious as it may seem, once we consider factors such as dependence on a single human, and the added danger of creating strong agents whose goals don’t match our own (even if those goals are tame, ‘domesticated’ ones). (p150)
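
To make the distinctions above concrete, here is a minimal sketch in Python of the interaction patterns involved. It only illustrates the definitions summarized above and is not anything from the book: the class and method names (Oracle.answer, Genie.preview, and so on) are my own, and the interesting parts are of course left unimplemented.

```python
# A toy sketch of the interaction patterns that distinguish the castes.
# Nothing here is from the book; the names are illustrative only.

class Oracle:
    """Does nothing but answer questions (point 2)."""
    def answer(self, question: str) -> str:
        raise NotImplementedError("the hard part")


class Genie:
    """Carries out one high-level command at a time, then waits (point 3)."""
    def preview(self, command: str) -> str:
        """Describe what it would do before doing it (the 'preview' idea, point 5)."""
        raise NotImplementedError("the hard part")

    def execute(self, command: str) -> None:
        raise NotImplementedError("the hard part")


class Sovereign:
    """Acts autonomously in pursuit of potentially long-range objectives (point 4)."""
    def run(self) -> None:
        raise NotImplementedError("the hard part")


def genie_from_oracle(oracle: Oracle, command: str) -> str:
    """One direction of point 8: the castes can emulate one another.
    Instead of letting an agent act, ask an oracle for a blueprint (cf. point 6)."""
    return oracle.answer(f"What plan or blueprint would accomplish: {command}?")
```

The genie_from_oracle function is just point 6 in miniature: a strongly contained genie buys you little over an oracle asked for blueprints.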

Another view

An old response to suggestions of oracle AI, from Eliezer Yudkowsky (I don’t know how closely this matches his current view):

When someone reinvents the Oracle AI, the most common opening remark runs like this:

“Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn’t need to be Friendly. It wouldn’t need any goals at all. It would just answer questions.”

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck “answers” to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are “improbable” relative to random organizations of the AI’s RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

Now, why might one think that an Oracle didn’t need goals? Because on a human level, the term “goal” seems to refer to those times when you said, “I want to be promoted”, or “I want a cookie”, and when someone asked you “Hey, what time is it?” and you said “7:30” that didn’t seem to involve any goals. Implicitly, you wanted to answer the question; and implicitly, you had a whole, complicated, functionally optimized brain that let you answer the question; and implicitly, you were able to do so because you looked down at your highly optimized watch, that you bought with money, using your skill of turning your head, that you acquired by virtue of curious crawling as an infant. But that all takes place in the invisible background; it didn’t feel like you wanted anything.

Thanks to empathic inference, which uses your own brain as an unopened black box to predict other black boxes, it can feel like “question-answering” is a detachable thing that comes loose of all the optimization pressures behind it—even the existence of a pressure to answer questions!

Notes

1. What are the axes we are talking about?

This chapter talks about different types or ‘castes’ of AI. But there are lots of different ways you could divide up kinds of AI (e.g. earlier we saw brain emulations vs. synthetic AI). So in what ways are we dividing them here? They are related to different approaches to the control problem, but don’t appear to be straightforwardly defined by them.

It seems to me we are looking at something close to these two axes:

  • Goal-directedness: the extent to which the AI acts in accordance with a set of preferences (instead of, for instance, reacting directly to stimuli or following rules without regard to consequences)

  • Oversight: the degree to which humans have an ongoing say in what the AI does (instead of the AI making all decisions itself)

The castes fit on these axes roughly as follows: sovereigns are goal-directed and run with little ongoing oversight; genies are goal-directed but overseen; and tools are not goal-directed.

They don’t quite fit neatly. Tools are spread between two places (ordinary, heavily overseen tools and the ‘autonomous tools’ described below), and oracles are a kind of tool (or a kind of genie, if they are of the highly constrained agent variety). But I find this a useful way to think about these kinds of AI.

Note that when we think of ‘tools’, we usually think of them as having a lot of oversight: that is, being used by a human who is making decisions all the time. However, you might also imagine what I have called ‘autonomous tools’, which run on their own but aren’t goal-directed. For instance, an AI that continually reads scientific papers and turns out accurate and engaging science books, without particularly optimizing to do this more efficiently or trying to bring about any particular outcome.

We have two weeks on this chapter, so I think it will be good to focus a bit on goal-directedness one week and oversight the other, alongside the advertised topics of specific castes. So this week let’s focus on oversight, since tools (next week) primarily differ from the other castes in not being goal-directed.

2. What do goal-directedness and oversight have to do with each other?

Why consider goal-directedness and oversight together? It seems to me there are a couple of reasons.

Goal-directedness and oversight are substitutes, broadly. The more you direct a machine, the less it needs to direct itself. Somehow the machine has to assist with some goals, so either you or the machine needs to care about those goals and direct the machine according to them. The ‘autonomous tools’ I mentioned appear to be exceptions, but they only seem plausible for a limited range of tasks where minimal goal direction is needed beyond what a designer can do ahead of time.

Another way goal-directedness and oversight are connected is that we might expect both to change as we become better able to align an AI’s goals with our own. For an AI’s goals to be aligned with ours, it must have goals at all, i.e. be goal-directed. And better alignment should make oversight less necessary.

3. A note on names

‘Sovereign AI’ sounds powerful and far-reaching. Note that more mundane AIs would also fit under this category. For instance, an AI that works in an office and doesn’t take over the world would also be a sovereign AI. You would be a sovereign AI if you were artificial.

4. Costs of oversight

Bostrom discussed some problems with genies. I’ll mention a few others.

One clear downside of a machine that follows your instructions and awaits your consent is that you have to be there giving instructions and consenting to things. In a world full of powerful AIs that needed such oversight, there might be plenty of spare human labor around to do this at the start, provided each AI didn’t need too much oversight. However, a need for human oversight might bottleneck the proliferation of such AIs.

Another downside of using human labor, beyond the cost to the human, is that it might be prohibitively slow, depending on the oversight required. If you only had to check in with the AI daily, and it did unimaginably many tasks the rest of the time, oversight probably wouldn’t be a great cost. However, if you had to be in the loop and fully understand the decision every time the AI chose how to allocate its internal resources, things could get very slow.

Even if these costs are minor compared to the value of avoiding catastrophe, they may be too large to allow well-overseen AIs to compete with more autonomous ones, especially if the oversight is mostly there to avoid low-probability terrible outcomes.

5. How useful is oversight?

Suppose you have a genie that doesn’t totally understand human values, but tries hard to listen to you and explain things and do what it thinks you want. How useful is it that you can interact with this genie and have a say in what it does rather than it just being a sovereign?

If the genie’s understanding of your values is wrong in a way that means its intended actions will bring about a catastrophe, it’s not clear that the genie can describe the outcome to you such that you will notice this. The future is potentially pretty big and complicated, especially compared to your brain, or to a short conversation between you and a genie. So the genie would need to summarize a lot. For you to notice the subtle details that would make the future worthless (remember that the genie basically understands your values, so they are probably not blatant details), the genie would need to direct your attention to them. So your situation would need to be in a middle ground where the AI knew about some features of a potential future that might bother you (so that it could point them out), but wasn’t sure whether you really would hate them. It seems hard for the AI giving you a ‘preview’ to help if the AI is just wrong about your values and doesn’t know how it is wrong.

6. More on oracles

“Thinking Inside the Box” seems to be the main paper on the topic. Christiano’s post on how to use an unfriendly AI is again relevant to how you might use an oracle.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How stable are the castes? Bostrom mentioned that these castes mostly have equivalent long-run capabilities, because they can be used to make one another. A related question is how likely they are to turn into one another. Another is how likely an attempt to create one is to lead to a different one (e.g. Yudkowsky’s view above suggests that if you try to make an oracle, it might end up being a sovereign). Yet another is which ones are likely to win out if they were developed in parallel and available for similar applications (e.g. how well would genies prosper in a world with many sovereigns?).

  2. How useful is oversight likely to be? (e.g. At what scale might it be necessary? Could an AI usefully communicate its predictions to you such that you can evaluate the outcomes of decisions? Is there likely to be direct competition between AIs which are overseen by people and those that are not?)

  3. Are non-goal-directed oracles likely to be feasible?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the last caste of this chapter: the tool AI. To prepare, read “Tool-AIs” and “Comparison” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday, December 29. Sign up to be notified here.