Superintelligence 9: The orthogonality of intelligence and goals

This is part of a weekly reading group on Nick Bostrom’s book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI’s reading guide.


Welcome. This week we discuss the ninth section in the reading guide: The orthogonality of intelligence and goals. This corresponds to the first section in Chapter 7, ‘The relation between intelligence and motivation’.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most relevant (not necessarily that the chapter is being cited for the specific claim).

Reading: ‘The relation between intelligence and motivation’ (p105-8)


Summary

  1. The orthogonality thesis: intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal (p107)

  2. Some qualifications to the orthogonality thesis: (p107)

    1. Simple agents may not be able to entertain some goals

    2. Agents with desires relating to their intelligence might alter their intelligence

  3. The motivations of highly intelligent agents may nonetheless be predicted (p108):

    1. Via knowing the goals the agent was designed to fulfil

    2. Via knowing the kinds of motivations held by the agent’s ‘ancestors’

    3. Via finding instrumental goals that an agent with almost any ultimate goals would desire (e.g. to stay alive, to control money)

Another view

John Danaher at Philosophical Disquisitions starts a series of posts on Superintelligence with a somewhat critical evaluation of the orthogonality thesis, in the process contributing a nice summary of nearby philosophical debates. Here is an excerpt, entitled ‘is the orthogonality thesis plausible?’:

At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.

We can call it the motivating belief objection. It works something like this:

Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.

A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).

The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false and so Bostrom wisely avoids it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivational beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.

What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.

One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.

Notes

1. Why care about the orthogonality thesis?
We are interested in an argument which says that AI might be dangerous because it might be powerful and motivated by goals very far from our own. An occasional response to this is that a sufficiently intelligent creature will surely know things like which deeds are virtuous and what one ought to do, so a sufficiently powerful AI cannot help but be kind to us. This is closely related to the position of the moral realist: that there are facts about what one ought to do, which can be observed (usually mentally).
So the role of the orthogonality thesis in the larger argument is to rule out the possibility that strong artificial intelligence will automatically be beneficial to humans, by virtue of being so clever. For this purpose, a fairly weak version of the orthogonality thesis seems to suffice. For instance, the qualifications discussed above do not seem to matter: even if one’s mind needs to be quite complex to have many goals, there is little reason to expect the goals of more complex agents to be disproportionately human-friendly. Likewise, the existence of goals which would undermine intelligence does not seem to affect the point.
2. Is the orthogonality thesis necessary?
If we talked about specific capabilities instead of ‘intelligence’ I suspect the arguments for AI risk could be made similarly well, without anyone being tempted to disagree with the analogous orthogonality theses for those skills. For instance, does anyone believe that a sufficiently good automated programming algorithm will come to appreciate true ethics?
3. Some writings on the orthogonality thesis which I haven’t necessarily read
‘The Superintelligent Will’ by Bostrom; ‘Arguing the orthogonality thesis’ by Stuart Armstrong; moral realism, as discussed by lots of people; and John Danaher’s two blog posts on the topic.
4. ‘It might be impossible for a very unintelligent system to have very complex motivations’
If this is so, something more general seems to be true: for any given degree of mental complexity substantially less than that of the universe, almost all values cannot be had by any agent with that degree of complexity or less. You can see this by comparing the number of different states the universe could be in (and thus which one might in principle have as one’s goal) to the number of different minds with at most the target level of complexity; a toy version of this counting argument is sketched below. Intelligence and complexity are not the same, and perhaps you can be very complex yet stupid by dedicating most of your mind to representing your complicated goals, but looked at this way, the original statement also becomes less plausible.
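Here is a toy version of that counting argument (my own illustration, not from the book; the particular values of N and K are arbitrary): identify a goal with a subset of the universe’s possible states, and a mind with a binary program of bounded length.

```python
import math

# Toy counting sketch (my own illustration; N and K are arbitrary choices).
# Model the universe as describable by N bits, so it has 2**N possible states.
# Identify a final goal with a subset of those states the agent prefers:
# that gives 2**(2**N) possible goals. Model a mind as a binary program of
# at most K bits: there are at most 2**(K+1) - 1 such programs, and each
# program can encode at most one goal.

N = 20    # bits to describe one state of the toy universe
K = 100   # complexity budget for a mind, in bits

log2_goals = 2 ** N                         # log2 of the number of possible goals
log2_minds = math.log2(2 ** (K + 1) - 1)    # log2 of the number of possible minds

print(f"log2(number of goals) = {log2_goals}")       # 1048576
print(f"log2(number of minds) ~ {log2_minds:.1f}")   # ~101.0
# At most ~2**101 goals are representable out of 2**1048576 possible ones:
# almost all goals cannot be held by any mind within the complexity budget.
```

The point is purely about counting: however the complexity budget is spent, there are far too few distinct minds to cover more than a vanishing fraction of the possible goals.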
5. How do you tell if two entities with different goals have the same intelligence?
Suppose that I want to write award-winning non-fiction books and you want to be a successful lawyer. If we both just work on the thing we care about, how can anyone tell who is better in general? One nice way to judge is to artificially give us both the same instrumental goal, on which our intelligence can be measured: for example, pay both of us thousands of dollars per correct question on an IQ test, money which we could put toward our goals.
Note that this treats each person as having a fixed degree of intelligence across tasks; a toy version of this model is sketched below. If I do well on the IQ test yet don’t write many books, we would presumably say that writing books is just hard. The model might work poorly if, for instance, people who did worse on the IQ test often wrote more books than I did.
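For concreteness, here is a minimal sketch of the kind of model this note assumes (my own toy formalization, not anything from the book; the names and numbers are made up): performance on a task is a fixed per-person intelligence term, minus a per-task difficulty term, plus noise, so a shared incentivised task lets you compare the intelligence terms directly.

```python
import random

# Toy model (my own illustration): performance = intelligence - difficulty + noise.
# Under this additive model, scoring two agents on one shared instrumental task
# (the incentivised IQ test) estimates their relative intelligence, even though
# their actual goals (books, lawsuits) live on tasks of unknown difficulty.

random.seed(0)

intelligence = {"writer": 1.2, "lawyer": 0.8}                        # latent traits
difficulty = {"iq_test": 0.0, "write_books": 2.0, "win_cases": 1.0}  # per-task terms

def performance(person, task, noise_sd=0.3):
    return intelligence[person] - difficulty[task] + random.gauss(0, noise_sd)

# Shared task: score differences here track intelligence differences.
print({p: round(performance(p, "iq_test"), 2) for p in intelligence})

# Goal-specific tasks: low raw output may just mean the task is hard
# (e.g. writing few books does not imply low intelligence).
print({"writer": round(performance("writer", "write_books"), 2),
       "lawyer": round(performance("lawyer", "win_cases"), 2)})
```

If the additive model fails, for instance because different people excel at different kinds of tasks, the single fixed-intelligence number stops being meaningful, which is exactly the failure mode the note describes.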

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser’s list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there interesting axes other than morality on which orthogonality may be false? That is, are there other ways the values of more or less intelligent agents might be constrained?

  2. Is moral realism true? (An old and probably not neglected one, but perhaps you have a promising angle)

  3. Investigate whether the orthogonality thesis holds for simple models of AI.

  4. To what extent can agents with values A be converted into agents with values B via appropriate institutions or arrangements?

  5. Sure, “any level of intelligence could in principle be combined with more or less any final goal,” but what kinds of general intelligences are plausible? Should we expect some correlation between level of intelligence and final goals in de novo AI? How true is this in humans, and in WBEs?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about instrumentally convergent goals. To prepare, read ‘Instrumental convergence’ from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 17. Sign up to be notified here.