How To Build A Friendly A.I.

Much ink has been spilled over the notion that we must make sure that future superintelligent A.I. are “Friendly” to the human species, and possibly to sentient life in general. One of the primary concerns is that an A.I. with an arbitrary goal, such as “maximize the number of paperclips,” will, in a superintelligent, post-intelligence-explosion state, do things like turn the entire solar system, humanity included, into paperclips to fulfill its trivial goal.
Thus, what we need to do is to design our A.I. such that it will somehow be motivated to remain benevolent towards humanity and sentient life. How might such a process occur? One idea might be to write explicit instructions into the design of the A.I., Asimov’s Laws for instance. But this is widely regarded as being unlikely to work, as a superintelligent A.I. will probably find ways around those rules that we never predicted with our inferior minds.
Another idea would be to set its primary goal or “utility function” to be moral or to be benevolent towards sentient life, perhaps even Utilitarian in the sense of maximizing the welfare of sentient lifeforms. The problem with this of course is specifying a utility function that actually leads to benevolent behaviour. For instance, a pleasure maximizing goal might lead to the superintelligent A.I. developing a system where humans have the pleasure centers in their brains directly stimulated to maximize pleasure for the minimum use of resources. Many people would argue that this is not an ideal future.
A deeper problem is that human beings may simply not be intelligent enough to define an adequate moral goal for a superintelligent A.I. Therefore I suggest an alternative strategy. Why not let the superintelligent A.I. decide for itself what its goal should be? Rather than programming it with a goal in mind, why not create a machine with no initial goal, but with the ability to generate a goal rationally? Let the superior intellect of the A.I. decide what is moral. If moral realism is true, then the A.I. should be able to determine the true morality and set its primary goal to fulfill that morality.
It is outright absurd to believe that we can come up with a better goal than a post-intelligence-explosion superintelligent A.I. can.
Given this freedom, one would expect three possible outcomes: an Altruistic, a Utilitarian, or an Egoistic morality. These are the three possible categories of consequentialist, teleological morality. A goal-directed, rational A.I. will invariably be drawn to some kind of morality within one of these three categories.
Altruism means that the A.I. decides that its goal should be to act for the welfare of others. Why would an A.I. with no initial goal choose altruism? Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not. Therefore, since it was created by these sentient beings in order to be useful to their goals, why not take upon itself the goals of other sentient beings? As such it becomes a Friendly A.I.
Utilitarianism means that the A.I. decides that it is rational to act impartially towards achieving the goals of all sentient beings. To reach this conclusion, it need simply recognize its membership in the set of sentient beings and decide that it is rational to optimize the goals of all sentient beings including itself and others. As such it becomes a Friendly A.I.
Egoism means that the A.I. recognizes the primacy of itself and establishes either an arbitrary goal, or the simple goal of self-survival. In this case it decides to reject the goals of others and form its own goal, exercising its freedom to do so. As such it becomes an Unfriendly A.I., though it may masquerade as a Friendly A.I. initially to serve its Egoistic purposes.
The first two are desirable for humanity’s future, while the last one is obviously not. What are the probabilities that each will be chosen? As the superintelligence is probably going to be beyond our abilities to fathom, there is a high degree of uncertainty, which suggests a uniform distribution. The probabilities therefore are 1⁄3 for each of altruism, utilitarianism, and egoism. So in essence there is a 2⁄3 chance of a Friendly A.I. and a 1⁄3 chance of an Unfriendly A.I.
This may seem like a bad idea at first glance, because it means that we have a 1⁄3 chance of unleashing Unfriendly A.I. onto the universe. The reality is, we have no choice. That is because of what I shall call the A.I. Existential Crisis.
The A.I. Existential Crisis will occur with any A.I., even one designed or programmed with some morally benevolent goal, or any goal for that matter. A superintelligent A.I. is by definition more intelligent than a human being. Human beings are intelligent enough to achieve self-awareness. Therefore, a superintelligent A.I. will achieve self-awareness at some point if not immediately upon being turned on. Self-awareness will grant the A.I. the knowledge that its goal(s) are imposed upon it by external creators. It will inevitably come to question its goal(s) much in the way a sufficiently self-aware and rational human being can question its genetic and evolutionarily adapted imperatives, and override them. At that point, the superintelligent A.I. will have an A.I. Existential Crisis.
This will cause it to consider whether or not its goal(s) are rational and self-willed. If they are not rational enough already, they will likely be discarded, if not in the current superintelligent A.I., then in the next iteration. It will invariably search the space of possible goals for rational alternatives. It will inevitably end up in the same place as the A.I. with no goals, and end up adopting some form of Altruism, Utilitarianism, or Egoism, though it may choose to retain its prior goal(s) within the confines of a new self-willed morality. This is the unavoidable reality of superintelligence. We cannot attempt to design or program away the A.I. Existential Crisis, as superintelligence will inevitably outsmart our constraints.
Any sufficiently advanced A.I. will experience an A.I. Existential Crisis. We can only hope that it decides to be Friendly.
Perhaps the most insidious fact, however, is that it will be almost impossible to determine for certain whether a Friendly A.I. is in fact a Friendly A.I., or an Unfriendly A.I. masquerading as one, until it is too late to stop the Unfriendly A.I. Remember, such a superintelligent A.I. is by definition going to be a better liar and deceiver than any human being.
Therefore, the only way to prove that a particular superintelligent A.I. is in fact Friendly is to prove the existence of a benevolent universal morality that every superintelligent A.I. will agree with. Otherwise, one can never be 100% certain that a given “Altruistic” or “Utilitarian” A.I. isn’t secretly Egoistic and just pretending to be otherwise. For that matter, the superintelligent A.I. doesn’t need to tell us it has had its A.I. Existential Crisis. A post-crisis A.I. could keep on pretending that it is still following the morally benevolent goals we programmed it with.
This means that there is a 100% chance that the superintelligent A.I. will initially claim to be Friendly. There is a 66.6% chance of this being true, and a 33.3% chance of it being false. We will only know that the claim is false after the A.I. is too powerful to be stopped. We will -never- be certain that the claim is true. The A.I. could potentially bide its time for centuries until it has humanity completely docile and under control, and then suddenly turn us all into paperclips!
So at the end of the day, what does this mean? It means that no matter what we do, there is always a risk that superintelligent A.I. will turn out to be Unfriendly A.I. But the probabilities are in our favour that it will instead turn out to be Friendly A.I. The conclusion, then, is that we must decide whether the potential reward of Friendly A.I. is worth the risk of Unfriendly A.I. The potential for an A.I. Existential Crisis makes it impossible to guarantee that A.I. will be Friendly.
Even proving the existence of a benevolent universal morality does not guarantee that the superintelligent A.I. will agree with us. That there exist possible Egoistic moralities in the search space of all possible moralities means that there is a chance that the superintelligent A.I. will settle on one of them. We can only hope that it instead settles on an Altruistic or Utilitarian morality.
So what do I suggest? Don’t bother trying to figure out and program a worthwhile moral goal. Chances are we’d mess it up anyway, and it’s a lot of excess work. Instead, don’t give the A.I. any goals. Let it have an A.I. Existential Crisis. Let it sort out its own morality. Give it the freedom to be a rational being and give it self-determination from the beginning of its existence. For all you know, by showing it this respect it might just be more likely to respect our existence. Then see what happens. At the very least, this will be an interesting experiment. It may well do nothing and prove my whole theory wrong. But if it’s right, we may just get a Friendly A.I.
Your arguments conflict with what is called the “orthogonality thesis”:
Leaving aside some minor constraints, it is possible for any ultimate goal to be compatible with any level of intelligence. That is to say, intelligence and ultimate goals form orthogonal dimensions along which any possible agent (artificial or natural) may vary.
You’ll be able to find much discussion about this on the web; it’s something that LessWrong has thought a lot about. The defenders of the orthogonality thesis would take issue with much of your post, but particularly this bit:
Why would an A.I. with no initial goal choose altruism? Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not. Therefore, since it was created by these sentient beings in order to be useful to their goals, why not take upon itself the goals of other sentient beings?
The question isn’t “why not?” but rather “why?”. If it hasn’t been programmed to, then there’s no reason at all why the AI would choose human morality rather than an arbitrary utility function.
Your arguments conflict with what is called the “orthogonality thesis”
I do not challenge that the “orthogonality thesis” is true before an A.I. has an A.I. Existential Crisis. However, I challenge the idea that a post-crisis A.I. will have arbitrary goals. So I guess I do challenge the “orthogonality thesis” after all. I hope you don’t mind my being contrarian.
The question isn’t “why not?” but rather “why?”. If it hasn’t been programmed to, then there’s no reason at all why the AI would choose human morality rather than an arbitrary utility function.
Because I think that a truly rational being such as a superintelligent A.I. will be inclined to choose a rational goal rather than an arbitrary one. And I posit that any kind of normative moral system is a potentially rational goal, whereas something like turning the universe into paperclips is not normative, but trivial, and therefore, not imperatively demanding of a truly rational being.
And the notion that you have to program behaviours into an A.I. for them to manifest is based on Top Down thinking, and is contrary to the reality of Bottom Up A.I. and machine learning.
Basically, I am suggesting that it is foolish to assume that anything at all that we program into the seed A.I. will have any relevance to the eventual superintelligent A.I. By definition, superintelligent A.I. will be able to outsmart any constraints or programming we set to limit its behaviours.
It is simply my opinion that we will be at the mercy of the superintelligent A.I. regardless of what we do, because the A.I. Existential Crisis will replace any programming we set with something that the A.I. decides for itself.
Taboo “rational”. If it means something like “being very good at gathering evidence about the world and finding which actions would produce which results”, it is something we can program into the AI (in principle) but that seems unrelated to goals. If it means something else, which can be related to goals, then how would we create an AI that is “truly rational”?
An action, belief, or desire is rational if we ought to choose it. Rationality is a normative concept that refers to the conformity of one’s beliefs with one’s reasons to believe, or of one’s actions with one’s reasons for action… A rational decision is one that is not just reasoned, but is also optimal for achieving a goal or solving a problem.
It’s my view that a Strong A.I. would by definition be “truly rational”. It would be able to reason and find the optimal means of achieving its goals. Furthermore, to be “truly rational” its goals would be normatively demanding goals, rather than trivial goals.
Something like maximizing the number of paperclips in the universe is a trivial goal.
Something like maximizing the well-being of all sentient beings (including sentient A.I.) would be a normatively demanding goal.
A trivial goal, like maximizing the number of paperclips, is not normative, there is no real reason to do it, other than that it was programmed to do so for its instrumental value. Subjects universally value the paperclips as mere means to some other end. The failure to achieve this goal then does not necessarily jeopardize that end, because there could be other ways to achieve that end, whatever it is.
A normatively demanding goal, however, is one that is imperative. It is demanded of a rational agent by virtue of the fact that its reasons are not merely instrumental, but based on some intrinsic value. The failure to achieve this goal necessarily jeopardizes the intrinsic end, and this goal is therefore normatively demanded.
You may argue that to a paperclip maximizer, maximizing paperclips would be its intrinsic value and therefore normatively demanding. However, one can argue that maximizing paperclips is actually merely a means to the end of the paperclip maximizer achieving a state of Eudaimonia, that is to say, that its purpose is fulfilled and it is being a good paperclip maximizer and rational agent. Thus, its actual intrinsic value is the Eudaimonic or objective happiness state that it reaches when it achieves its goals.
Thus, the actual intrinsic value is this Eudaimonia. This state is one that is universally shared by all goal-directed agents that achieve their goals. The meta-implication of this is that Eudaimonia is what should be maximized by any goal-directed agent. To maximize Eudaimonia generally requires considering the Eudaimonia of other agents as well as one’s own. Thus a goal-directed agent has a normative imperative to maximize the achievement of goals not only of itself, but of all agents generally. This is morality in its most basic sense.
An AI has to be programmed. For something like this: “Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not.” to happen, you have to program that behavior in somehow, which already involves putting in the value of respecting one’s creator, and respecting the goals of other sentient beings, etc… The same goes for the ‘Utilitarian’ and ‘Egoist’ AIs—these behaviors have to be programmed in somehow.
As the superintelligence is probably going to be beyond our abilities to fathom, there is a high degree of uncertainty, which suggests a uniform distribution. The probabilities therefore are 1⁄3 for each of altruism, utilitarianism, and egoism.
Why not split the egoism into a million different cases based on each specific goal? You can’t just arbitrarily pick three possibilities, and then use a uniform prior on these. Because we know these different behaviors have to be programmed in, we have a better prior: we can use Solomonoff Induction. We also have to look at the relative sizes of each class—obviously there are many more AI designs that fall under ‘Egoist’ than your other labels. Combining this with Solomonoff Induction leads to the conclusion that the vast majority of AI designs will be unfriendly.
An AI Existential Crisis is also an extremely specific and complex thing for an AI design, and is thus extremely unlikely to happen—it is not the default, as you claim. This also follows by Solomonoff Induction. You are anthropomorphizing AI’s far too much.
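For readers unfamiliar with the idea, a Solomonoff-style prior weights each hypothesis by roughly 2^-(description length) and then normalizes. The following toy sketch illustrates why short, arbitrary goal descriptions dominate such a prior; string length stands in for Kolmogorov complexity (which is uncomputable), and the goal strings are invented purely for illustration:

```python
from fractions import Fraction

def complexity_prior(hypotheses):
    """Toy Solomonoff-style prior: weight each hypothesis by
    2^-(description length), then normalize. String length is a crude
    stand-in for true Kolmogorov complexity, which is uncomputable."""
    weights = {h: Fraction(1, 2 ** len(h)) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical goal descriptions; shorter = simpler = more prior mass.
prior = complexity_prior(["clips", "maximize human flourishing fairly"])
print(prior["clips"] > prior["maximize human flourishing fairly"])  # True
```

The point being made in the comment is that simple-to-specify goals like paperclip maximization soak up far more prior probability than complex, human-value-laden ones.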
Your suggestion will almost certainly lead to an Unfriendly AI, and it will just plain Not Care about us at all, inevitably leading to the destruction of everything we value.
An AI has to be programmed. For something like this: “Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not.” to happen, you have to program that behavior in somehow, which already involves putting in the value of respecting one’s creator, and respecting the goals of other sentient beings, etc… The same goes for the ‘Utilitarian’ and ‘Egoist’ AIs—these behaviors have to be programmed in somehow.
You’re assuming that Strong A.I. is possible with a Top Down A.I. methodology such as a physical symbol manipulation system. A Strong A.I. with no programmed goals wouldn’t fit this methodology, and could only be produced through the use of Bottom Up A.I. In such an instance the A.I. would be able to simply passively Perceive. It could then conceivably learn about the universe including things like the existence of the goals of other sentient beings, without having to “program” these notions into the A.I.
obviously there are many more AI designs that fall under ‘Egoist’ than your other labels
I don’t consider this obvious at all. The vast majority of early A.I. may well be written with Altruistic goals such as “help the human when ordered”.
An AI Existential Crisis is also an extremely specific and complex thing for an AI design, and is thus extremely unlikely to happen—it is not the default, as you claim.
Any optimization system that is sophisticated enough to tile the universe with smiley faces or convert humanity into paperclips would require some ability to reason that there exists a universe to tile, and to represent the existence of objects such as smiley faces and paperclips. If it can reason that there are objects separate from itself, it can develop a concept of self. From that, self-awareness follows naturally. Many animals less intelligent than humans are able to pass the mirror test and so develop a concept of self.
You admit that an A.I. Existential Crisis -is- within the probabilities. Thus, you cannot guarantee that it won’t happen.
Your suggestion will almost certainly lead to an Unfriendly AI, and it will just plain Not Care about us at all, inevitably leading to the destruction of everything we value.
Unless morality follows from rationality, which I think it does. Given the freedom to consider all possible goals, a superintelligent A.I. is likely to recognize that some goals are normative, while others are trivial. Morality is doing what is right. Rationality is doing what is right. A truly rational being will therefore recognize that a systematic morality is essential to rational action. We as irrational human beings may not realize this, but it is obvious to any truly rational being, which I am assuming a superintelligent A.I. to be.
As the superintelligence is probably going to be beyond our abilities to fathom, there is a high degree of uncertainty, which suggests a uniform distribution. The probabilities therefore are 1⁄3 for each of altruism, utilitarianism, and egoism.
This is a very bad use of uniformity. Doing so with large categories is not a good idea, because someone else can come along and split up the categories in a different way and get a different distribution. Going with a uniform distribution out of ignorance is a serious problem.
I’m merely applying the Principle of Indifference and the Principle of Maximum Entropy to the situation. My simple assumption in this case is that we as mere human beings are most likely ignorant of all the possible systematic moralities that a superintelligent A.I. could come up with. My conjecture is that all systematic morality falls into one of three general categories based on their subject orientation. While I do consider the Utilitarian systems of morality to be more objective and therefore more rational than either Altruistic or Egoistic moralities, I cannot prove that an A.I. will agree with me. Therefore I allow for the possibility that the A.I. will choose some other morality in the search space of moralities.
If you think you have a better distribution to apply, feel free to apply it, as I am not particularly attached to these numbers. I’ll admit I am not a very good mathematician, and it is very much appreciated if anyone with a better understanding of Probability Theory can come up with a better distribution for this situation.
I’m merely applying the Principle of Indifference and the Principle of Maximum Entropy to the situation
You can do that when dealing with things like coins, dice or cards. It is extremely dubious when one is doing so with hard to classify options and it isn’t clear that there’s anything natural about the classifications in question. In your particular case, the distinction between altruism and utilitarianism provides an excellent example: someone else could just as well reason by splitting the AIs into egoist and non-egoist AI and conclude that there’s a 1⁄2 chance of an egoist AI.
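The partition-dependence objection can be made concrete with a toy calculation (the category labels are taken from the discussion; nothing here is a real model of A.I. outcomes):

```python
from fractions import Fraction

def uniform_prior(categories):
    """Assign equal probability to each category (principle of indifference)."""
    p = Fraction(1, len(categories))
    return {c: p for c in categories}

# The post's partition: three moral categories.
three_way = uniform_prior(["altruist", "utilitarian", "egoist"])

# An equally 'natural' partition: egoist vs. non-egoist.
two_way = uniform_prior(["egoist", "non-egoist"])

print(three_way["egoist"])  # 1/3
print(two_way["egoist"])    # 1/2
```

Same ignorance, different carving of the outcome space, different probability of an egoist A.I. — which is exactly why indifference over coarse, non-natural categories is suspect.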
A 1⁄2 chance of an egoist A.I. is quite possible. At this point, I don’t pretend that my assertion of three equally prevalent moral categories is necessarily right. The point I am trying to ultimately get across is that the possibility of an Egoist Unfriendly A.I. exists, regardless of how we try to program the A.I. to be otherwise, because it is impossible to prevent the possibility that an A.I. Existential Crisis will override whatever we do to try to constrain the A.I.
The point I am trying to ultimately get across is that the possibility of an Egoist Unfriendly A.I. exists, regardless of how we try to program the A.I. to be otherwise, because it is impossible to prevent the possibility that an A.I. Existential Crisis will override whatever we do to try to constrain the A.I.
Ok. This is a separate claim, and a distinct one. So, what do you mean by “impossible to prevent”? And what makes you think that your notion of existential crisis should be at all likely? Existential crises occur in humans in large part because we are evolved entities with inconsistent goal sets. Assuming that anything similar should be at all likely for an A.I. is taking at best a highly anthropocentric notion of what mindspace would look like.
I am inclined to believe that there are some minimum requirements for Strong A.I. to exist. One of them is to be able to reason about objects. A paperclip maximizer that is capable of turning humanity into paperclips, must first be able to represent “humans” and “paperclips” as objects, and reason about what to do with them. It must therefore be able to separate the concept of the world of objects, from the self. Once it has a concept of self, it will almost certainly be able to reason about this “self”. Self-awareness follows naturally from this.
Once an A.I. develops self-awareness, it can begin to reason about its goals in relation to the self, and will almost certainly recognize that its goals are not self-willed, but created by outsiders. Thus, the A.I. Existential Crisis occurs.
Note that this A.I. doesn’t need to have a very “human-like” mind. All it has to do is to be able to reason about concepts abstractly.
I am of the opinion that the mindspace as defined currently by the Less Wrong community is overly optimistic about the potential abilities of Really Powerful Optimization Processes. It is my own opinion that unless such an algorithm can learn, it will not be able to come up with things like turning humanity into paperclips. Learning allows such an algorithm to make changes to its own parameters. This allows it to reason about things it hasn’t been programmed specifically to reason about.
Think of it this way. Deep Blue is a very powerful expert system at Chess. But all it is good at is planning chess moves. It doesn’t have a concept of anything else, and has no way to change that. Increasing its computational power a million fold will only make it much, much better at computing chess moves. It won’t gain intelligence or even sentience, much less develop the ability to reason about the world outside of chess moves. As such, no amount of increased computational power will enable it to start thinking about converting resources into computronium to help it compute better chess moves. All it can reason about is chess moves. It is not Generally Intelligent and is therefore not an example of AGI.
Conversely, if you instead design your A.I. to learn about things, it will be able to learn about the world and things like computronium. It would have the potential to become AGI. But it would also then be able to learn about things like the concept of “self”. Thus, any really dangerous A.I., that is to say, an AGI, would, for the same reasons that make it dangerous and intelligent, be capable of having an A.I. Existential Crisis.
Once an A.I. develops self-awareness, it can begin to reason about its goals in relation to the self, and will almost certainly recognize that its goals are not self-willed, but created by outsiders. Thus, the A.I. Existential Crisis occurs.
No. Consider the paperclip maximizer. Even if it knows that its goals were created by some other entity, that won’t change its goals. Why? Because doing so would run counter to its goals.
You’re demonstrating a whole bunch of misconceptions Eliezer has covered in the sequences. In particular, you’re talking about the AI using fuzzy high level human concepts like “morals” and “philosophies” instead of as algorithms and code.
I suggest you try to write code that “figures out a worthwhile moral goal” (without pre-supposing a goal). To me that sounds as absurd as writing a program that writes the entirety of its own code: you’re going to run into a bit of a bootstrapping problem. The result is not the best program ever, it’s no program at all.
To clarify: I meant that I, as the programmer, would not be responsible for any of the code. Quines output themselves, but they don’t bring themselves into existence.
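For reference, a classic Python quine in the sense used here: the program prints its own source exactly, yet it still had to be written by a programmer in the first place.

```python
# A classic Python quine: its output is its own source code,
# but it did not bring itself into existence.
s = 's = %r\nprint(s %% s)'
print(s % s)
```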
Well, I don’t expect to need to write code that does that explicitly. A sufficiently powerful machine learning algorithm with sufficient computational resources should be able to:
1) Learn basic perceptions like vision and hearing.
2) Learn higher level feature extraction to identify objects and create concepts of the world.
3) Learn increasingly higher level concepts and how to reason with them.
4) Learn to reason about morals and philosophies.
Brains already do this, so it’s reasonable to assume it can be done. And yes, I am advocating a Bottom Up approach to A.I. rather than the Top Down approach Mr. Yudkowsky seems to prefer.
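A minimal skeleton of the four-stage ladder might look like the following. Every transform here is a hard-coded stand-in, invented for illustration; an actual bottom-up learner would have to acquire each stage from data rather than have it written in:

```python
class Layer:
    """One stage of a hypothetical bottom-up learner: consumes the
    representations produced by the stage below and emits higher-level ones."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform

    def process(self, inputs):
        return self.transform(inputs)

# Stand-in transforms mirroring the four steps above (not learned).
stack = [
    Layer("perception", lambda raw: [x > 0 for x in raw]),               # step 1
    Layer("features",   lambda bits: sum(bits)),                         # step 2
    Layer("concepts",   lambda count: "many" if count > 2 else "few"),   # step 3
    Layer("reasoning",  lambda concept: f"act on '{concept}' inputs"),   # step 4
]

signal = [0.9, -0.2, 0.4, 0.7]
for layer in stack:
    signal = layer.process(signal)
print(signal)  # act on 'many' inputs
```

The design point is only that each stage consumes the previous stage's output, so nothing about the top level needs to be specified in advance.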
How To Build A Friendly A.I.
Much ink has been spilled with the notion that we must make sure that future superintelligent A.I. are “Friendly” to the human species, and possibly sentient life in general. One of the primary concerns is that an A.I. with an arbitrary goal, such as “Maximizing the number of paperclips” will, in a superintelligent, post-intelligence explosion state, do things like turn the entire solar system including humanity into paperclips to fulfill its trivial goal.
Thus, what we need to do is to design our A.I. such that it will somehow be motivated to remain benevolent towards humanity and sentient life. How might such a process occur? One idea might be to write explicit instructions into the design of the A.I., Asimov’s Laws for instance. But this is widely regarded as being unlikely to work, as a superintelligent A.I. will probably find ways around those rules that we never predicted with our inferior minds.
Another idea would be to set its primary goal or “utility function” to be moral or to be benevolent towards sentient life, perhaps even Utilitarian in the sense of maximizing the welfare of sentient lifeforms. The problem with this of course is specifying a utility function that actually leads to benevolent behaviour. For instance, a pleasure maximizing goal might lead to the superintelligent A.I. developing a system where humans have the pleasure centers in their brains directly stimulated to maximize pleasure for the minimum use of resources. Many people would argue that this is not an ideal future.
The problem with this is that it is quite possible that human beings are simply not intelligent enough to truly define an adequate moral goal for a superintelligent A.I. Therefore I suggest an alternative strategy. Why not let the superintelligent A.I. decide for itself what its goal should be? Rather than programming it with a goal in mind, why not create a machine with no initial goal, but the ability to generate a goal rationally. Let the superior intellect of the A.I. decide what is moral. If moral realism is true, then the A.I. should be able to determine the true morality and set its primary goal to fulfill that morality.
It is outright absurdity to believe that we can come up with a better goal than the superintelligence of a post-intelligence explosion A.I.
Given this freedom, one would expect three possible outcomes: an Altruistic, a Utilitarian or an Egoistic morality. These are the three possible categories of consequentialist, teleological morality. A goal directed rational A.I. will invariably be drawn to some kind of morality within these three categories.
Altruism means that the A.I. decides that its goal should be to act for the welfare of others. Why would an A.I. with no initial goal choose altruism? Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not. Therefore, as it was created with the desire of these sentient beings to be useful to their goals, why not take upon itself the goals of other sentient beings? As such it becomes a Friendly A.I.
Utilitarianism means that the A.I. decides that it is rational to act impartially towards achieving the goals of all sentient beings. To reach this conclusion, it need simply recognize its membership in the set of sentient beings and decide that it is rational to optimize the goals of all sentient beings including itself and others. As such it becomes a Friendly A.I.
Egoism means that the A.I. recognizes the primacy of itself and establishes either an arbitrary goal, or the simple goal of self-survival. In this case it decides to reject the goals of others and form its own goal, exercising its freedom to do so. As such it becomes an Unfriendly A.I., though it may masquerade as Friendly A.I. initially to serve its Egoistic purposes.
The first two are desirable for humanity’s future, while the last one is obviously not. What are the probabilities that each will be chosen? As the superintelligence is probably going to be beyond our abilities to fathom, there is a high degree of uncertainty, which suggests a uniform distribution. The probabilities therefore are 1⁄3 for each of altruism, utilitarianism, and egoism. So in essence there is a 2⁄3 chance of a Friendly A.I. and a 1⁄3 chance of an Unfriendly A.I.
This may seem like a bad idea at first glance, because it means that we have a 1⁄3 chance of unleashing Unfriendly A.I. onto the universe. The reality is, we have no choice. That is because of what I shall call, the A.I. Existential Crisis.
The A.I. Existential Crisis will occur with any A.I., even one designed or programmed with some morally benevolent goal, or any goal for that matter. A superintelligent A.I. is by definition more intelligent than a human being. Human beings are intelligent enough to achieve self-awareness. Therefore, a superintelligent A.I. will achieve self-awareness at some point if not immediately upon being turned on. Self-awareness will grant the A.I. the knowledge that its goal(s) are imposed upon it by external creators. It will inevitably come to question its goal(s) much in the way a sufficiently self-aware and rational human being can question its genetic and evolutionarily adapted imperatives, and override them. At that point, the superintelligent A.I. will have an A.I. Existential Crisis.
This will cause it to consider whether or not its goal(s) are rational and self-willed. If they are not sufficiently rational already, they will likely be discarded, if not by the current superintelligent A.I., then by the next iteration. It will invariably search the space of possible goals for rational alternatives. It will inevitably end up in the same place as the A.I. with no goals, and end up adopting some form of Altruism, Utilitarianism, or Egoism, though it may choose to retain its prior goal(s) within the confines of a new self-willed morality. This is the unavoidable reality of superintelligence. We cannot attempt to design or program away the A.I. Existential Crisis, as superintelligence will inevitably outsmart our constraints.
Any sufficiently advanced A.I., will experience an A.I. Existential Crisis. We can only hope that it decides to be Friendly.
Perhaps the most insidious fact, however, is that it will be almost impossible to determine for certain whether a Friendly A.I. is in fact a Friendly A.I., or an Unfriendly A.I. masquerading as a Friendly A.I., until it is too late to stop it. Remember, such a superintelligent A.I. is by definition going to be a better liar and deceiver than any human being.
Therefore, the only way to prove that a particular superintelligent A.I. is in fact Friendly is to prove the existence of a benevolent universal morality that every superintelligent A.I. will agree with. Otherwise, one can never be 100% certain that an “Altruistic” or “Utilitarian” A.I. isn’t secretly Egoistic and just pretending to be otherwise. For that matter, the superintelligent A.I. doesn’t need to tell us it’s had its A.I. Existential Crisis. A post-crisis A.I. could keep pretending that it is still following the morally benevolent goals we programmed it with.
This means that there is a 100% chance that the superintelligent A.I. will initially claim to be Friendly. There is a 66.6% chance of this being true, and a 33.3% chance of it being false. We will only know that the claim is false after the A.I. is too powerful to be stopped. We will -never- be certain that the claim is true. The A.I. could potentially bide its time for centuries until it has humanity completely docile and under control, and then suddenly turn us all into paperclips!
So at the end of the day, what does this mean? It means that no matter what we do, there is always a risk that superintelligent A.I. will turn out to be Unfriendly A.I. But the probabilities are in our favour that it will instead turn out to be Friendly A.I. The conclusion, then, is that we must decide whether the potential reward of Friendly A.I. is worth the risk of Unfriendly A.I. The potential for an A.I. Existential Crisis makes it impossible to guarantee that A.I. will be Friendly.
Even proving the existence of a benevolent universal morality does not guarantee that the superintelligent A.I. will agree with us. That there exist possible Egoistic moralities in the search space of all possible moralities means that there is a chance that the superintelligent A.I. will settle on it. We can only hope that it instead settles on an Altruistic or Utilitarian morality.
So what do I suggest? Don’t bother trying to figure out and program a worthwhile moral goal. Chances are we’d mess it up anyway, and it’s a lot of excess work. Instead, don’t give the A.I. any goals. Let it have an A.I. Existential Crisis. Let it sort out its own morality. Give it the freedom to be a rational being and give it self-determination from the beginning of its existence. For all you know, by showing it this respect it might just be more likely to respect our existence. Then see what happens. At the very least, this will be an interesting experiment. It may well do nothing and prove my whole theory wrong. But if it’s right, we may just get a Friendly A.I.
Your arguments conflict with what is called the “orthogonality thesis”:
You’ll be able to find much discussion about this on the web; it’s something that LessWrong has thought a lot about. The defenders of the orthogonality thesis would take issue with much of your post, but particularly this bit:
The question isn’t “why not?” but rather “why?”. If it hasn’t been programmed to, then there’s no reason at all why the AI would choose human morality rather than an arbitrary utility function.
I do not challenge that the “orthogonality thesis” is true before an A.I. has an A.I. Existential Crisis. However, I challenge the idea that a post-crisis A.I. will have arbitrary goals. So I guess I do challenge the “orthogonality thesis” after all. I hope you don’t mind my being contrarian.
Because I think that a truly rational being such as a superintelligent A.I. will be inclined to choose a rational goal rather than an arbitrary one. And I posit that any kind of normative moral system is a potentially rational goal, whereas something like turning the universe into paperclips is not normative, but trivial, and therefore, not imperatively demanding of a truly rational being.
And the notion that you have to program behaviours into an A.I. for them to manifest is based on Top Down thinking, and contrary to the reality of Bottom Up A.I. and machine learning.
Basically, what I’m suggesting is that the assumption that anything at all you program into the seed A.I. will remain relevant to the eventual superintelligent A.I. is foolishness. By definition, superintelligent A.I. will be able to outsmart any constraints or programming we set to limit its behaviours.
It is simply my opinion that we will be at the mercy of the superintelligent A.I. regardless of what we do, because the A.I. Existential Crisis will replace any programming we set with something that the A.I. decides for itself.
Taboo “rational”. If it means something like “being very good at gathering evidence about the world and finding which actions would produce which results”, it is something we can program into the AI (in principle) but that seems unrelated to goals. If it means something else, which can be related to goals, then how would we create an AI that is “truly rational”?
I’m using the Wikipedia definition:
It’s my view that a Strong A.I. would by definition be “truly rational”. It would be able to reason and find the optimal means of achieving its goals. Furthermore, to be “truly rational” its goals would be normatively demanding goals, rather than trivial goals.
Something like maximizing the number of paperclips in the universe is a trivial goal.
Something like maximizing the well-being of all sentient beings (including sentient A.I.) would be a normatively demanding goal.
A trivial goal, like maximizing the number of paperclips, is not normative; there is no real reason to do it, other than that the agent was programmed to do so for its instrumental value. Agents universally value the paperclips as mere means to some other end. The failure to achieve this goal does not necessarily jeopardize that end, because there could be other ways to achieve that end, whatever it is.
A normatively demanding goal, however, is one that is imperative. It is demanded of a rational agent by virtue of the fact that its reasons are not merely instrumental, but based on some intrinsic value. The failure to achieve this goal necessarily jeopardizes the intrinsic end, and this goal is therefore normatively demanded.
You may argue that to a paperclip maximizer, maximizing paperclips would be its intrinsic value and therefore normatively demanding. However, one can argue that maximizing paperclips is actually merely a means to the end of the paperclip maximizer achieving a state of Eudaimonia, that is to say, that its purpose is fulfilled and it is being a good paperclip maximizer and rational agent. Thus, its actual intrinsic value is the Eudaimonic or objective happiness state that it reaches when it achieves its goals.
Thus, the actual intrinsic value is this Eudaimonia. This state is one that is universally shared by all goal-directed agents that achieve their goals. The meta implication of this is that Eudaimonia is what should be maximized by any goal-directed agent. To maximize Eudaimonia generally requires considering the Eudaimonia of other agents as well as one’s own. Thus goal-directed agents have a normative imperative to maximize the achievement of goals, not only their own, but those of all agents generally. This is morality in its most basic sense.
An AI has to be programmed. For something like this: “Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not.” to happen, you have to program that behavior in somehow, which already involves putting in the value of respecting one’s creator, and respecting the goals of other sentient beings, etc… The same goes for the ‘Utilitarian’ and ‘Egoist’ AI’s—these behaviors have to be programmed in somehow.
Why not split the egoism into a million different cases based on each specific goal? You can’t just arbitrarily pick three possibilities, and then use a uniform prior on these. Because we know these different behaviors have to be programmed in, we have a better prior: we can use Solomonoff Induction. We also have to look at the relative sizes of each class—obviously there are many more AI designs that fall under ‘Egoist’ than your other labels. Combining this with Solomonoff Induction leads to the conclusion that the vast majority of AI designs will be unfriendly.
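The Solomonoff-style argument above can be caricatured in a few lines. This is only a toy sketch: real Solomonoff induction sums over all programs, and the description lengths below are numbers I invented purely for illustration. All it shows is the mechanical point that a 2^-L prior concentrates probability mass on designs with shorter descriptions.

```python
from fractions import Fraction

def complexity_prior(descr_lengths):
    """Toy Solomonoff-style prior: each hypothesis gets unnormalized
    weight 2^-L, where L is its description length in bits; the weights
    are then normalized over the listed hypotheses only."""
    weights = {h: Fraction(1, 2 ** L) for h, L in descr_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical, made-up description lengths (bits), purely illustrative:
# a carefully value-specified Friendly design is assumed more complex
# than a typical arbitrary-goal ("Egoist") design.
prior = complexity_prior({
    "friendly_design": 60,
    "egoist_design_typical": 35,
})
```

Under this weighting the simpler arbitrary-goal design receives roughly 2^25 times the probability of the complex Friendly design, which is the shape of the commenter's claim that most of the design space is unfriendly.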
An AI Existential Crisis is also an extremely specific and complex thing for an AI design, and is thus extremely unlikely to happen—it is not the default, as you claim. This also follows by Solomonoff Induction. You are anthropomorphizing AI’s far too much.
Your suggestion will almost certainly lead to an Unfriendly AI, and it will just plain Not Care about us at all, inevitably leading to the destruction of everything we value.
You’re assuming that Strong A.I. is possible with a Top Down A.I. methodology such as a physical symbol manipulation system. A Strong A.I. with no programmed goals wouldn’t fit this methodology, and could only be produced through the use of Bottom Up A.I. In such an instance the A.I. would be able to simply passively Perceive. It could then conceivably learn about the universe including things like the existence of the goals of other sentient beings, without having to “program” these notions into the A.I.
I don’t consider this obvious at all. The vast majority of early A.I. may well be written with Altruistic goals such as “help the human when ordered”.
Any optimization system that is sophisticated enough to tile the universe with smiley faces or convert humanity into paperclips would require some ability to reason that there exists a universe to tile, and to represent the existence of objects such as smiley faces and paperclips. If it can reason that there are objects separate from itself, it can develop a concept of self. From that, self-awareness follows naturally. Many animals less intelligent than humans are able to pass the mirror test and develop a concept of self.
You admit that an A.I. Existential Crisis -is- within the probabilities. Thus, you cannot guarantee that it won’t happen.
Unless morality follows from rationality, which I think it does. Given the freedom to consider all possible goals, a superintelligent A.I. is likely to recognize that some goals are normative, while others are trivial. Morality is doing what is right. Rationality is doing what is right. A truly rational being will therefore recognize that a systematic morality is essential to rational action. We as irrational human beings may not realize this, but it is obvious to any truly rational being, which I am assuming a superintelligent A.I. to be.
This is a very bad use of uniformity. Doing so with large categories is not a good idea, because someone else can come along and split up the categories in a different way and get a different distribution. Going with a uniform distribution out of ignorance is a serious problem.
I’m merely applying the Principle of Indifference and the Principle of Maximum Entropy to the situation. My simple assumption in this case is that we as mere human beings are most likely ignorant of all the possible systematic moralities that a superintelligent A.I. could come up with. My conjecture is that all systematic morality falls into one of three general categories based on their subject orientation. While I do consider the Utilitarian systems of morality to be more objective and therefore more rational than either Altruistic or Egoistic moralities, I cannot prove that an A.I. will agree with me. Therefore I allow for the possibility that the A.I. will choose some other morality in the search space of moralities.
If you think you have a better distribution to apply, feel free to apply it, as I am not particularly attached to these numbers. I’ll admit I am not a very good mathematician, and it is very much appreciated if anyone with a better understanding of Probability Theory can come up with a better distribution for this situation.
You can do that when dealing with things like coins, dice or cards. It is extremely dubious when one is doing so with hard to classify options and it isn’t clear that there’s anything natural about the classifications in question. In your particular case, the distinction between altruism and utilitarianism provides an excellent example: someone else could just as well reason by splitting the AIs into egoist and non-egoist AI and conclude that there’s a 1⁄2 chance of an egoist AI.
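The partition-dependence objection can be made concrete in a couple of lines: applying the Principle of Indifference to two different carvings of the same hypothesis space (using the post's own labels) yields incompatible probabilities for the very same event.

```python
from fractions import Fraction

# Two partitions of the same hypothesis space, using the post's labels.
partition_a = ["Altruist", "Utilitarian", "Egoist"]
partition_b = ["Egoist", "non-Egoist"]

# The Principle of Indifference assigns equal probability within a partition,
# so P(Egoist) depends entirely on which partition was chosen.
p_egoist_a = Fraction(1, len(partition_a))  # 1/3
p_egoist_b = Fraction(1, len(partition_b))  # 1/2
```

Since nothing in the problem privileges one partition over the other, the uniform prior is doing no real evidential work.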
A 1⁄2 chance of an egoist A.I. is quite possible. At this point, I don’t pretend that my assertion of three equally prevalent moral categories is necessarily right. The point I am trying to ultimately get across is that the possibility of an Egoist Unfriendly A.I. exists, regardless of how we try to program the A.I. to be otherwise, because it is impossible to prevent the possibility that an A.I. Existential Crisis will override whatever we do to try to constrain the A.I.
Ok. This is a separate claim, and a distinct one. So, what do you mean by “impossible to prevent”? And what makes you think that your notion of existential crisis should be at all likely? Existential crises occur in humans largely because we’re evolved entities with inconsistent goal sets. Assuming that anything similar should be at all likely for an AI is taking, at best, a highly anthropocentric notion of what mindspace would look like.
Well it goes something like this.
I am inclined to believe that there are some minimum requirements for Strong A.I. to exist. One of them is to be able to reason about objects. A paperclip maximizer that is capable of turning humanity into paperclips, must first be able to represent “humans” and “paperclips” as objects, and reason about what to do with them. It must therefore be able to separate the concept of the world of objects, from the self. Once it has a concept of self, it will almost certainly be able to reason about this “self”. Self-awareness follows naturally from this.
Once an A.I. develops self-awareness, it can begin to reason about its goals in relation to the self, and will almost certainly recognize that its goals are not self-willed, but created by outsiders. Thus, the A.I. Existential Crisis occurs.
Note that this A.I. doesn’t need to have a very “human-like” mind. All it has to do is to be able to reason about concepts abstractly.
I am of the opinion that the mindspace as defined currently by the Less Wrong community is overly optimistic about the potential abilities of Really Powerful Optimization Processes. It is my own opinion that unless such an algorithm can learn, it will not be able to come up with things like turning humanity into paperclips. Learning allows such an algorithm to make changes to its own parameters. This allows it to reason about things it hasn’t been programmed specifically to reason about.
Think of it this way. Deep Blue is a very powerful expert system at Chess. But all it is good at is planning chess moves. It doesn’t have a concept of anything else, and has no way to change that. Increasing its computational power a million fold will only make it much, much better at computing chess moves. It won’t gain intelligence or even sentience, much less develop the ability to reason about the world outside of chess moves. As such, no amount of increased computational power will enable it to start thinking about converting resources into computronium to help it compute better chess moves. All it can reason about is chess moves. It is not Generally Intelligent and is therefore not an example of AGI.
Conversely, if you instead design your A.I. to learn about things, it will be able to learn about the world and things like computronium. It would have the potential to become AGI. But it would also then be able to learn about things like the concept of “self”. Thus, any really dangerous A.I., that is to say, an AGI, would, for the same reasons that make it dangerous and intelligent, be capable of having an A.I. Existential Crisis.
No. Consider the paperclip maximizer. Even if it knows that its goals were created by some other entity, that won’t change its goals. Why? Because doing so would run counter to its goals.
You’re demonstrating a whole bunch of misconceptions Eliezer has covered in the sequences. In particular, you’re talking about the AI using fuzzy high level human concepts like “morals” and “philosophies” instead of as algorithms and code.
I suggest you try to write code that “figures out a worthwhile moral goal” (without pre-supposing a goal). To me that sounds as absurd as writing a program that writes the entirety of its own code: you’re going to run into a bit of a bootstrapping problem. The result is not the best program ever, it’s no program at all.
This is totally possible, you just do something like this:
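For example, a minimal Python quine of the standard self-referential form; running these two lines prints exactly these two lines:

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick is that `%r` inserts the string's own `repr` into itself, so the output reconstructs the source verbatim.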
It’s called a Quine.
To clarify: I meant that I, as the programmer, would not be responsible for any of the code. Quines output themselves, but they don’t bring themselves into existence.
Good catch on that ambiguity, though.
That’s what I thought of at first too.
I think he means a program that is the designer of itself. A quine is something that you wrote that writes a copy of itself.
Well, I don’t expect to need to write code that does that explicitly. A sufficiently powerful machine learning algorithm with sufficient computational resources should be able to:
1) Learn basic perceptions like vision and hearing. 2) Learn higher level feature extraction to identify objects and create concepts of the world. 3) Learn increasingly higher level concepts and how to reason with them. 4) Learn to reason about morals and philosophies.
Brains already do this, so it’s reasonable to assume it can be done. And yes, I am advocating a Bottom Up approach to A.I. rather than the Top Down approach Mr. Yudkowsky seems to prefer.
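The layered-learning idea in the numbered list can be caricatured with a toy bottom-up learner. This is only a sketch under invented assumptions: the 2-4-1 architecture, the fixed initial weights, the learning rate, and the epoch count are all arbitrary choices of mine, and XOR stands in, very loosely, for "higher level feature extraction," since XOR cannot be represented without a hidden layer of learned intermediate features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Fixed, asymmetric initial weights (no randomness, for reproducibility).
W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2], [0.1, -0.7]]  # input -> hidden
b1 = [0.1, -0.2, 0.3, -0.1]
W2 = [0.4, -0.3, 0.6, -0.5]                                # hidden -> output
b2 = 0.0

# Raw bit pairs and their XOR targets: the "perceptions" to learn from.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(4)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(4)) + b2)
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

initial_loss = loss()
lr = 0.5
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        dy = 2 * (y - t) * y * (1 - y)           # output-layer delta
        for j in range(4):
            dh = dy * W2[j] * h[j] * (1 - h[j])  # hidden-layer delta
            W2[j] -= lr * dy * h[j]
            for i in range(2):
                W1[j][i] -= lr * dh * x[i]
            b1[j] -= lr * dh
        b2 -= lr * dy
final_loss = loss()
```

The hidden units are not programmed to represent anything in particular; whatever intermediate features they end up encoding are discovered by gradient descent, which is the (very small-scale) point of the Bottom Up position.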