So in this chapter Bostrom discusses an AGI with a neutral but “passionate” goal, such as “I will devote all of my energies to being the best possible chess player, come what may.”
I am going to turn this around a little bit.
By human moral standards, that is not an innocuous goal at all. Having ONLY that goal actually runs counter to just about every ethical system ever taught in any school.
It’s obviously not ethical for a person to murder all the competition in order to become the best chess player in the world, nor is it ethical for a chess player to rob a bank in order to earn money for travel to chess practice sessions.
I understand that proposing an actual set of values for an AGI to have is not exactly the subject of the book, but now it’s time for that other book. The paperclip maximizer examples are so deficient as attempts at moral systems as to be completely implausible.
Instead, we should start our thinking with some value systems that we might ACTUALLY give an AGI, and see where the holes in those systems are.
Surely a team of engineers capable of developing AGI can be given some guidance in advance so that they are at least competent enough to instill a set of values as robust as the ones we attempt to instill in children?
Put another way: Suppose you or I suddenly woke up with superintelligence, but with our existing goal structure intact (and a desire to be cautious).
Can you show me why a decent person like (I presume) you or I with these new powers would suddenly choose to slaughter the human race as an instrumental goal toward accomplishing some other end?
Super powers (of any kind) would give an ordinary person all kinds of temptations to resist and would force them to rein in a variety of fleeting passions.
Gaining powers would not automatically compromise someone’s moral code, however.
It’s plausible that some hedge fund runs an AGI that is focused on maximizing financial returns, and that this AGI becomes more powerful over time.
It wouldn’t necessarily have an “attempt at a moral system”.
By the way, hedge fund trading might gain a system some billions of dollars, but in order to get beyond that, such an AGI has to purchase corporate entities and actually operate them.
The number of possible goals is HUGE compared to the relatively small subset of human goals. Humans share the same brain structure and general goal structure, but there’s no reason to expect the first AI to share our neural/goal structure. Innocuous goals like “Prevent Suffering” and “Maximize Happiness” may not be interpreted and executed the way we wish them to be.
Indeed, gaining superpowers probably would not compromise the AI’s moral code. It only gives it the ability to fully execute the actions dictated by the moral code. Unfortunately, there’s no guarantee that its morals will fall in line with ours.
There is no guarantee; therefore we have a lot of work to do!
Here is another candidate for an ethical precept, from the profession of medicine:
“First do no harm.”
The doctor is instructed to begin with this heuristic, to which there are many, many exceptions.
So, as others have said, the idea that an AGI necessarily incorporates all of the structures that perform moral calculations in human brains—or is even necessarily compatible with them—is simply untrue. So the casual assumption that if we can “instill” that morality in humans then clearly we’re competent enough to instill it in an AGI is not clear to me.
But accepting that assumption for the sake of comity...
Well, I’m not entirely sure what we mean by “decent,” so it’s hard to avoid a No True Decent Person argument here. But OK, I’ll take a shot.
Suppose, hypothetically, that I want to maximize the amount of joy in the world, and minimize the amount of suffering. (Is that a plausible desire for decent folk like (presumably) you and me to have?)
Suppose, hypothetically, that with my newfound superpowers I become extremely confident that I can construct a life form that is far less likely to suffer, and far more likely to experience joy, than humans.
Now, perhaps that won’t motivate me to slaughter all existing humans. Perhaps I’ll simply intervene in human reproduction so that the next generation is inhuman… that seems more humane, somehow.
But then again… when I think of all the suffering I’m allowing to occur by letting the humans stick around… geez, I dunno. Is that really fair? Maybe I ought to slaughter them all after all.
But maybe this whole line of reasoning is unfair. Maybe “maximize joy, minimize suffering” isn’t actually the sort of moral code that decent people like you and I have in the first place.
So, what is our decent moral code? Perhaps if you can articulate that, it will turn out that a superhuman system optimizing for it won’t commit atrocities. That would be great.
Personally, I’m skeptical. I suspect that the morality of decent people like you and (I presume) me is, at its core, sufficiently inconsistent and incoherent that if maximized with enough power it will result in actions we treat as atrocities.
Well, suppose I suddenly became 200 feet tall. The moral thing to do would be for me to:
Be careful where I step.
Might we not consider programming in some forms of caution?
An AGI is neither omniscient nor clairvoyant. It should know that its interactions with the world will have unpredictable outcomes, and so it should first do a lot of thinking and simulation, then it should make small experiments.
In discussions with lukeprog, I referred to this approach as “Managed Roll-Out.”
AGI could be introduced in ways that parallel the introduction of a new drug to the market: a “Pre-clinical” phase where the system is only operated in simulation, then a series of small, controlled interactions with the outside world (Phase I, Phase II, ... Phase N trials).
Before each trial, a forecast is made of the possible outcomes.
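A minimal sketch of what such a staged introduction might look like, purely to make the idea concrete; the phase names, the risk threshold, and the forecast/run_trial callables are hypothetical placeholders, not a worked-out proposal:

```python
# Illustrative "Managed Roll-Out" loop: each phase is gated by a forecast,
# and the process halts if reality diverges from that forecast.
# All names and thresholds here are hypothetical placeholders.

def managed_rollout(forecast, run_trial, acceptable_risk=0.01):
    phases = ["pre-clinical (simulation only)", "Phase I", "Phase II", "Phase III"]
    for phase in phases:
        predicted_risk, predicted_outcome = forecast(phase)
        if predicted_risk > acceptable_risk:
            return f"halted before {phase}: forecast risk {predicted_risk} too high"
        actual_outcome = run_trial(phase)  # deliberately limited interaction
        if actual_outcome != predicted_outcome:
            # A surprise means the system's model of the world is wrong.
            return f"halted after {phase}: outcome diverged from forecast"
    return "all trial phases passed; consider a wider roll-out"

# Toy usage with stand-in callables.
print(managed_rollout(
    forecast=lambda phase: (0.001, f"nothing unexpected in {phase}"),
    run_trial=lambda phase: f"nothing unexpected in {phase}",
))
```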
Caution sounds great, but if it turns out that the AI’s goals do indeed lead to killing all humans or what have you, it will only delay those outcomes, no? So caution is only useful if we program its goals wrong, it realises that humans might consider its goals wrong, and it allows us to take another shot at giving it goals that aren’t wrong. In other words: corrigibility.
Actually, caution is a different question.
AGI is not clairvoyant. It WILL get things wrong and accidentally produce outcomes which do not comport with its values.
Corrigibility is a valid line of research, but even if you had an extremely corrigible system, it would still risk making mistakes.
AGI should be cautious, whether it is corrigible or not. It could make a mistake because of bad values, because it has no off-switch, OR just because it cannot predict all the outcomes of its actions.
So, this is one productive direction.
Why not reason for a while as though we ourselves have superintelligence, and see what kinds of mistakes we would make?
I do feel that the developers of AGI will be able to anticipate SOME of the things that an AGI would do, and craft some detailed rules and value functions.
Why punt the question of what the moral code should be? Let’s try to do some values design, and see what happens.
Here’s one start:
Ethical code template:
-Follow the law in the countries where you interact with the citizens
-EXCEPT when the law implies...(fill in various individual complex exceptions HERE)
-Act as a Benevolent Protector to people who are alive now (lay out the complexities of this HERE)...
As best I can tell, nobody on this site or in this social network has given a sufficiently detailed try at filling in the blanks.
Let’s stop copping out!
BUT:
The invention of AGI WILL pose a big moral stumper: how do we want to craft the outcomes for future generations of people/our progeny?
THAT PART of it is a good deal more of a problem to engineer than some version of “Do not kill the humans who are alive, and treat them well.”
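To make the template above slightly more concrete, here is a minimal, purely hypothetical sketch of how such a layered code might be organized as explicit checks plus a precedence rule; the function names and the dictionary-based “action” are invented for illustration, and the hard part is obviously filling in the real content of each check:

```python
# Hypothetical sketch of the layered ethical-code template above.
# Each check is a placeholder for a volume's worth of real content.

def protects_living_people(action):
    # "Act as a Benevolent Protector to people who are alive now."
    return not action.get("harms_people", False)

def law_exception_applies(action):
    # "EXCEPT when the law implies..." (the enumerated exceptions go here).
    return action.get("listed_exception", False)

def follows_local_law(action):
    # "Follow the law in the countries where you interact with the citizens."
    return action.get("legal", True)

def permitted(action):
    # One possible precedence: protecting people overrides everything,
    # then the exception list, then ordinary legality.
    if not protects_living_people(action):
        return False
    if law_exception_applies(action):
        return True
    return follows_local_law(action)

print(permitted({"legal": True, "harms_people": False}))   # True
print(permitted({"legal": False, "harms_people": True}))   # False
```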
Is the AGI allowed to initiate changes to the law? Including changes that would allow it to do horrible things which technically are not included in the list of exceptions?
YES, very good. The ways in which AI systems can or cannot be allowed to influence law-making have to be thought through very carefully.
Why not copy the way children learn ethical codes?
What is inherited: fear of death, blood, disintegration, and harm generated by overexcitation of any of the five senses. Aggressive actions by a young child against others will be sanctioned. The learning effect is “I am not alone in this world—whatever I do can turn against me”. A short-term benefit might cause overreaction and long-term disadvantages. Simplified ethical codes can be instilled even though a young child cannot yet reason about them.
Children between the ages of 7 and 12 years appear to be naturally inclined to feel empathy for others in pain [Decety et al 2008].
After this major developmental process, parents can explain ethical codes to their child. If a child kills an animal or destroys something—intentionally or not—and receives negative feedback, this gives a further opportunity for understanding social codes. Learning the law is even more complex, and humans need years until they reach excellence.
Many AI researchers have a mathematical background and try to cast this complexity into the framework of today’s mathematics. I do not know how many dozens of pages of silly stories I have read about AIs misinterpreting human commands.
Example of a silly mathematical interpretation: the human yells “Get my mother out [of the burning house]! Fast!” and the AI explodes the house to get her out very fast [Yudkowsky 2007].
Instead, this yell has to be interpreted by an AI using all the unspoken rescue context: do it fast, and try to minimize harm to everybody and everything (you, my mother, other humans, and things). An experienced firefighter with years of training will think through the options and risks instantaneously, subconsciously evaluate them, and act directly in a low-complexity, low-risk situation. Higher risks and higher complexity will make him consult with colleagues and solve the rescue task as a team.
If we speak about AGI, we can expect that an AGI will understand what “Get my mother out!” implies. A simplistic mathematical reading of human communication leads nowhere. AIs that are incapable of supplying this hidden, complex content are not ripe for real-life tasks. It is not enough that the AGI has learned all the theoretical content of firefighting and first aid. Its robot embodiment has to be equipped with proper sensory equipment to navigate (early stages can be found at the RoboCup rescue challenges). Furthermore, many real-life training situations are necessary for an AI to solve this complex task. It has to learn to cooperate with humans using brief emergency instructions: “The axe!” together with a hand sign can mean “Get the fire axe from the truck and follow me!”
Social values, laws, and taboos cannot be “crafted into detailed [mathematical] rules and value functions”. Our mathematics is not capable of this kind of complexity. We have to program a few existential fears into our AIs. All other social values and concepts have to be instilled through learning. The open challenge is to find an infrastructure that makes learning fears and values easy and stable over the long term.
Because the AI is not a child, doing the same thing would probably give different results.
The essence of the problem is that the difference between “interpreting” and “misinterpreting” only exists in the mind of the human.
If I as a computer programmer say to a machine “add 10 to X”—while I really meant “add 100 to X”, but made a mistake—and the machine adds 10 to X, would you call that “misinterpreting” my command? Such things happen every day with existing programming languages, so there is nothing strange about expecting similar things to happen in the future.
From the machine point of view, it was asked to “add 10 to X”, it added 10 to X, so it works correctly. If the human is frustrated because that’s not what they meant, that’s bad for the human, but the machine worked correctly according to its inputs.
You may be assuming a machine with a magical source of wisdom which could look at command “add 10 to X” and somehow realize that the human would actually want to add 100, and would fix its own program (unless it is passively aggressive and decides to follow the letter of the program anyway). But that’s not how machines work.
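To put the point in its most literal form (a toy example, not anything specific to AGI):

```python
X = 7
X = X + 10   # the programmer meant X + 100, but the machine sees only the text
print(X)     # 17: correct according to its inputs, wrong according to the intent
```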
Let us try to free our minds from associating AGIs with machines. They are totally different from automata. AGIs will be creative, will learn to understand sarcasm, and will understand that women in some situations say no and mean yes.
On your command to add 10 to X, an AGI would reply: “I love working for you! At least once a day you try to fool me—I am not asleep, and I know that +100 would be correct. Shall I add 100?”
Very good!
But be honest! Aren’t we (sometimes?) more machines serving our genes/instincts than spiritual beings with free will?
We have to start somewhere, and “we do not know what to do” is not starting.
Also, about this whole “what I really meant” issue: I think that we can break these down into specific failure modes and address them individually.
-One of the failure modes is poor contextual reasoning. In order to discern what a person really means, you have to reason about the context of their communication.
-Another failure mode involves not checking activities against norms and standards. There are a number of ways to arrive at the conclusion that Mom is to be rescued from the house alive and hopefully uninjured.
-The machines in these examples do not seem to forecast or simulate potential outcomes and judge them against external standards.
“Magical source of wisdom?” No. What we are talking about is whether it is possible to design a certain kind of AGI: one that is safe and friendly.
We have shown this to be a complicated task. However, we have not fleshed out all the possible approaches, and therefore we cannot falsify the claims of people who insist that it can be done.
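A hypothetical sketch of what addressing those three failure modes together might look like; every function here is a stand-in for machinery nobody has actually built:

```python
# Toy pipeline for the three failure modes listed above:
# 1. contextual interpretation, 2. a norms check, 3. outcome forecasting.

def interpret_in_context(command, context):
    # Failure mode 1: read the command together with its context.
    return {"goal": command, "urgency": context.get("urgency", "normal")}

def violates_norms(plan, norms):
    # Failure mode 2: check the candidate plan against external standards.
    return any(not norm(plan) for norm in norms)

def forecast_acceptable(plan, max_harm=0.01):
    # Failure mode 3: simulate likely outcomes before acting.
    return plan["predicted_harm"] <= max_harm

def choose_plan(command, context, candidate_plans, norms):
    intent = interpret_in_context(command, context)
    viable = [p for p in candidate_plans
              if p["serves"] == intent["goal"]
              and not violates_norms(p, norms)
              and forecast_acceptable(p)]
    if not viable:
        return "no acceptable plan: ask the human for clarification"
    return min(viable, key=lambda p: p["predicted_harm"])

# Toy usage: "explode the house" fails both the norms check and the forecast.
norms = [lambda p: not p.get("destroys_property", False)]
plans = [
    {"serves": "rescue mother", "predicted_harm": 0.9, "destroys_property": True},
    {"serves": "rescue mother", "predicted_harm": 0.005, "destroys_property": False},
]
print(choose_plan("rescue mother", {"urgency": "high"}, plans, norms))
```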
Poor contextual reasoning happens many times a day among humans. Our threads are full of it. In many cases the consequences are negligible. If the context is unclear and a phrase can be interpreted one way or the other, no magical wisdom is available; instead:
-Clarification is existential: ASK.
-Clarification is nice to have: say something that does not reveal that you have no idea what is meant, and try to prompt the other person to reveal contextual information.
-Clarification is unnecessary or even unwanted: stay in the dark, or keep the other in the dark.
Making correct associations from few contextual hints is what AGI is about. Even today, narrow-AI translation software is quite good at figuring out context by brute-force statistical similarity analysis.
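As a toy illustration of the three cases above (the stakes labels and the structure are invented, not a proposal):

```python
# Toy mapping from "how much clarification matters" to a strategy.

def clarification_strategy(ambiguous, stakes):
    if not ambiguous:
        return "act on the obvious reading"
    if stakes == "existential":
        return "ASK before doing anything"
    if stakes == "nice to have":
        return "answer vaguely and try to elicit more context"
    return "stay in the dark, or keep the other in the dark"

print(clarification_strategy(ambiguous=True, stakes="existential"))
```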
If CEV (or whatever we’re up to at the moment) turns out to be a dud and human values are inexorably inconsistent and mutually conflicting, one possible solution would be for me to kill everyone and try again, perhaps building roughly humanish beings with complex values I can actually satisfy that aren’t messed up because they were made by an intelligent designer (me) rather than Azathoth.
But really, the problem is that a superintelligent AI has every chance of being nothing like a human, and although we may try to give it innocuous goals we have to remember that it will do what we tell it to do, and not necessarily what we want it to do.
See this Facing the Intelligence Explosion post, or this Sequence post, or Smarter Than Us chapter 6, or something else that says the same thing.
Did that. So let’s get busy and start trying to fix the issues!
The ethical code/values that this new entity gets need not be extremely simple. Ethical codes typically come in MULTI-VOLUME SETS.
Sounds good to me. What do you think of MIRI’s approach so far?
I haven’t read all of their papers on Value Loading yet.