I have been saying for years that I don’t think provable Friendliness is possible, basically for the reasons given here. But I have kept thinking about it, and a relatively minor point that occurred to me is that a bungled attempt at Friendliness might be worse than none. Depending on how it was done, the AI could consider the attempt as a continuing threat.
What’s your sense of how a bungled attempt at Friendliness compares to other things humans might do, in terms of how likely an AI would be to consider it a threat?
Fairly low. But that’s because I don’t think the first AIs are likely to be built by people trying to guarantee Friendliness. If a Friendly AI proponent rushes to get done before another team can finish, though, it could be a much bigger risk.
OK.

For my part, if I think about things people might do that might cause a powerful AI to feel threatened and thereby have significantly bad results, FAI theory and implementation not only doesn’t float to the top of the list, it’s hardly even visible in the hypothesis space (unless, as here, I privilege it inordinately by artificially priming it).
It’s still not even clear to me that “friendliness” is a coherent concept. What is a human-friendly intelligence? Not “what is an unfriendly intelligence”—I’m asking what it is, not what it isn’t. (I’ve asked this before, as have others.) Humans aren’t, for example, or this wouldn’t even be a problem. But SIAI needs a friendly intelligence that values human values.
Humans are most of the way to human-friendly. A human given absolute power might use it to accumulate wealth at the expense of others, or punish people that displease her in cruel ways, or even utterly annihilate large groups of people based on something silly like nationality or skin color. But a human wouldn’t misunderstand human values. There is no chance the human would, if she decided to make everyone as happy as possible, kill everyone to use their atoms to tile the universe with pictures of smiley faces (to use a familiar example).
That is not at all clear to me.

I mean, sure, I agree with the example: a well-meaning human would not kill everyone to tile the universe with pictures of smiley faces. There’s a reason that example is familiar; it was chosen by humans to illustrate something humans instinctively agree is the wrong answer, but a nonhuman optimizer might not.
But to generalize from this to the idea that humans wouldn’t misunderstand human values, or that a well-meaning human granted superhuman optimization abilities won’t inadvertently destroy the things humans value most, seems unjustified.
Well, there’s the problem of getting the human to be sufficiently well-meaning, as opposed to using Earth as The Sims 2100 before moving on to bigger and better galaxies. But if Friendliness is a coherent concept to begin with, why wouldn’t the well-meaning superhuman figure it out after spending some time thinking about it?
Edit: What I’m saying is that if the candidate Friendly AI is actually a superhuman, then we don’t have to worry about Step 1 of friendliness: explaining the problem. Step 2 is convincing the superhuman to care about the problem, and I don’t know how likely that is. And finally Step 3 is figuring out the solution, and, assuming the human is sufficiently super, that wouldn’t be difficult (all this requires is intelligence, which is what we’re giving the human to begin with).
Agreed that a sufficiently intelligent human would be no less capable of understanding human values, given data and time, than an equally intelligent nonhuman.
No-one is seriously worried that an AGI will misunderstand human values. The worry is that an AGI will understand human values perfectly well, and go on to optimize what it was built to optimize.
Right, so I’m still thinking about it from the “what it was built to optimize” step. You want to try to build the AGI to optimize for human values, right? So you do your best to explain to it what you mean by your human values. But then you fail at explaining and it starts optimizing something else instead.
But suppose the AGI is a super-intelligent human. Now you can just ask it to “optimize for human values” in those exact words (although you probably want to explain it a bit better, just to be on the safe side).

Does this clarify at all?
The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
“non-human-harming” is still defining it as what it isn’t, rather than what it is. I appreciate it’s the result we’re after, but it has no explanatory power as to what it is—as an answer, it’s only a mysterious answer.
Morality has a long tradition of negative phrasing. “Thou shalt not” dates back to biblical times. Many laws are prohibitions. Bad deeds often get given more weight than good ones. That is just part of the nature of the beast—IMHO.
That’s nice, but precisely fails to answer the issue I’m raising: what is a “friendly intelligence”, in terms other than stating what it isn’t? What answer makes the term less mysterious?
To paraphrase a radio conversation with one of SI’s employees:
Humans are made of atoms which can be used for other purposes. Instead of building an AI which takes humans and uses their atoms to make cupcakes, we’d want an AI that takes humans and uses their atoms to make ‘human value’, which presumably we’d be fine with.
and then do find/replace on “human value” with Eliezer’s standard paragraph:
Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.
Not that I agree this is the proper definition, just one which I’ve pieced together from SI’s public comments.
The obvious loophole in your paraphrase is that this accounts for the atoms the humans are made of, but not for other atoms the humans are interested in.
But yes, this is a bit closer to an answer not phrased as a negation.
Here is the podcast where the Skeptics’ Guide to the Universe (SGU) interviews Michael Vassar (MV) on 23-Sep-2009. The interview begins at 26:10 and the transcript below is 45:50 to 50:11.
SGU: Let me back up a little bit. So we’re talking about, how do we keep an artificial intelligence or recursively self-improving technology from essentially taking over the world and deciding that humanity is irrelevant, or that it would rather have a universe where we’re not around, or maybe where we’re batteries or slaves or whatever. So one way that I think you’ve been focusing on so far is the “Laws of Robotics” approach. The Asimov approach.
MV: Err, no. Definitely not.
SGU: Well, in the broadest concept, in that you constrain the artificial intelligence in such a way...
MV: No. You never constrain, you never constrain a god.
SGU: But if you can’t constrain it, then how can you keep it from deciding that we’re irrelevant at some point?
MV: You don’t need to constrain something that you’re creating. If you create something, you get to designate all of its preferences, if you merely decide to do so.
SGU: Well, I think we’re stumbling on semantics then. Because to constrain...
MV: No, we’re not. We’re completely not. We had a whole media campaign called “Three Laws Bad” back in 2005.
SGU: I wasn’t, I didn’t mean to specifically refer to the Three Laws, but to the overall concept of...
MV: No, constraint in the most general sense is suicide.
SGU: So I’m not sure I understand that. Essentially, we’re saying we want the AI to be benign, to take a broad concept, and not malignant. Right? So we’re trying to close down certain paths by which it might develop or improve itself to eliminate those paths that will lead to a malignant outcome.
MV: You don’t need to close down anything. You don’t need to eliminate anything. We’re creating the AI. Everything about it, we get to specify, as its creators. This is not like a child or a human that has instincts and impulses. A machine is incredibly hard not to anthropomorphize here. There’s really very little hope of managing it well if you don’t. We are creating a system, and therefore we’re designating every feature of the system. Creating it to want to destroy us and then constraining it so that it doesn’t do so is a very, very bad way of doing things.
SGU: Well, that’s not what I’m talking about. Let me further clarify, because we’re talking about two different things. You’re talking about creating it in a certain form, but I’m talking about, once it gets to the point where then it starts recreating itself, we have to constrain the way it might create and evolve itself so that it doesn’t lead to something that wants to destroy us. Obviously, we’re not going to create something that wants to destroy us and then keep it from doing so. We’re going to create something that maybe its initial state may be benign, but since you’re also talking about recursive self improvement, we have to also keep it from evolving into something malignant. That’s what I mean by constraining it.
MV: If we’re talking about a single AI, not an economy or an ecosystem, if we’re not talking about something that involves randomness, if we’re not talking about something that is made from a human, changes in goals do not count as improvements. Changes in goals are necessarily accidents or compromises. But an unchecked, unconstrained AI that wants ice cream will never, however smart it becomes, decide that it wants chocolate candy instead.
SGU: But it could decide that the best way to make ice cream is out of human brains.
MV: Right. But it will only decide that the best way to make ice cream is out of human brains.
SGU: Right, that’s what I’m talking about. So how do we keep it from deciding that it wants to make ice cream out of human brains? Which is kind of a silly analogy to arrive at, but…
MV: Well, no… uh… how do we do so? We… okay. The Singularity Institute’s approach has always been that we have to make it want to create human value. And if it creates human value out of human brains, that’s okay. But human value is not an easy thing for humans to talk about or describe. In fact, it’s only going to be able to create human value, in all probability, by looking at human brains.
SGU: Ah, that’s interesting. But do you mean it will value human life?
MV: No, I mean it will value whatever it is that humans value.
The original quote had: “human-benefiting” as well as “non-human-harming”. You are asking for “human-benefiting” to be spelled out in more detail? Can’t we just invoke the ‘pornography’ rule here?
No, not if the claimed goal is (as it is) to be able to build one from toothpicks and string.

Right, but surely they’d be the first to admit that the details about how to do that just aren’t yet available. They do have their ‘moon-onna-stick’ wishlist.