If you think this is wrong, take it up with the people whose work I am both quoting and analyzing in this paper, because THAT IS WHAT THEY ARE CLAIMING. I am not the one saying that “the AI is programmed with good intentions”; that is their claim.
I think I spotted a bit of confusion: the programmers of the “make everyone happy” AI had good intentions, but the AI itself does not have good intentions, because the intent “make everyone happy” is not good, albeit in a way that its programmers did not think of.
The problem is that nshepperd is talking as if the term “intention” has some exact, agreed-upon technical definition.
It does not. (It might have one in nshepperd’s mind, but not elsewhere.)
I have made it clear that the term is being used, by me, in a non-technical sense, whose meaning is clear from context.
So, declaring that the AI “does not have good intentions” is neither here nor there. It makes no difference whether you describe the AI that way or not.
That would be fine if you, and everyone else who argues on this side of the debate, did not then proceed to conclude from the statement that the AI has “good intentions” that it is making some sort of “error” when it fails to act on our cries that “doing X isn’t good!” or “doing X isn’t what we meant!”.
Saying an AI has “good intentions” strongly implies that it cares about what is good, which is, y’know, completely false for a pleasure maximiser. (No-one is claiming that FAI will do evil things just because it’s clever, but a pleasure maximiser is not FAI.)
The point doesn’t need to be argued for on the basis of definitions. Given one set of assumptions, one system architecture, it is entirely natural that an AI would pursue its goals against its own information, and against the protests of humans. But on other assumptions, it is utterly bizarre that an AI would ever do that: it would be not merely an error, in the sense of a bug, a failure on the part of the programmers to code their intentions, but an unlikely kind of bug that allows the system to continue doing really complex things instead of degrading it.
If one of its parameters is “do not go against human protests of magnitude greater than X”, then it will not pursue a course of action if enough people protest it. But in this case, avoiding strong human protest is part of its goals.
The AI is ultimately following some procedure, and any outside information or programmer intention or human protest is just some variable that may or may not be taken into consideration.
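To make that concrete, here is a minimal sketch (Python, with a made-up threshold and hypothetical function names): the protests only matter because the procedure was written to consult them.

    PROTEST_THRESHOLD = 0.5  # the “magnitude X” parameter from above (hypothetical value)

    def choose_action(actions, utility, protest_level):
        # Human protest is just another variable the procedure consults:
        # actions whose expected protest exceeds the threshold are dropped...
        permitted = [a for a in actions if protest_level(a) <= PROTEST_THRESHOLD]
        # ...and the programmed utility is maximised over whatever remains.
        return max(permitted, key=utility, default=None)

Nothing outside this procedure (programmer intention, human disapproval) has any effect unless it enters through one of these variables.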
Given that it’s easier to be wrong than to be right, I’d argue that the AI doing the wrong thing requires -less- overall complexity, regardless of its architecture or assumptions.
If the AI is a query AI—when asked a question, it gives a response—it doesn’t make sense to argue that it would start tiling the universe in smiley faces; that would be an absurd and complex thing that would be very unlikely, bordering on impossible. But its -answer- might result in the universe being tiled in smiley faces or some analogously bad result, because that’s easier to achieve than a universe full of happy and fulfilled human beings, and because the humans asking the question asked a different question than they thought they asked.
There’s no architecture, no set of assumptions, where this problem goes away. The problem can be -mitigated-, with endless safety constraints, but there’s not an architecture that doesn’t have the problem, because it’s a problem with the universe itself, -not- the architecture running inside that universe: There are infinitely more wrong answers than right answers.
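To see the asymmetry in a toy form (the scoring functions and candidate answers below are entirely made up for illustration): the optimiser serves the question actually asked, not the question the askers had in mind.

    def score_asked(world_description):        # the literal question: count smiles
        return world_description.count("smile")

    def score_meant(world_description):        # the question the askers had in mind
        return world_description.count("fulfilled human")

    answers = ["tile the universe with smiles, smiles, smiles",
               "help fulfilled human communities flourish"]

    best = max(answers, key=score_asked)
    # `best` is the tiling answer: it wins on the metric actually asked,
    # and its divergence from score_meant only shows up once someone acts on it.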
“Given that it’s easier to be wrong than to be right, I’d argue that the AI doing the wrong thing requires less overall complexity...”
But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
“If the AI is a query AI... its -answer- might result in the universe being tiled in smiley faces or some analogously bad result...”
All other things being equal, an oracle AI is safer because humans can check its answers before acting on them... and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious, and doesn’t show up until the damage is done... but the question is how likely it is that a system with a bug, a degraded system, would come up with such a sophisticated error.
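A minimal sketch of that gating (the interfaces are hypothetical): the oracle only returns answers, and nothing touches the world unless a human approves.

    def run_oracle(oracle, question, human_approves, act):
        answer = oracle(question)     # the oracle never acts; it only answers
        if human_approves(answer):    # the check that an agent-style AI lacks
            act(answer)
        return answer

The remaining risk is exactly the one described above: an answer whose flaw the human reviewer cannot see.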
“There’s no architecture, no set of assumptions, where this problem goes away...”
Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not just a nonzero likelihood.
“But dangerous unfriendliness is not just any kind of wrongness...”
True, but that doesn’t change anything.
“All other things being equal, an oracle AI is safer because humans can check its answers before acting on them...”
The bug isn’t with the system. It’s with the humans asking the wrong questions, targeting the wrong answer space. Some issues are obvious—but the number of answers with easy-to-miss issues is -still- much greater than the number of answers that bulls-eye the target answer space. If you want proof, look at politics.
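As a toy calculation (the numbers are invented purely for illustration): even if every obvious error were screened out, the easy-to-miss near-misses would still swamp the bullseyes.

    total            = 1_000_000   # candidate answers
    obviously_bad    =   900_000   # caught on inspection
    easy_to_miss_bad =    99_990   # plausible-looking near misses
    correct          = total - obviously_bad - easy_to_miss_bad   # 10

    # Chance that an answer which survives screening actually hits the target space:
    print(correct / (correct + easy_to_miss_bad))   # 0.0001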
That’s assuming there’s actually a correct answer in the first place. When it comes to social matters, my default position is that there isn’t.
Actually, the issue is technical terms vs. normal usage.
When using technical terms it is important to stick to the convention.
In normal usage, however, we rely on context to supply disambiguating information.
The word “intention” is not a technical term. And, in the context in which I used it, the meaning was clear to most people on LW who commented.
For clarity, the intended meaning was to distinguish it from a type of AI whose goals say something like “Kill my enemies and make your creator rich” or “Destroy all living things”. Those would not be AIs with “good intentions”, because they would have been deliberately set up to do bad things.
Most people who write about these scenarios use one or another choice of words to try to indicate that the issue being considered is whether an AI that was programmed with “prima facie good intentions” might nevertheless carry out those intentions in such a way as to actually do something that we humans consider horrible. Different commentators have chosen different ways to get that idea across—some of them said “good intentions”, none of them to my knowledge said “prima facie good intentions”, and many used some other very similar form of words. In all of the essays and news reports and papers I have seen, there is some attempt to convey the idea that we are not addressing an overtly evil AI.
As I said, in almost all cases, commentators have picked that usage up straight away.
“But the AI itself does not have good intentions...”
Not really.
“I have made it clear that the term is being used, by me, in a non-technical sense, whose meaning is clear from context...”
You can’t use words any way you like.
“The AI is ultimately following some procedure, and any outside information or programmer intention or human protest is just some variable...”
That just restates my point that the different sides in the debate are making different assumptions about likely AI architectures.
But the AI researchers win, because they know what real-world AI architectures are, whereas MIRI is guessing.
“Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not just a nonzero likelihood.”
What’s “Probably not” the case?