That would be fine if you, and everyone else who argues on this side of the debate, did not then proceed to conclude from the statement that the AI has “good intentions” that it is making some sort of “error” when it fails to act on our cries that “doing X isn’t good!” or “doing X isn’t what we meant!”
The point doesn’t need to be argued on the basis of definitions. Given one set of assumptions, one system architecture, it is entirely natural that an AI would pursue its goals against its own information, and against the protests of humans. But on other assumptions, it is utterly bizarre that an AI would ever do that... it would be not merely an error, in the sense of a bug, a failure on the part of the programmers to encode their intentions, but an unlikely kind of bug, one that allows the system to keep doing really complex things rather than degrading it.
> Given one set of assumptions, one system architecture, it is entirely natural that an AI would pursue its goals against its own information, and against the protests of humans. But on other assumptions, it is utterly bizarre that an AI would ever do that...
If one of its parameters is “do not go against human protests of magnitude greater than X”, then it will not pursue a course of action if enough people protest it. But in this case, avoiding strong human protest is part of its goals.
The AI is ultimately following some procedure, and any outside information or programmer intention or human protest is just some variable that may or may not be taken into consideration.
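The point that human protest is only ever a variable inside the procedure the agent actually runs can be sketched in a few lines. Everything below (the function, the action names, the numbers, the threshold) is invented purely for illustration:

```python
# Sketch of the claim above: "protest" matters to the agent only if it
# appears somewhere in the procedure the agent actually executes.
# All names and numbers here are hypothetical.

def choose_action(actions, goal_value, protest_magnitude, protest_limit):
    """Pick the highest-value action among those under the protest limit."""
    allowed = [a for a in actions if protest_magnitude[a] <= protest_limit]
    if not allowed:
        return None  # every option triggers too much protest
    return max(allowed, key=lambda a: goal_value[a])

actions = ["tile_universe", "ask_clarification"]
goal_value = {"tile_universe": 10**9, "ask_clarification": 1}
protest = {"tile_universe": 10**6, "ask_clarification": 0}

# With the protest term in the procedure, the agent defers:
print(choose_action(actions, goal_value, protest, protest_limit=100))
# -> ask_clarification

# With the term effectively removed, the *same* procedure ignores protest:
print(choose_action(actions, goal_value, protest, protest_limit=float("inf")))
# -> tile_universe
```

Nothing about the second run is an “error” from the system’s point of view; the constraint simply was or wasn’t part of its goals.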
Given that it’s easier to be wrong than to be right, I’d argue that the AI doing the wrong thing requires -less- overall complexity, regardless of its architecture or assumptions.
If the AI is a query AI (when asked a question, it gives a response), it doesn’t make sense to argue that it would start tiling the universe in smiley faces; that would be an absurd and complex thing, unlikely bordering on impossible. But its -answer- might result in the universe being tiled in smiley faces, or some analogously bad result, because that is easier to achieve than a universe full of happy and fulfilled human beings, and because the humans asking the question asked a different question than they thought they asked.
There’s no architecture, no set of assumptions, where this problem goes away. The problem can be -mitigated-, with endless safety constraints, but there’s not an architecture that doesn’t have the problem, because it’s a problem with the universe itself, -not- the architecture running inside that universe: There are infinitely more wrong answers than right answers.
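The counting claim is easy to make concrete. The numbers below are invented; the point is only that for any fixed answer space, the right answers are a vanishing fraction of it:

```python
# Toy illustration of the asymmetry: if an "answer" is a 64-bit
# configuration and exactly one configuration hits the target,
# a uniformly random answer is almost certainly wrong.
n_bits = 64
total_answers = 2 ** n_bits        # 18446744073709551616 possible answers
right_answers = 1                  # the one configuration we actually wanted
p_wrong = 1 - right_answers / total_answers
print(p_wrong)  # so close to 1 that float arithmetic rounds it to 1.0
```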
> Given that it’s easier to be wrong than to be right, I’d argue that the AI doing the wrong thing requires -less- overall complexity, regardless of its architecture or assumptions.
But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
> If the AI is a query AI (when asked a question, it gives a response), it doesn’t make sense to argue that it would start tiling the universe in smiley faces; that would be an absurd and complex thing, unlikely bordering on impossible. But its -answer- might result in the universe being tiled in smiley faces, or some analogously bad result, because that is easier to achieve than a universe full of happy and fulfilled human beings, and because the humans asking the question asked a different question than they thought they asked.
All other things being equal, an oracle AI is safer because humans can check its answers before acting on them... and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious, and doesn’t show up until the damage is done... but the question is how likely it is that a system with a bug, a degraded system, would come up with such a sophisticated error.
> There’s no architecture, no set of assumptions, where this problem goes away. The problem can be -mitigated-, with endless safety constraints, but there’s not an architecture that doesn’t have the problem, because it’s a problem with the universe itself, -not- the architecture running inside that universe...
Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not merely a nonzero likelihood.
> But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
True, but that doesn’t change anything.
> All other things being equal, an oracle AI is safer because humans can check its answers before acting on them... and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious, and doesn’t show up until the damage is done... but the question is how likely it is that a system with a bug, a degraded system, would come up with such a sophisticated error.
The bug isn’t with the system. It’s with the humans asking the wrong questions, targeting the wrong answer space. Some issues are obvious, but the number of answers with easy-to-miss issues is -still- much greater than the number of answers that bull’s-eye the target answer space. If you want proof, look at politics.
That’s assuming there’s actually a correct answer in the first place. When it comes to social matters, my default position is that there isn’t.
> Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not merely a nonzero likelihood.
> The point doesn’t need to be argued on the basis of definitions. Given one set of assumptions, one system architecture, it is entirely natural that an AI would pursue its goals against its own information, and against the protests of humans. But on other assumptions, it is utterly bizarre that an AI would ever do that... it would be not merely an error, in the sense of a bug, a failure on the part of the programmers to encode their intentions, but an unlikely kind of bug, one that allows the system to keep doing really complex things rather than degrading it.
> If one of its parameters is “do not go against human protests of magnitude greater than X”, then it will not pursue a course of action if enough people protest it. But in this case, avoiding strong human protest is part of its goals.
> The AI is ultimately following some procedure, and any outside information or programmer intention or human protest is just some variable that may or may not be taken into consideration.
That just restates my point that the different sides in the debate are making different assumptions about likely AI architectures.
But the AI researchers win, because they know what real-world AI architectures are, whereas MIRI is guessing.
> Given that it’s easier to be wrong than to be right, I’d argue that the AI doing the wrong thing requires -less- overall complexity, regardless of its architecture or assumptions.
> If the AI is a query AI (when asked a question, it gives a response), it doesn’t make sense to argue that it would start tiling the universe in smiley faces; that would be an absurd and complex thing, unlikely bordering on impossible. But its -answer- might result in the universe being tiled in smiley faces, or some analogously bad result, because that is easier to achieve than a universe full of happy and fulfilled human beings, and because the humans asking the question asked a different question than they thought they asked.
> There’s no architecture, no set of assumptions, where this problem goes away. The problem can be -mitigated-, with endless safety constraints, but there’s not an architecture that doesn’t have the problem, because it’s a problem with the universe itself, -not- the architecture running inside that universe: There are infinitely more wrong answers than right answers.
> But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
> All other things being equal, an oracle AI is safer because humans can check its answers before acting on them... and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious, and doesn’t show up until the damage is done... but the question is how likely it is that a system with a bug, a degraded system, would come up with such a sophisticated error.
> Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not merely a nonzero likelihood.
> True, but that doesn’t change anything.
> The bug isn’t with the system. It’s with the humans asking the wrong questions, targeting the wrong answer space. Some issues are obvious, but the number of answers with easy-to-miss issues is -still- much greater than the number of answers that bull’s-eye the target answer space. If you want proof, look at politics.
> That’s assuming there’s actually a correct answer in the first place. When it comes to social matters, my default position is that there isn’t.
What’s “Probably not” the case?