Given that it’s easier to be wrong than to be right, I’d argue that an AI doing the wrong thing requires -less- overall complexity than one doing the right thing, regardless of its architecture or assumptions.
But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
If the AI is a query AI—when asked a question, it gives a response—it doesn’t make sense to argue that it would start tiling the universe in smiley faces; that would be an absurd and complex thing, so unlikely as to border on impossible. But its -answer- might result in the universe being tiled in smiley faces or some analogously bad result, because that’s easier to achieve than a universe full of happy and fulfilled human beings, and because the humans asking the question asked a different question than they thought they asked.
All other things being equal, an oracle AI is safer because humans can check its answers before acting on them, and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious and doesn’t show up until the damage is done, but the question is how likely it is that a buggy, degraded system would come up with a sophisticated error.
There’s no architecture, no set of assumptions, where this problem goes away. The problem can be -mitigated-, with endless safety constraints, but there’s no architecture that doesn’t have the problem, because it’s a problem with the universe itself, -not- with the architecture running inside that universe.
Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI absent its efforts, not merely a nonzero likelihood.
> But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
True, but that doesn’t change anything.
> All other things being equal, an oracle AI is safer because humans can check its answers before acting on them, and the smiley face scenario wouldn’t happen. There may be scenarios where the problem in the answers isn’t obvious and doesn’t show up until the damage is done, but the question is how likely it is that a buggy, degraded system would come up with a sophisticated error.
The bug isn’t with the system. It’s with the humans asking the wrong questions, targeting the wrong answer space. Some issues are obvious—but the number of answers with easy-to-miss issues is -still- much greater than the number of answers that bull’s-eye the target answer space. If you want proof, look at politics.
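The "narrow target in a large answer space" point can be made concrete with a toy sketch. This is my own illustration with made-up numbers, not anything from the discussion: if acceptable answers are a tiny region of the space of possible answers, then an answer produced without precise targeting almost never lands in that region.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

N_BITS = 20                  # each "answer" is a 20-bit string: ~1e6 possibilities
TARGET = set(range(16))      # hypothetical target region: 16 acceptable answers

# Generate answers with no targeting at all (uniformly at random).
samples = [random.getrandbits(N_BITS) for _ in range(100_000)]
hits = sum(1 for s in samples if s in TARGET)
miss_rate = 1 - hits / len(samples)

print(f"hits: {hits} of {len(samples)}, miss rate: {miss_rate:.5f}")
```

With these numbers the expected hit count is about 1.5 out of 100,000 draws, so nearly every untargeted answer misses the acceptable region; the asymmetry only grows as the answer space gets larger or the target narrower.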
That’s assuming there’s actually a correct answer in the first place. When it comes to social matters, my default position is that there isn’t.
> Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI absent its efforts, not merely a nonzero likelihood.
What’s “Probably not” the case?