But dangerous unfriendliness is not just any kind of wrongness. Many kinds of wrongness, such as crashing, or printing an infinite string of ones, are completely harmless.
True, but that doesn’t change anything.
All other things being equal, an oracle AI is safer because humans can check its answers before acting on them... and the smiley-face scenario wouldn't happen. There may be scenarios where the problem in the answers isn't obvious and doesn't show up until the damage is done... but the question is how likely a system with a bug, a degraded system, is to come up with a sophisticated error.
The bug isn’t with the system. It’s with the humans asking the wrong questions, targeting the wrong answer space. Some issues are obvious—but the number of answers with easy-to-miss issues is -still- much greater than the number of answers that bulls-eye the target answer space. If you want proof, look at politics.
That’s assuming there’s actually a correct answer in the first place. When it comes to social matters, my default position is that there isn’t.
Probably not, but MIRI is claiming a high likelihood of dangerously unfriendly AI, absent its efforts, not merely a nonzero likelihood.
What is "Probably not" the case?