If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don't succeed on the first try at everything they do. The prophets think that AI can deduce its way to victory, in the same way they think they can deduce their way to predicting such outcomes.
I think a weaker thing: if a rogue AI plots against us and fails, this will not spur the relevant authorities to call for a general halt. Instead that bug will be 'patched', and AI development will continue until we create one that does successfully plot against us.
[Edit] Eliezer sometimes talks about the "Law of Continued Failure": if your system was sane enough to respond to a warning shot with an adequate response, it was probably sane enough to do the right thing before the warning shot. I think the current uncertainty about whether AI would go rogue will continue even after an AI system goes rogue, in much the same way that uncertainty about whether gain-of-function research is worth it continues even after the COVID pandemic.