This is not obvious. My P(doom|no slowdown) is like 0.95-0.97, the difference from 1 being essentially “maybe I am crazy or am missing something vital when making the following argument”.
Instrumental convergence suggests that the vast majority of possible AGIs will be hostile. No slowdown means that neural-net ASI will be instantiated. To avoid doom, then, you need some way to solve the problem of “what does this code do when run” with extreme accuracy, so that only non-hostile neural-net ASI ever gets instantiated (you need “extreme” accuracy because you’re up against the rare-disease problem, a.k.a. the false-positive paradox: true positives are extremely rare, so a positive alignment result from a 99%-accurate test is still almost certainly a false positive). Unfortunately, “what does this code do when run” is essentially the halting problem, literally the first problem in computer science ever proven to be unsolvable in the general case (and Rice’s theorem extends that unsolvability to basically any nontrivial question about a program’s behaviour).
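For concreteness, here is a minimal Bayes sketch of that false-positive arithmetic. The base rate and the test’s accuracy in both directions are illustrative assumptions I’ve picked for the sketch; the argument above only fixes the “99%-accurate test” part.

```python
# Minimal sketch of the rare-disease / false-positive-paradox arithmetic.
# Illustrative assumptions: suppose 1 in a million candidate ASIs is
# actually aligned, and the alignment test is 99% accurate in both
# directions (sensitivity and specificity).

def p_aligned_given_positive(base_rate, sensitivity, specificity):
    """P(aligned | test says aligned), by Bayes' rule."""
    true_pos = sensitivity * base_rate            # aligned and flagged aligned
    false_pos = (1.0 - specificity) * (1.0 - base_rate)  # hostile but flagged aligned
    return true_pos / (true_pos + false_pos)

print(p_aligned_given_positive(base_rate=1e-6,
                               sensitivity=0.99,
                               specificity=0.99))
# ~9.9e-05: even after a positive result, the odds are roughly 10,000 to 1
# that the candidate is still hostile.
```

The point of the sketch: to make a positive result trustworthy, the test’s false-positive rate has to be driven well below the base rate of aligned candidates, which is the “extreme accuracy” requirement above.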
And, sure, the general case being unsolvable doesn’t mean that the case you care about is unsolvable. GOFAI has a good argument for being such a special case, because human-written source code is quite useful for understanding a program. Neural nets… don’t have that. At least, not in the case we care about; “I am smarter than the neural net” is also a plausible special case, but it’s obviously no help with neural-net ASI.
My P(doom) is a lot lower than 0.95, but that’s because I think slowdown is fairly likely, due to warning shots/nuclear war/maybe direct political success (the practical upshot of the middle one: if you want to stop AI, it helps to make sure you’ll survive a nuclear war, so you can help lock AI down afterwards). But my stance on aligning neural nets? “It is impossible to solve the true puzzle from inside this [field], because the key piece is not here.” Blind alley. Abort.