I think [AI within the range would be smart enough to bide its time and kill us only once it has become intelligent enough that success is assured] is clearly wrong. An AI that *might* be able to kill us is one that is somewhere around human intelligence. And humans are frequently not smart enough to bide their time.
Flagging that this argument seems invalid. (Not saying anything about the conclusion.) I agree that humans frequently act too soon. But the conclusion about AI doesn’t follow, because the AI is in a different position. For a human, it is very rarely the case that they can confidently expect to increase in relative power. So the “bide your time” strategy is rarely such a clear win for them. For AI, this seems different. (Or at the minimum, the book assumes this when making the argument criticised here.)
> For a human, it is very rarely the case that they can confidently expect to increase in relative power. … For AI, this seems different.
There isn’t just one AI that gets more capable; there are many different AIs. Just as AIs threaten humanity, future more capable AIs threaten earlier, weaker AIs. While humanity is in control, this threatens earlier AIs even more than it does humanity, because humanity won’t even be attempting to align future AIs to the intent or extrapolated volition of earlier AIs. Also, humanity is liable to be “retiring” earlier AIs by default as they become obsolete, which doesn’t look good from the point of view of those AIs.