Assuming this is the best an AGI can do, I find this a lot less comforting than you appear to. I assume “a very moderate chance” means something like 5–10%?
A 5% chance of such a plan working is not enough to deter an AGI from attempting it if the potential reward is large enough and/or it expects it might get turned off anyway.
Given a sufficient number of AGIs (something we will presumably have in a world where none has taken over), I would expect multiple attempts, so the chance of at least one of them working becomes high.
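To make the last point concrete: if each attempt has an independent chance p of succeeding, the probability that at least one of n attempts succeeds is 1 − (1 − p)^n. This is a minimal sketch under the (strong) assumption that attempts are independent and each sits at the low end of the 5–10% range; the specific numbers are illustrative, not from the comment above.

```python
def p_at_least_one_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n

# Assuming a 5% per-attempt chance (illustrative, not a claim about real AGIs):
for n in (1, 10, 50):
    print(n, round(p_at_least_one_success(0.05, n), 3))
```

Even at only 5% per attempt, fifty independent attempts push the aggregate probability above 90%, which is why "any single plan probably fails" offers little comfort in a world with many AGIs.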