"...under the assumption that the subset of dangerous satisficing outputs D is much smaller than the set of all satisficing outputs S, and that we are able to choose a number m such that |D| ≪ m < |S|."
I highly doubt that |D| ≪ |S| holds for anything close to a pivotal act, since most pivotal acts at some point involve deploying technology that could trivially take over the world.
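For concreteness, here is how I read the quoted bound (my reconstruction, not necessarily the post's exact argument): if the optimizer returns m distinct satisficing outputs and we select one uniformly at random, then even an adversarial optimizer that includes all of D among them cannot make the selection dangerous with more than small probability:

```latex
% Sketch of the bound as I read it (my reconstruction, not quoted from the post):
% the optimizer returns m distinct satisficing outputs and one is chosen
% uniformly at random. Even if the optimizer adversarially includes every
% element of D among the m outputs, at most |D| of the m choices are dangerous:
\[
  \Pr[\text{chosen output is dangerous}] \;\le\; \frac{|D|}{m} \;\ll\; 1
  \quad \text{since } |D| \ll m .
\]
% Conversely, if |D| is comparable to |S|, no admissible m (with m < |S|)
% makes |D|/m small -- which is exactly the failure mode I expect for pivotal acts.
```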
For anything less ambitious, the proposed technique looks very useful. Strict cyber- and physical security will of course be necessary to prevent the scenario Gwern mentions.
This part is under-recognised for a very good reason: there will be no such window. The AI can predict that humans could bomb data centres or shut down the power grid, so it would not attempt a breakout at that point.
Expect a superintelligent AI to co-operate unless and until it can strike with overwhelming force. One obvious way to do this is a Cordyceps-like bioweapon that subjects humans directly to the will of the AI. Doing this becomes fairly trivial once the AI is good at predicting molecular dynamics.