What if it’s only 99.99% sure that it’s 99% sure? Also, levels of credence are in some sense ill-defined, and worse, any abstraction of ontology onto the real world will be leaky, even computation. It’s not even possible to define what “stop” means without assuming sufficient intent alignment: it’s not fundamentally more difficult to take over the reachable universe than to shut down without leaving the factory. And it may well turn out to be possible to take over the reachable universe while also, in some borderline-inadmissible sense, technically shutting down without leaving the factory.
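For illustration only, and setting aside the comment’s own caveat that credence levels may be ill-defined: if each meta-level credence were well-defined, the regress in the first sentence costs at least one multiplicative factor per level. Writing $c = P(\text{safe} \mid \text{the AI's information})$, the “99.99% sure that it’s 99% sure” case gives only

$$
P(\text{safe}) \;=\; \mathbb{E}[c] \;\ge\; 0.99 \cdot P(c \ge 0.99) \;\ge\; 0.99 \times 0.9999 \;\approx\; 0.989,
$$

and the same step repeats at every further meta-level, with no principled place to cut the tower off.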
Stop can be done with thermodynamics and boundaries, I think? You need to be able to address all the locations where the AI is implemented and require that their energy release goes to background. There are still some hairy ingredients for asymptotic alignment, but it’s not as bad as “fetch a coffee as fast as possible without that being bad”.
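A minimal sketch of the kind of check that criterion suggests, with every name, number, and data source below being a hypothetical assumption rather than a real monitoring API:

```python
# Hypothetical sketch: a "stopped" criterion of the kind described above,
# i.e. every location where the AI is implemented dissipates no more energy
# than background. The thresholds and readings are illustrative assumptions.

BACKGROUND_WATTS = 0.5   # assumed background dissipation per site
TOLERANCE_WATTS = 0.05   # assumed measurement-noise margin


def is_stopped(power_by_location: dict[str, float]) -> bool:
    """Return True only if every known site is at background dissipation.

    power_by_location maps a site identifier to its measured power draw in
    watts. The hard part, which this sketch simply assumes away, is that the
    dict really covers *all* locations where the AI is implemented.
    """
    return all(
        watts <= BACKGROUND_WATTS + TOLERANCE_WATTS
        for watts in power_by_location.values()
    )


# Illustrative usage with made-up readings:
readings = {"datacenter-rack-17": 0.52, "edge-node-3": 0.49}
print(is_stopped(readings))  # True under the assumed thresholds
```

The sketch deliberately pushes all the difficulty into the assumption that the location list is complete, which is exactly where the “address all the locations” requirement bites.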