Pause AI if it causes a major disaster (e.g. like Chernobyl)
I agree that a pause would be easier in that case (think of the Rogue Replication Timeline). But what if there is no publicly visible Chernobyl? While the AI-2027 forecast has Agent-3 catch Agent-4 and reveal its misalignment to OpenBrain’s employees, even the forecast’s authors doubt that Agent-4 would be caught. If Agent-4 goes uncaught, mankind races ahead without even realizing that the AI could be misaligned.