Signer comments on A non-review of “If Anyone Builds It, Everyone Dies”

Signer 1 Oct 2025 2:29 UTC
2 points
0

I just think that it wouldn’t be the case that we had one shot but we missed it, but rather had many shots and missed them all.

This interpretation only works if by missed shots you mean “missed opportunities to completely solve alignment”. Otherwise you can observe multiple failures along the way and fix observable scheming, but you only need to miss one alignment failure on the last capability level. The point is just that your monitoring methods, even improved after many failures to catch scheming in pre-takeover regime, are finally tested only when AI is really can take over. Because real ability to take over is hard to fake. And you can’t repeat this test after you improved your monitoring, if you failed. Maybe your alignment training after previous observed failure in pre-takeover regime really made AI non-scheming. But maybe you just missed some short thought where AI decided to not think about takeover since it can’t win yet. And you’ll need to rely on your monitoring without actually testing whether it can catch all such possibilities that depend on actual environment that allows takeover.