The problem with non-open-weight models is that they need to be exfiltrated before wreaking havoc, while open-weight models cannot avoid being evaluated. Suppose that the USG decides that all open-weight models are to be tested by OpenBrain for alignment. Then even a misaligned Agent-x has no reason to blow its cover by failing to report an open-weight rival.