Another consideration about schemers is you might not be able to “fire” them or fix them easily, even if you can reliably trigger the behavior.
You can undeploy them, if you want!
One difficulty is again that the scheming is particularly correlated. Firing a single spy might not be traumatic for your organization’s productivity, but ceasing all deployment of untrusted models plausibly grinds you to a halt.
And in terms of fixing them, note that it’s pretty hard to fix spies! I think you’re in a better position for fixing schemers than spies, e.g. see here.