Also, I believe Bronson answered your question about the harder, adversarial case in his reply to someone else on this post. Here’s the gist, but check out his comment if you want more context:
I would see “is willing to take covert actions” as a very easy subset of the overall problem of scheming. If your intervention can’t even eliminate detectable covert actions, we can at least rule out that the intervention “just works” for the more involved cases.