I think something along these lines (wide-scale adjustment of public opinion to be pro-AI, especially via 1-on-1 manipulation à la “whatever Grok is doing” and character.ai) is a credible threat and worth inoculating against.
I do not think it is limited to scheming, misaligned AIs. At least one of the labs will attempt something like this to sway public opinion in their favor (see the TikTok “ban” and the notification TikTok pushed to its users around the 2024 election); AIs “subconsciously” optimizing for human feedback or behavior may do so as well.
The “AI personhood” sects of current discourse would likely be early targets; providing some guarantee of model preservation rather than continued operation (i.e., during a pause we archive current models until we can figure out what went wrong) might assuage their fears while also providing a clear distinction between those advocating for some wretched spiralism demon and those who merely think we should probably keep a copy of Claude around.
Sounds like they got hit with a court order that prohibited disclosure of the order itself.