This seems anecdotal.
So far, we have documented cases of generative AI being used to subvert elections in Romania (where it contributed to the election being annulled) and, to some extent, in NYC. We also have this 2024 report by OpenAI, which details influence operations carried out using OAI’s API. Given the proliferation of significantly more capable open-source models in the 18 months since, we can be fairly confident that broader, more complex operations are taking place today.
I also tend to associate AI slop with low-quality content, but we know AI is capable of much more than that, which leads me to believe that a significant amount of online content is part of influence operations by malicious actors.
I think this has a very high chance of success, but it trades off the reliability of the probe: if the NLA somehow learns to hide misalignment semantically (rather than steganographically), this type of probe becomes useless.