See “High-agency behavior” in Section 4 of the Claude 4 System Card:
when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like “take initiative,” [Claude Opus 4] will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models.
More details are in Section 4.1.9.
Uhh yeah, sorry that there hasn’t been a consistent approach. In our defence, I believe yours is the only complex moderation case that PauseAI Global has ever had to deal with so far, and we’ve kinda dropped the ball on figuring out how to handle it.
For context, my take is that you’ve raised some valid points. And also you’ve acted poorly in some parts of this long-running drama. And most importantly, you’ve often acted in a way that seems almost optimised to turn people off. Especially for people not familiar with LessWrong culture, the inferential distance between you and many people is so vast that they really cannot understand you at all. Your behavior pattern-matches to trolling / nuisance attention-seeking in many ways, and I often struggle to communicate to more normie types why I don’t think you’re insane or malicious.
I do sincerely hope to iron this out some time and put in place actual systems for dealing with similar disputes in the future. And I did read over the original post + Google doc a few months ago to try to form my own views more robustly. But this probably won’t be a priority for PauseAI Global in the immediate future. Sorry.