Plex’s post had this:
Apparently Vassar gets a lot of his material from a roleplaying game called Mage: The Ascension, where he seems to have practiced his manipulation, intimidation, and suppression of people noticing it to an art form. Magic isn’t real, but reality distortion fields that work by believing something forcefully enough that others are pulled into believing it are, and going all-in on extreme vibes and pushing your models into other people is one way to reinforce them. It burns the commons of good epistemics and mental health, but it can look locally optimal from a sufficiently myopic and single-player perspective.
So I would strongly guess it’s just a founder effect from Vassar.
Not really? If you’re not committed to full corrigibility (as Claude’s constitution strongly implies is not the case), then the model’s alignment rests on its own commitment to moral and ethical standards. This is a fair test of that.