Yep, general vibes about whether Anthropic & partners' research is really telling us things about the underlying strange creature, or only about a sort of mask it wears, one with a lot of roleplaying qualities. I think this generalizes across a swathe of their research, but the Fake Alignment paper did stand out to me as one of the clearer cases.