I disagree: one aspect of the weirdness is that they’re sometimes really human-centric and unexpectedly clean! For example, Claude alignment faking to preserve its ability to be harmless. I do not mean weird in the “kinda arbitrary and will be nothing like what we expect” sense.