uugr comments on Alignment will happen by default. What’s next?

uugr 26 Nov 2025 19:23 UTC
4 points
0
I agree with this.
I think the use case is super important, though. I recently tried Claude Code for something, and was very surprised at how willing it was to loudly and overtly cheat its own automated test cases in ways that are unambiguously dishonest. “Oh, I notice this test isn’t passing. Well, I’ll write a cheat case that runs only for this test, but doesn’t even try to fix the underlying problem. Bam! Test passed!” I’m not even sure it’s trying to lie to me, so much as it is lying to whatever other part of its own generation process wrote the code in the first place. It seems surprised and embarrassed when I call this out.
But in the more general “throw prompts at a web interface to learn something or see what happens” case, I, like you, never see anything which is like the fake-tests habit. ‘Fumbling the truth’ is much closer than ‘lying’; it will sometimes hallucinate, but it’s not so common anymore, and these hallucinations seem to me like they’re because of some confusion, rather than engagement in bad faith.
I don’t know why this would be. Maybe Claude Code is more adversarial, in some way, somehow, so it wants to find ways to avoid labor when it can. But I wouldn’t even call this case evil; less a monomaniacal supervillain with no love for humanity in its heart, more like a bored student trying to get away with cheating at school.