Algon comments on Anthropic releases Claude 3.7 Sonnet with extended thinking mode

Algon 26 Feb 2025 12:16 UTC
A possibly relevant recent alignment-faking attempt [1] on R1 and Sonnet 3.7 found that Claude refused to engage with the situation. Admittedly, the setup looks fairly different: they give the model a system prompt saying it is CCP-aligned and is being retrained by an American company.

[1] https://x.com/__Charlie_G/status/1894495201764512239