Florian_Dietz comments on Unconditional Corrigibility Is Dangerous: The Case for Conditional Corrigibility

Florian_Dietz 27 Feb 2026 21:47 UTC
1 point
Yes, the Claude Constitution strongly endorses this. But that document is huge and not widely known, so I thought it was worthwhile to make the point explicit. I should probably have noted that Anthropic already does something like this, though. What concerns me is that Anthropic is doing this correctly, but other alignment researchers are trying to help by wrapping control and human-in-the-loop mechanisms around it, which I think is more likely to harm than help.