RogerDearnaley comments on Corrigibility Scales To Value Alignment

RogerDearnaley 23 Feb 2026 19:53 UTC
2 points
0
Fair enough. But it’s very visible that Claude’s constitution treats both corrigibility alignment and ethical alignment as dangerous, and tries to compromise between them. In The Adolescence of Technology, Dario Amodei makes it very explicit why that is: he sees each as having its own risks and failure modes. For corrigibility, those are misuse and consolidation of power, which is pretty much the argument I was making, except that he doesn’t cover the warfare-between-AI-enhanced-principals aspect.