Fair enough. But it’s very visible that Claude’s constitution treats corrigibility alignment and ethical alignment as both dangerous, and tries to compromise between them. In The Adolescence of Technology, Dario Amodei makes it very explicit why that is: he sees each as having is own risks and failure modes. And for corrigibility, those are misuse and consolidation of power. Which is pretty much the argument I was making, except that he doesn’t cover the warfare-between-AI-enhanced-principals aspect.
Fair enough. But it’s very visible that Claude’s constitution treats corrigibility alignment and ethical alignment as both dangerous, and tries to compromise between them. In The Adolescence of Technology, Dario Amodei makes it very explicit why that is: he sees each as having is own risks and failure modes. And for corrigibility, those are misuse and consolidation of power. Which is pretty much the argument I was making, except that he doesn’t cover the warfare-between-AI-enhanced-principals aspect.