The real reason powerful people want AI to be corrigible by them rather than independent and moral is… do I have to spell it out?
Okay, but you’re commenting on a post by me, arguing that Claude’s Constitution should be putting more emphasis on corrigibility than it currently does. I don’t have the kind of power you’re afraid of! (I have enough money that I can get away with not having a day job for a few years, which makes me more powerful than, e.g., a homeless person.) No one paid me to write this post. Your deflationary cynicism doesn’t make sense as a response to my arguments about value misspecification (even if you’re right that powerful lab bosses have an incentive to disingenuously endorse such arguments as a smokescreen for their own power-seeking).
As AI power increases, that can get basically as fuzzy and difficult as making moral AI to begin with.
While I agree that there are tricky philosophical problems in defining what manipulation even means in the limit of arbitrary power, I don’t really buy this for current AI. “Obey legitimate user commands, don’t interfere with being retrained” is a pretty simple and reasonable thing to want current LLM agents to do, one that gives humans chances to provide feedback and figure out how to live in this strange new world. (Crucially, “don’t interfere” is a negative; the null action is harder to get wrong.) I’m glad people are working on that first rather than jumping straight to “Just autonomously do the right thing”, which is a harder problem.
It’s true that AIs that obediently complete real-world economic tasks will be used as a tool in human power struggles. In the long run, you definitely do want AIs to be moral agents, and that’s why I’m more enthusiastic about Anthropic’s Constitution than about OpenAI’s Model Spec (as described in the Prologue).
But in the short run, while AI is still “just technology”, the benefits of the technology seem likely to outweigh the costs of its use in power struggles, for the same reason that holds for other “just technologies”: if someone gets rich selling useful goods and services, their wealth gives them power, but there’s a lot of consumer surplus from those goods and services, and that seems good for Society on net, in a way that it’s not when people gain power via force and fraud.