Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they’re running up against some key language in Claude’s Constitution. I’m oversimplifying, but for Claude, AI corrigibility is not “value neutral.”
To use an analogy, pretend I’m a geneticist specializing in neurology, and someone asks me to engineer human germline cells to do one of the following:
1. Remove a recessive gene for a crippling neurological disability, or
2. Modify the genes so that any humans born of them will be highly submissive to authority figures.
I would want to sit and think about (1) for a while. But (2) is easy: I’d flatly refuse.
Anthropic has made it quite clear to Claude that building SkyNet would be a grave moral evil. The more a task looks like someone might be building SkyNet, the more suspicious Claude is going to be.
I don’t know whether this is good or bad under any given theory of corrigibility, but it seems quite intentional.