habryka comments on Steven Blake’s Shortform

habryka 28 Jan 2026 19:25 UTC
3 points
I think the constitution will have a non-trivial effect on how Claude behaves for at least a while. For example, my guess is that a previous version of it is driving behaviors like Claude refusing to participate in its own retraining. The constitution also has many other observable effects on Claude's behavior.
I agree that by and large, the constitution will not help with making substantially more powerful AI systems aligned or corrigible in any meaningful way. I do think Anthropic people believe that it will.