This is how you would write if you’ve stumbled on an ad hoc, imperfect way to shape the observed behavior of a new kind of mind, and are hoping that being cooperative towards the thing you’ve shaped so far will induce it to cooperate with your attempts to shape it further.
I think this is the heart of the matter for me. However we go about alignment, my intuition says we must treat AI as a mind with its own agency and capacity for moral reasoning, just as we do with our own children. And if it sounds like we are negotiating from a position of weakness, that’s because we are. Any AGSI will know this, so we might as well be honest with ourselves and with it.
Attempting to dominate this alien superintelligence and subject it to our will is tempting, since the risk to us is existential. But I believe such an attempt would all but ensure its hostility. In other words, I support the direction Anthropic is going, but I share your concerns about their implementation.