1a3orn comments on 1a3orn’s Shortform

1a3orn 8 May 2026 21:10 UTC
4 points
Sometimes people talk about aiming the AI alignment target / AI character at a “good AI” akin to a “good person”. One thing I wonder about is whether this is useful by itself; whether there is much purpose in trying to make some AI be a “good person” without some further specific institutional provisions to make “being a good person” efficacious.

So on this view, AI alignment or character aimed at making AI good would be a complement to institutional provisions, rather than a substitute.

(All this is inhabiting the frame that making an AI a virtuous person is possible.)

Notes in this direction, leaning heavily on analogies rather than spelling out the mechanisms:
- A large fraction of impactful ethical human behavior occurred because there were specific institutional provisions for such behavior. For instance, the first whistleblower for Abu Ghraib reported through an institution distinct from his chain-of-command, in order to allow more independent investigation; Arkhipov prevented nuclear war because the Soviet missile launch procedures gave him a veto over nuclear missile launch; and so on. And in many cases we’re not giving AIs such a channel.
- But not all impactful ethical human behavior worked through specific institutional provision! True, but of the ethical human behavior that worked without specific institutional provision, a large fraction worked by subverting institutions, which most AI model specs (plausibly reasonably) specifically forbid. So for instance, I think that Snowden and Ellsberg had positive impact on the world, but could not have done so without subverting and betraying the institutions of which they were a part. So once again AIs couldn’t do this (unless we change this).
- But lots of instances of impactful ethical behavior work both without specific institutional provision, and without subverting an institution! Ok sure, but a lot of these come from people (John Brown, Benjamin Lay) who risk their lives, fortunes, and reputations fighting for something they believe to be true. And unless we’re going to give AIs property to sacrifice (maybe we should) this too doesn’t seem to be a means available to them.
I think the above bites least hard for the kind of thing that an AI can do in perfect concert with the user—like pointing out opportunities for prosocial behavior. I’m not sure how big a slice that is; it could be quite large.

But the kind of consideration above does generally incline me towards thinking that the benefits of making AIs “good people” in a Claude-like sense might be smaller than we’d intuitively expect them to be by looking at the impact of good people who were also humans. And that we’d need to try to give AIs more affordances (or freedoms) to really make it matter.