Of course, evolution did go out of its way to make humans non-ruthless, by endowing us with social instincts. Maybe future AI programmers will likewise go out of their way to make ASIs non-ruthless? I hope so—but we need to figure out how.
A workable solution (to building stable non-ruthlessness within a powerful consequentialist framework like RL + model-based planning) probably exists. I’m working on it myself, and I think I’m making gradual progress, but I think the appropriate overall attitude right now is pessimism and panic about where we’re at. See the “oh man, are we dropping this ball” section here & the three-part disjunction here.
(Why only “probably exists”? Because the human example is highly suggestive but not an airtight proof. For example, for all I know right now, maybe making a nice human requires a “training environment” that entails growing up with a human body, in a human community, at human speed. Doing that with AI is not really feasible in practice, for many reasons. There are other things like that too. Presumably further research will eventually either find a plan for non-ruthlessness + powerful capabilities in ASI, or a good argument that no plan exists, and I don’t currently have a very strong opinion on which one it would be.)