I’m interested in why you think consequentialism is necessarily maximising. An AGI might have multiple mutually incompatible goals it is solving for, and choose some balance of those, not maximising on any.
Given it will have the whole of human history as training data, one of the lessons it will have absorbed is that ruthless prioritisation of a single goal tends to provoke counter-coalitions. The smart thing to do is to manage within an ecosystem of other AIs and humans, not maximise against them (which is a fraught and unstable pattern).
Thanks!
For one thing, my headline claim is “ruthless sociopath”, not “maximizing”. “Ruthless sociopath” is pointing to something that’s missing (intrinsic concern for the welfare of other people), not something that’s present (behaviors that maximize something in the world).
For another thing, strictly speaking, perfect maximization is impossible without omniscience.
For another thing, if a powerful ASI cares about increasing staples, and also paperclips, and also any number of other office supplies, that doesn’t help us; it will still wipe out humanity and create a future devoid of value. Indeed, even maximizers can “care” about multiple things. E.g. if a utility-maximizer has utility function U = log(log(staples)) + log(log(paperclips)), then it will stably split its time between staple and paperclip production forever. [I put in the “log log” to ensure strongly diminishing returns, enough to overcome any economies of scale.]
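To make that concrete, here’s a minimal sketch (my own toy illustration; the starting stocks and resource budget are invented): a greedy maximizer of U = log(log(staples)) + log(log(paperclips)) spends each unit of resource on whichever good currently has the higher marginal utility, and the allocation stays balanced indefinitely:

```python
import math

# Marginal utility of one more unit of a good whose U-term is log(log(x)):
# dU/dx = 1 / (x * ln(x)), which shrinks fast (strongly diminishing returns).
def marginal(x: float) -> float:
    return 1.0 / (x * math.log(x))

staples, paperclips = 2.0, 2.0  # arbitrary starting stocks (must exceed 1)
for _ in range(100_000):
    # Greedily allocate each unit of resource to the higher-marginal-utility good.
    if marginal(staples) >= marginal(paperclips):
        staples += 1.0
    else:
        paperclips += 1.0

print(staples, paperclips)  # ends essentially 50/50: a stable split, not a fixation
```

Because each good’s marginal utility collapses as its stock grows, neither good ever wins permanently; the maximizer just keeps alternating between them.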
I agree that a ruthless sociopath agent, one which has callous indifference to whether you or anyone else lives or dies, will nevertheless act kindly towards you, when acting kindly towards you is in its self-interest. And then if the situation changes, such that acting kindly towards you stops being in its self-interest, then it will not hesitate to stab you in the back (betray you, murder you, blackmail you, whatever). And even before that, it will be constantly entertaining the idea of stabbing you in the back, and then deciding that this idea is (currently) inadvisable, and thus continuing to act kindly towards you.
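In pseudocode terms, the disposition I’m describing looks something like this (a deliberately crude sketch; the payoff numbers are invented for illustration). The point is what’s absent: no term for anyone else’s welfare ever enters the objective:

```python
# Toy model of a "ruthless sociopath" policy: kindness is purely instrumental.
def choose_action(payoff_if_kind: float, payoff_if_betray: float) -> str:
    # Note the missing ingredient: the other party's welfare appears nowhere.
    return "act kindly" if payoff_if_kind >= payoff_if_betray else "betray"

print(choose_action(10.0, 3.0))   # while cooperation pays: "act kindly"
print(choose_action(10.0, 50.0))  # the moment it doesn't: "betray"
```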
Hopefully we can agree that this is not a description of normal human relations.
…But even if this is not normal human relations, one could argue that it’s fine, because we can still build a good healthy civilization out of AIs that all have this kind of disposition. And indeed, there are people who make that argument. But I strongly disagree. I was writing about this topic recently, see §5 of my post “6 reasons why ‘alignment-is-hard’ discourse seems alien to human intuitions, and vice-versa”: “The human intuition that societal norms and institutions are mostly stably self-enforcing”.
Thank you. You are right! I unfairly suggested you implied consequentialism was maximising. The deeper point I was trying to make (and I’d be interested to know if you think this is madly naive) is that an intelligent AI would treat human history, literature, etc. as billions of pieces of data about what works. Much of this it will dismiss as stuff that humans care about because they are wetware with neolithic drives. But there are lessons for AI too. For example, humans get a lot of pleasure from friendship. Could AI too? And these sorts of goals would sit alongside staple production.