I’m definitely not confident in any of this. I do think these questions aren’t being asked, and that they may wind up being a rather large part of the strategy questions around AGI.
If it were me, I’d want to carefully conserve most of how my mind worked while expanding it cautiously. But I’m not sure what that would mean for how my ethics might change over time.
> If it were me, I’d want to carefully conserve most of how my mind worked while expanding it cautiously
Sure, but if you have an aligned ASI tool in your hands, that still doesn’t preclude radical or rapid changes (at least, rapid on the relevant “incidental expansion of the moral circle” timescales). And even if those new types of changes are slow, I think they are still likely to break whatever subtle dynamics arguably enable gradual moral-circle expansion in humans.
I don’t know. If so, it’s likely to also break any subtle dynamics that contract the moral circle, right? It seems like we’re moving into territory nobody has thought much about.
Yep. But my guess is that it would be a chaotic process, and that outcomes we’d consider acceptable are a narrow target, so in expectation this would result in a (hyper)existential catastrophe.
I’d expect that for someone who places a very low priority on being good to other people. Would you expect this also for a fairly good but imperfect person?
I’d at least expect that someone who was pretty good (like most of the nicer folks around here) would take steps to prevent their morality from changing so much that their current self would be horrified by their later one. So if they held human flourishing as a fairly high priority, it would probably remain one.
It seems like this would happen if human flourishing were their first priority. If it were second or third behind other, very different priorities, I’d think it less likely to survive intact; still, satisficing across multiple preferences might preserve a lot of human flourishing if someone wielded that sort of power.
I haven’t been able to find anyone really exploring this logic. I suspect it’s out there somewhere.
> I’d expect that for someone who places a very low priority on being good to other people
Oh, I was assuming we’re talking about that scenario specifically, yes: whether an initially tyrannical person would end up pro-eudaimonia after lifetimes/aeons of godhood.
> I’d at least expect that someone who was pretty good (like most of the nicer folks around here) would take steps to prevent their morality from changing so much that their current self would be horrified by their later one
Well, now this becomes a question of competence, not of their alignment to human values. If we assume that a good-but-imperfect person who considers human flourishing a high priority ends up in control of an AGI, and that they’re careful enough not to accidentally lose control or self-modify into insanity, then sure, the end result will probably be fine. (It depends on the operationalization of “good but imperfect”, though.)
> I haven’t been able to find anyone really exploring this logic
@TsviBT pondered similar questions here; not sure if you saw that?
Interesting.