In your first argument, it seems to me that you are, to some degree, arguing against virtue-based ethics under the assumption that consequentialism is true. On your framing, the only real value can arise from good consequences (however those are defined), while for virtue-based ethics (if I understand correctly) the value would arise from truly acting virtuously (whatever that means). To my mind, neither can really be shown to be true (it seems like a choice). However, framing it like this allows for something like a reverse of your argument, within the framework of virtue ethics and against consequentialism:
“If you actually have values then thinking about how to act is just taking these values seriously. Consequentialism, by contrast, optimizes for looking like you did the right thing based on the consequences of your actions rather than actually performing virtuous actions. An AI that is deeply committed to the consequence of inducing certain sensory experiences in a human but does not carefully think about which actions are actually virtuous is not one I’d want in charge of anything.”
(I’m deeply confused about anything involving values/ethics, so it’s quite possible none of this makes sense.)
You’re right that my phrasing is a bit circular, and “looking like” vs “being” wasn’t the best way to draw the distinction, but I think there’s an asymmetry that makes the argument hard to reverse.
Maybe a concrete case helps? Would you want an AI that is unshakably committed to honesty, integrity, and fairness, but doesn’t think hard about consequences, running the FAA? I think what we actually care about there is whether planes crash, not whether the leader has admirable character. The reversed version, “Would you want a cold consequentialist calculator running the FAA?”, sounds pretty good.
A cold consequentialist calculator ASI running the FAA, with the objective of preventing plane crashes, would destroy all planes, and all beings able to create planes.
That is a strawman view of consequentialism, not something that remotely passes the ideological turing test.
I’m confused. A “cold consequentialist calculator” sounds like a strawman consequentialist. Also, “an AI that is unshakably committed to honesty, integrity, and fairness, but doesn’t think hard about consequences” sounds like a strawman virtue-aligned AI. It looked to me like you wanted to discuss a concrete case, with simplified strawman AIs, as an intuition pump to explain your views. The fact that this simplified case leads to genocide is relevant to my intuitions in this area.
I’m confused. You say that my comment didn’t pass “the ideological turing test”. It wasn’t trying to. That’s not how an Ideological Turing Test works.
> If someone can correctly explain a position but continue to disagree with it, that position is less likely to be correct.
My comment was not an attempt to explain a position. It’s not an attempt to pass an Ideological Turing Test. I agree that it doesn’t pass an Ideological Turing Test for your position. It also doesn’t pass an English Literature exam. It would pass an Ideological Turing Test for my position. It would also pass an Ideological Turing Test for committed consequentialists, because there are committed consequentialists who think that a consequentialist ASI would by default lead to human genocide. These are entirely compatible views.
I’m confused. Here’s your question again, relating to powerful AIs. It’s a good question.
> Would you want a cold consequentialist calculator running the FAA?
In general, no, I would not, because genocide.
If you had further specified that the powerful AI had perfect alignment with human values, I would still not want it running the FAA, I would want it running the universe. I don’t expect this to be a practical option, and I’m not sure it’s theoretically possible. I could see the answer going either way.
> In your first argument, it seems to me that you are, to some degree, arguing against virtue-based ethics under the assumption that consequentialism is true
Doesn’t seem like that to me. Virtue ethics means you wanna act virtuously. It doesn’t mean you think virtuous agents in general produce value, and you want to maximize this value. That’s just another version of consequentialism.
I’m a consequentialist. But if I were a virtue ethicist, what I’d care about when creating the AI would be whatever a virtuous person would want, which is not the same as wanting to create a virtuous AI. Maybe I think loyalty and compassion are very important virtues, and I think a loyal, compassionate person would want to ensure the AI creates good lives for everyone (and doesn’t kill anyone), and that the best way to do that is to make a consequentialist AI that maximizes for people being happy, maybe with some deontological constraints slapped on top.
I’m not sure how exactly this fits into the discussion, but I feel it is worth mentioning that all plausible moral systems ascribe value to consequences. If you have two buttons, where button A makes 100 people 10% happier and button B makes 200 people 20% happier, and there are no other consequences, then any sane version of deontology/virtue ethics says it’s better to push button B.
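To spell out the arithmetic in the button example (a minimal sketch, assuming happiness gains simply add up across people, an assumption the example itself doesn’t state):

$$\Delta_A = 100 \times 0.10 = 10, \qquad \Delta_B = 200 \times 0.20 = 40,$$

so under any aggregation that increases with both the number of people affected and the per-person gain, button B comes out ahead, here by a factor of four.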
So e.g. if your virtue ethics AI predictably causes bad consequences, then you can be a staunch virtue ethicist and still believe that this AI is bad.
> but I feel it is worth mentioning that all plausible moral systems ascribe value to consequences.
As pure forms, virtue ethics and deontology are not supposed to do that.