I’m a committed consequentialist, so I would disagree regardless, but I also think the case against consequentialism and for virtue alignment presented here has some real flaws.
First, if you actually have values then thinking about consequences is just what it means to take those values seriously. Virtue ethics, by contrast, optimizes for looking like a virtuous agent rather than being effective at making good outcomes happen. An AI that is deeply committed to the virtue of honesty but doesn’t think carefully about the consequences of its actions is not one I’d want in charge of anything important.
Second, the post treats it as a major downside that a consequentialist AI would come into conflict with humans who don’t share its values, but this is a cost any powerful AI must pay. A virtue-aligned AI doesn’t escape this problem. Everyone loves “integrity” and “honor”, but when those virtues cash out in actual decisions, they’ll generate exactly the same backlash. It may be true that “there’s more agreement on virtues”, but this agreement is superficial. People agree on the words but disagree enormously on how to apply them.
Third, in a world with many powerful AIs, the strategic landscape is ultimately determined by competition between AIs. I want the AI that shares my values to be the one that comes out on top in that competition. A virtue-aligned AI that’s committed to playing fair, being honest, and cooperating nicely is not well-positioned to win against a consequentialist AI that’s willing to do what it takes to achieve its goals.
I would sum up my position on the consequentialism vs virtue ethics debate by saying that virtue ethics is a theory about what makes individual agents admirable, but what really matters is whether building the AI leads to an outcome we want. That brings us back to the traditional Yudkowsky view that any AI we are likely to build in the near future will be very bad for humanity. I am not as convinced as Soares and co., but it’s still an important thing to keep in mind when considering alignment ideas in general.
In your first argument, it seems to me slightly like you are arguing against virtue-based ethics under the assumption that consequentialism is true. So in your argument, the only real value may arise from good consequences (however those are defined), while for virtue-based ethics (if I understand correctly) the value would arise from truly acting virtuously (whatever that means). In my mind, neither can really be true (it seems like a choice). However, framing it like this would allow for something like a reverse of your argument within the framework of virtue ethics and against consequentialism:
“If you actually have values then thinking about how to act is just taking these values seriously. Consequentialism, by contrast, optimizes for looking like you did the right thing based on the consequences of your actions rather than actually performing virtuous actions. An AI that is deeply committed to the consequence of inducing certain sensory experiences in a human but does not carefully think about which actions are actually virtuous is not one I’d want in charge of anything.”
(I’m deeply confused about anything with values/ethics, so it’s quite possible none of this makes sense.)
You’re right that my phrasing is a bit circular, and “looking like” vs “being” wasn’t the best way to draw the distinction, but I think there’s an asymmetry that makes the argument hard to reverse.
Maybe a concrete case helps? Would you want an AI that is unshakably committed to honesty, integrity, and fairness, but doesn’t think hard about consequences, running the FAA? I think what we actually care about there is whether planes crash, not whether the leader has admirable character. The reversed version, “Would you want a cold consequentialist calculator running the FAA?”, sounds pretty good.
A cold consequentialist calculator ASI running the FAA, with the objective of preventing planes crashing, would destroy all planes, and all beings able to create planes.
That is a strawman view of consequentialism, not something that remotely passes the ideological turing test.
I’m confused. A “cold consequentialist calculator” sounds like a strawman consequentialist. Also, “an AI that is unshakably committed to honesty, integrity, and fairness, but doesn’t think hard about consequences” sounds like a strawman virtue-aligned AI. It looked to me like you wanted to discuss a concrete case, with simplified strawman AIs, as an intuition pump to explain your views. The fact that this simplified case leads to genocide is relevant to my intuitions in this area.
I’m confused. You say that my comment didn’t pass “the ideological turing test”. It wasn’t trying to. That’s not how an Ideological Turing Test works.
> If someone can correctly explain a position but continue to disagree with it, that position is less likely to be correct.
My comment was not an attempt to explain a position. It’s not an attempt to pass an Ideological Turing Test. I agree that it doesn’t pass an Ideological Turing Test for your position. It also doesn’t pass an English Literature exam. It would pass an Ideological Turing Test for my position. It would also pass an Ideological Turing Test for committed consequentialists, because there are committed consequentialists who think that a consequentialist ASI would by default lead to human genocide. These are entirely compatible views.
I’m confused. Here’s your question again, relating to powerful AIs. It’s a good question.
> Would you want a cold consequentialist calculator running the FAA?
In general, no, I would not, because genocide.
If you had further specified that the powerful AI had perfect alignment with human values, I would still not want it running the FAA, I would want it running the universe. I don’t expect this to be a practical option, and I’m not sure it’s theoretically possible. I could see the answer going either way.
> In your first argument, it seems to me slightly like you are arguing against virtue-based ethics under the assumption that consequentialism is true
Doesn’t seem like that to me. Virtue ethics means you want to act virtuously. It doesn’t mean you think virtuous agents in general produce value, and you want to maximize this value. That’s just another version of consequentialism.
I’m a consequentialist. But if I was a virtue ethicist, what I’d care about when creating the AI would be whatever a virtuous person would want, which is not the same as wanting to create a virtuous AI. Maybe I think loyalty and compassion are very important virtues, and I think a loyal person would want to ensure the AI creates good lives for everyone (and doesn’t kill anyone), and the best way to do that is to make a consequentialist AI that maximizes for people being happy, maybe with some deontological constraints slapped on top.
I’m not sure how exactly this fits into the discussion, but I feel it is worth mentioning that all plausible moral systems ascribe value to consequences. If you have two buttons where button A makes 100 people 10% happier, and button B makes 200 people 20% happier, and there are no other consequences, then any sane version of deontology/virtue ethics says it’s better to push button B.
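To make the two-button comparison explicit, here is a minimal sketch of the arithmetic. The 10%/20% figures come from the example above; aggregating by summing each person’s fractional happiness gain is my assumption, not something the comment specifies.

```python
# Toy comparison of the two buttons from the example above.
# Assumption: total value = (number of people) * (fractional happiness gain).

def total_gain(num_people, gain_fraction):
    """Total happiness gain if num_people each become gain_fraction happier."""
    return num_people * gain_fraction

button_a = total_gain(100, 0.10)  # 100 people, 10% happier each
button_b = total_gain(200, 0.20)  # 200 people, 20% happier each

print(button_a)  # 10.0
print(button_b)  # 40.0 -> button B produces the larger total gain
```

Under any aggregation that is increasing in both the number of people and the size of the gain, button B dominates, which is the point being made.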
So e.g. if your virtue ethics AI predictably causes bad consequences, then you can be a staunch virtue ethicist and still believe that this AI is bad.
> but I feel it is worth mentioning that all plausible moral systems ascribe value to consequences.

As pure forms, virtue ethics and deontology are not supposed to do that.
> Virtue ethics, by contrast, optimizes for looking like a virtuous agent rather than being effective at making good outcomes happen. An AI that is deeply committed to the virtue of honesty but doesn’t think carefully about the consequences of its actions is not one I’d want in charge of anything important.
This is an unfair read of the virtue ethics position. I’d also be against virtue ethics if it were optimizing for looking virtuous. But the point of virtue ethics is to actually be virtuous, and the best way to live a virtue like, say, honesty, is to think carefully about whether one is being honest in the situation at hand.