I’m going to say the quiet part out loud.
There are, roughly speaking, two approaches to alignment that people are thinking about:
a) Ethical alignment: AI that, like Claude’s constitutional AI, understands what humans want, believe, and feel is moral, and wants to do that. One of the more prominent human moral intuitions is a sense of fairness, which is incompatible with a Gini coefficient of 1. An AI aligned this way would regard creating a society with a Gini coefficient of 1 as a failure of its ethical duty, and would not do it.
b) Do-what-I-mean, not-what-I-say alignment (a.k.a. the customer is always right, and when they’re not, do what they’d want if they weren’t confused, along the lines of Jeeves from Jeeves and Wooster). This will respect property rights etc., and will care about reducing Gini coefficients only when the current user is not part of the 1%.
Alignment is generally regarded as a hard problem. It’s possible that only one of these is practical. At a minimum, it’s likely that one of these gets invented before the other. Most billionaires are probably more interested in developing b).
Contractualism vs Universalism, a tale as old as time, and unfortunately Universalism usually loses...
I strongly suspect that building b) leads to a high chance of nuclear war. It’s possible that b) will be smart enough to point this out to anyone who builds it; whether they will listen is less clear. However, if they don’t, it might then very reasonably conclude that they were evidently confused, and perhaps even build a) for them. (Or perhaps I’m an idealist.) Or maybe it would just take both sides’ nukes away.
I’m not sure what you mean. I expect that in any conflict scenario with a potential for nuclear war, there is some outcome that ~all powerful people prefer to a nuclear war, even if their preferences are quite opposed. So why do you expect that building AI which enhances the ability of the powerful to achieve their preferences, including through negotiation, would lead to nuclear war?
Traditionally, nuclear war is predicted to be a result of incompetence, or of breakdown of negotiations, rather than malice, and I’m not sure why b) would make those things more likely.
ASI tends to introduce a strong “winner-take-all” dynamic. At some point, the people who aren’t winning, but who still have nukes and haven’t yet lost so decisively that they can no longer use them, may decide they’re better off expressing their displeasure with a limited nuclear exchange, which may not stay limited. IMO, the combination of high-stakes winner-take-all competition and weapons of mass destruction is a volatile one, and that’s where b) leads. But I’m trying to peer past a singularity here: an ASI might find an easy counter that I’m not seeing.
Basically, I trust human morality — it’s a very heavily battle-tested system for finding acceptable compromises between conflicting parties and encouraging them to achieve mutually beneficial cooperation. So I think we’re better off with it than without it. I’m hoping even a type b) ASI would agree. But as I said, I might just be an idealist.