Outcome B: Progress in atomic AI alignment keeps up with progress in AI capability, but progress in social AI alignment doesn’t keep up. Transformative AI is aligned with a small fraction of the population, resulting in this minority gaining absolute power and abusing it to create an extremely inegalitarian future. Wars between different factions are also a concern.
It’s unclear to me how this particular outcome relates to social alignment (or at least to the kinds of research areas in this post). Some possibilities:
Does failure to solve social alignment mean that firms and governments cannot use AI to represent their shareholders and constituents? Why might that be? (E.g. what’s a plausible approach to atomic alignment that couldn’t be used by a firm or government?)
Does AI progress occur unevenly such that some group gets much more power/profit, and then uses that power? If so, how would technical progress on alignment help address that outcome? (Why would the group with power be inclined to use whatever techniques we’re imagining?) Also, why does this happen?
Does AI progress somehow complicate the problem of governance or corporate governance such that those organizations can no longer represent their constituents/shareholders? What is the mechanism (or any mechanism) by which this happens? Does social alignment help by making new forms of organization possible, and if so should I just be thinking of it as a way of improving those institutions, or is it somehow distinctive?
Do we already believe that the situation is gravely unequal (e.g. because governments can’t effectively represent their constituents and most people don’t have a meaningful amount of capital) and AI progress will exacerbate that situation? How does social alignment prevent that?
(This might make more sense as a question for the OP, it just seemed easier to engage with this comment since it describes a particular more concrete possibility. My sense is that the OP may be more concerned about failures in which no one gets what they want rather than outcome B per se.)
Outcome C is most naturally achieved using “direct democracy” TAI, i.e. one that collects inputs from everyone and aggregates them in a reasonable way. We can try emulating democratic AI via single user AI, but that’s hard because:
If the number of AIs is small, the AI interface becomes a single point of failure: an actor that can hijack the interface will have enormous power.
If the number of AIs is small, it might be unclear what inputs should be fed into the AI in order to fairly represent the collective. This requires “manually” solving the preference aggregation problem, and any faults in the solution might be amplified by the powerful optimization to which it is subjected.
If the number of AIs is more than one, then we need to make sure the AIs are good at cooperating, which requires research into multi-AI scenarios.
If the number of AIs is large (e.g. one per person), we need the interface to be sufficiently robust that people can use it correctly without special training. Also, this might be prohibitively expensive.
Designing democratic AI requires good theoretical solutions for preference aggregation and the associated mechanism design problem, and good practical solutions for making it easy to use and hard to hack. Moreover, we need to get the politicians to implement those solutions. Regarding the latter, the OP argues that certain types of research can help lay the foundation by providing actionable regulation proposals.
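To make “the preference aggregation problem” a bit more concrete, here is a minimal sketch of one classical aggregation rule, Borda count; the candidates and ballots are hypothetical, and this is my own illustration rather than anything proposed above. By Gibbard–Satterthwaite, any such non-dictatorial rule over three or more options is manipulable by strategic voting, which is exactly the kind of fault that powerful optimization pointed at the aggregate could amplify:

```python
from collections import defaultdict

# A toy sketch of Borda count, one classical preference-aggregation rule.
# Candidates and ballots below are hypothetical, purely for illustration.
def borda_winner(ballots):
    """ballots: list of rankings, each a list of candidates, best first."""
    scores = defaultdict(int)
    for ranking in ballots:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - place  # top gets n-1 points, last gets 0
    return max(scores, key=scores.get)

ballots = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
]
print(borda_winner(ballots))  # "B": broad second-choice support beats A's plurality
```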
My sense is that the OP may be more concerned about failures in which no one gets what they want rather than outcome B per se
Well, the OP did say:
(2) is essentially aiming to take over the world in the name of making it safer, which is not generally considered the kind of thing we should be encouraging lots of people to do.
I understood it as hinting at outcome B, but I might be wrong.
Outcome C is most naturally achieved using “direct democracy” TAI, i.e. one that collects inputs from everyone and aggregates them in a reasonable way. We can try emulating democratic AI via single user AI, but that’s hard because:
I’m not sure what’s most natural, but I do consider this a fairly unlikely way of achieving outcome C.
I think the best argument for this kind of outcome is from Wei Dai, but I don’t think it gets you close to the “direct democracy” outcome. (Even if you had state control and AI systems aligned with the state, it seems unlikely and probably undesirable for the state to be replaced with an aggregation procedure implemented by the AI itself.)
A lot depends on AI capability as a function of cost and time. On one extreme, there might be enough increasing returns to get a singleton: some combination of extreme investment and algorithmic advantage produces extremely powerful AI, while moderate investment or no algorithmic advantage doesn’t produce even moderately powerful AI. Whoever controls the singleton has all the power. On the other extreme, returns don’t rise much, resulting in personal AIs collectively having as much power as corporate/government AIs, or more. In the middle, there are many powerful AIs, but still not nearly as many as people.
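As a toy numerical sketch of how the shape of the returns curve determines the balance of power (my own construction; the budgets and exponents are made up), compare one well-capitalized actor against a large number of individuals:

```python
# Toy model: capability as a function of investment, capability = investment ** k.
# One corporate actor invests 10**6 units; a million people invest 1 unit each.
# The budgets and exponents are hypothetical, chosen only to show the regimes.
def capability(investment, k):
    return investment ** k  # k > 1: rising returns; k < 1: diminishing returns

corporate_budget, n_people = 10**6, 10**6
for k in (2.0, 1.0, 0.5):
    corporate = capability(corporate_budget, k)
    individuals = n_people * capability(1.0, k)
    print(f"k={k}: corporate {corporate:.3g} vs. everyone else {individuals:.3g}")

# k=2.0: the corporate AI dwarfs everyone else (singleton-like concentration).
# k=1.0: rough parity between the corporate AI and personal AIs collectively.
# k=0.5: personal AIs collectively outweigh the corporate AI.
```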
In the first scenario, to get outcome C we need the singleton either to be democratic by design, or to have a very sophisticated and robust system for controlling access to it.
In the last scenario, the free market would lead to outcome B. Corporate and government actors use their access to capital to gain power through AI until the rest of the population becomes irrelevant. Effectively, AI serves as an extreme amplifier of pre-existing power differentials. Arguably, the only way to get outcome C is to enforce the democratization of AI through regulation. If this seems extreme, compare it to the way our society handles physical violence. The state has a monopoly on violence, and with good reason: without this monopoly, upholding the law would be impossible. But in the age of superhuman AI, traditional means of violence are irrelevant; the only important weapon is AI.
In the second scenario, we can manage without multi-user alignment. However, we still need multi-AI alignment, i.e. we need to make sure the AIs are good at coordination problems. It’s possible that any sufficiently capable AI is automatically good at coordination problems, but it’s not guaranteed. (Incidentally, if atomic alignment is flawed, then it might actually be better for the AIs to be bad at coordination.)
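To make “good at coordination problems” concrete, here is a minimal two-AI Stag Hunt (the payoffs are hypothetical, and this is my illustration rather than a model from the thread). The jointly optimal outcome is only reached if each AI is sufficiently confident the other will cooperate, which is the kind of guarantee multi-AI alignment research would need to provide:

```python
# A two-AI Stag Hunt with hypothetical payoffs. (stag, stag) is best for
# both, but "hare" is the safe choice under uncertainty about the other AI.
PAYOFF = {  # (my_action, other_action) -> my_payoff
    ("stag", "stag"): 4,
    ("stag", "hare"): 0,
    ("hare", "stag"): 3,
    ("hare", "hare"): 3,
}

def best_response(p_other_stag):
    """Expected-utility best response given a belief about the other AI."""
    eu = {
        a: p_other_stag * PAYOFF[(a, "stag")] + (1 - p_other_stag) * PAYOFF[(a, "hare")]
        for a in ("stag", "hare")
    }
    return max(eu, key=eu.get)

# Unless each AI is more than 75% confident the other cooperates, both pick
# "hare", and the jointly optimal (stag, stag) outcome is never reached.
print(best_response(0.5))  # hare
print(best_response(0.9))  # stag
```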