I don’t think it’s possible to align AGI with democracy. AGI, or at least ASI, is an inherently political technology. The power structures that ASI creates within a democratic system would likely destroy that system from within. Whichever group ended up controlling an ASI would gain a decisive strategic advantage over everyone else in the country, which would undermine the checks and balances that make a democracy a democracy.
To steelman a devil’s advocate: If your intent-aligned AGI/ASI went something like
oh, people want the world to be according to their preferences but whatever normative system one subscribes to, the current implicit preference aggregation method is woefully suboptimal, so let me move the world’s systems to this other preference aggregation method which is much more nearly-Pareto-over-normative-uncertainty-optimal than the current preference aggregation method
then this would be, in an important sense, more democratic, because the people (the demos) would have more influence over their societies.
Yeah, I can see how that’s possible. But I wasn’t really talking about the improbable scenario where an ASI is aligned to the whole of humanity or a whole country; I was talking about a scenario where the ASI is ‘narrowly aligned’, in the sense that it’s aligned to its creators, or to whoever controls it when it’s created. This is IMO much more likely to happen, since technologies are not created in a vacuum.
I think it’s pretty straightforward to define what it would mean to align AGI with what democracy is actually supposed to be (the aggregate of the preferences of the subjects, with equal weighting for all), but hard to align it with the incredibly flawed American implementation of democracy, if that’s what you mean?
The American system cannot be said to represent democracy well. It’s intensely majoritarian at best, feudal at worst (since the parties stopped having primaries), indirect and so prone to regulatory capture, inefficient and opaque. I really hope no one’s taking it as their definitional example of democracy.
No, I wasn’t really talking about any specific implementation of democracy. My point was that, given the vast power that ASI grants to whoever controls it, the traditional checks and balances would be undermined.
Now, regarding your point about aligning AGI with what democracy is actually supposed to be, I have two objections:
1. To me, it’s not clear at all why it would be straightforward to align AGI with some ‘democratic ideal’. Arrow’s impossibility theorem shows that, with three or more options, no ranked voting system can simultaneously satisfy even a small set of basic fairness criteria, so an AGI trying to implement the “perfect democracy” will eventually have to make value judgments about which democratic principles to prioritize; there’s a toy sketch of this below, after the second objection. (Although I do think that an AGI could, in principle, help us find ways to improve upon our democracies.)
2. Even if aligning AGI with democracy were possible in principle, we need to look at the political reality the technology will emerge from. I don’t think it’s likely that whichever group ends up controlling AGI would willingly extend its alignment to other groups of people.
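To make the Arrow point concrete, here’s a minimal toy sketch in Python (the ballot profiles are invented purely for illustration): on the very same ballots, plurality and Borda count crown different winners, and a second profile shows pairwise majority preference going cyclic, so any AGI tasked with “implementing democracy” has to privilege some democratic principle over others.

```python
# Toy illustration (invented profiles): different "democratic" aggregation
# rules can disagree on the same ballots, and majority preference can even
# be cyclic, so an aggregator has to privilege some principle over others.
from collections import Counter
from itertools import combinations

def plurality(ballots):
    """Candidate with the most first-place votes."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def borda(ballots):
    """Candidate with the highest Borda count (n-1 points for 1st, ..., 0 for last)."""
    scores = Counter()
    for b in ballots:
        for rank, cand in enumerate(b):
            scores[cand] += len(b) - 1 - rank
    return scores.most_common(1)[0][0]

def pairwise_majorities(ballots):
    """For each pair of candidates, the one that a strict majority ranks higher."""
    candidates = sorted(ballots[0])
    return {
        (x, y): x if sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2 else y
        for x, y in combinations(candidates, 2)
    }

# Profile 1: plurality elects A, Borda elects B, and B is also the Condorcet winner.
profile1 = ["ABC"] * 4 + ["BCA"] * 3 + ["CBA"] * 2
print(plurality(profile1))            # 'A'
print(borda(profile1))                # 'B'
print(pairwise_majorities(profile1))  # B beats A, B beats C, C beats A

# Profile 2: a Condorcet cycle. A beats B, B beats C, C beats A.
profile2 = ["ABC", "BCA", "CAB"]
print(pairwise_majorities(profile2))  # no coherent "will of the people" to read off
```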
2: I think you’re probably wrong about the political reality of the groups in question. Not sharing AGI with the public is a bright line. For most of the leading players it would require building a group of AI researchers within the company who are all implausibly willing to cross a line that says “this is straight up horrible, evil, illegal, and dangerous for you personally”, while still being capable enough to lead the race; it would require implausible levels of mutual trust that no one would try to cut the others out of the deal at the last second (despite the fact that the group’s whole purpose is cutting most of humanity out of the deal) and that no one would back out and whistleblow; and it would require an implausible level of secrecy to make sure state actors won’t find out.
It would require a probably-actually-impossible cultural discontinuity and organizational structure.
It’s more conceivable to me that a lone CEO might try to do it via a backdoor: something that mostly wasn’t built on purpose, and that no one else in the company realises could or would be used that way. But as soon as the conspiracy consists of more than one person...
I think there are several potential paths by which AGI could lead to authoritarianism.
For example, consider AGI in military contexts: people might be unwilling to let it make very autonomous decisions, and on that basis military leaders could justify making these systems loyal to them personally, even in situations where it would be good for the AI to disobey orders.
Regarding your point about the need to build a group of AI researchers: those researchers could be AIs themselves, and those AIs could be ordered to make future AI systems secretly loyal to the CEO. Consider e.g. this scenario (from Box 2 in Forethought’s new paper):
In 2030, the US government launches Project Prometheus—centralising frontier AI development and compute under a single authority. The aim: develop superintelligence and use it to safeguard US national security interests. Dr. Nathan Reeves is appointed to lead the project and given very broad authority.
After developing an AI system capable of improving itself, Reeves gradually replaces human researchers with AI systems that answer only to him. Instead of working with dozens of human teams, Reeves now issues commands directly to an army of singularly loyal AI systems designing next-generation algorithms and neural architectures.
Approaching superintelligence, Reeves fears that Pentagon officials will weaponise his technology. His AI advisor, to which he has exclusive access, provides the solution: engineer all future systems to be secretly loyal to Reeves personally.
Reeves orders his AI workforce to embed this backdoor in all new systems, and each subsequent AI generation meticulously transfers it to its successors. Despite rigorous security testing, no outside organisation can detect these sophisticated backdoors—Project Prometheus’ capabilities have eclipsed all competitors. Soon, the US military is deploying drones, tanks, and communication networks which are all secretly loyal to Reeves himself.
When the President attempts to escalate conflict with a foreign power, Reeves orders combat robots to surround the White House. Military leaders, unable to countermand the automated systems, watch helplessly as Reeves declares himself head of state, promising a “more rational governance structure” for the new era.
Relatedly, I’m curious what you think of that paper and the different scenarios they present.
1: The best approach to aggregating preferences doesn’t involve voting systems.
You could regard carefully controlling one’s expression of one’s utility function as being like a vote, and so subject to the blight of strategic voting: in general, people have an incentive to understate their preferences about scenarios they consider unlikely (and vice versa), which influences the probability of those outcomes in unpredictable ways and fouls their strategy, or to understate valuations when buying and overstate them when selling. This may add up to a game that cannot be played well, a coordination problem, outcomes no one wanted.
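As a concrete toy version of that strategic-voting incentive (a minimal sketch with invented ballot numbers, not a claim about any real mechanism): under plurality, voters whose true favourite looks hopeless can get an outcome they honestly prefer by misreporting their preferences.

```python
# Toy illustration (invented ballots): under plurality, misreporting your
# preferences can beat reporting them honestly, which is the strategic-voting
# incentive described above.
from collections import Counter

def plurality_winner(ballots):
    """Candidate with the most first-place votes (ties broken alphabetically)."""
    tally = Counter(b[0] for b in ballots)
    return max(sorted(tally), key=lambda c: tally[c])

# Honest preferences: 3 voters A>B>C, 2 voters B>C>A, 2 voters C>B>A.
honest = ["ABC"] * 3 + ["BCA"] * 2 + ["CBA"] * 2
print(plurality_winner(honest))      # 'A': the C-first voters' least favourite wins

# The two C-first voters expect C to lose, so they insincerely rank B first.
strategic = ["ABC"] * 3 + ["BCA"] * 2 + ["BCA"] * 2
print(plurality_winner(strategic))   # 'B': better, by their true ranking, than A
```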
But I don’t think humans are all that guileful about how they express their utility function. Most of them have never actually expressed a utility function before; it’s not easy to do, and it’s not like checking a box on a list of 20 names. People know it’s a game that can barely be played even in ordinary friendships; they don’t know how to lie strategically about their preferences to the YouTube recommender system, let alone to their neural lace.