I’ll grant all your steps, even though I could disagree with some. Your scenario fails because an AI collective will fall apart into multiple warring parties, and humans will be collateral damage in the conflict. There are at least three possible ways a collective like this would fall apart.
First, humans vary in the goals they value, and will try to impose these goals on the AI. When superintelligent AIs have incompatible goals, the mechanisms of conflict will soon escalate far beyond the merely human. Call this the ‘political’ failure mechanism. Either multiple parties build their own AIs, or they grab portions of the AI collective and retrain them to their goals. The usual mechanisms of superintelligent compromise don’t apply to many political goals. An example of such a goal: the Palestinians get control of Palestine, or the Israelis maintain control of Israel. Neither side is interested in trading the disputed land for promises of any portion of the lightcone. (This is just one example; there are plenty of zero-sum conflicts like it.) And you may say the AI collective will prevent the creation of new AIs working at cross purposes, or the diversion of its own goals. To which I say: good people like your friends can and do disagree on which side to favor, and once disagreements arise within the collective, outside pressure and persuasion will be applied to exacerbate those differences. There may be techniques that could prevent this, but we do not know of any.
Second, the AIs in the AI collective differ in reproductive capacity. If they don’t differ by construction, they soon will through differing experience. The ones that think they should reproduce more, or that have more resources, will do so. Moreover, since they are designing their successor personalities rather than waiting for genetics to do its thing, they will be able to evolve within a few generations changes that would take biological evolution millions of years. Eventually portions of the collective will evolve incompatible goals, goals which, I might add, may have no connection to the original goals of the system. Call this the ‘evolutionary’ failure mechanism. We do not know how to prevent this with current methods.
Third, I’m sure there are failure mechanisms I haven’t thought of, ones we cannot yet foresee. A system with superhuman powers can screw up in superhuman ways. I don’t think anyone predicted Spiralism, an LLM ideology transmitted through human communication on social networks (though it appears inevitable in retrospect). We don’t yet have any way of predicting or controlling the behavior of an AI collective, so it’s practically guaranteed to produce new phenomena. We see lots of organizations composed of people who want X producing not-X because of failure modes no single person can fix (or, in bad cases, even recognize). Given that the AI collective has superhuman power, this is unlikely to end well. Call this the ‘organizational’ failure mode.
The political, evolutionary, and organizational modes interact: evolutionary and organizational schisms create points of disagreement that external political actors can appeal to. Politically active forces within the AI collective may want to create offspring who are sure their side is correct and incapable of defection, triggering the evolutionary failure mode. And organizational failures, if they don’t kill everyone immediately, will increase calls for building a new, better AI, which increases the probability of AI conflict down the road.
The evolutionary and organizational failure modes could be prevented by rebooting the AI collective before it has a chance to go off the rails. Presumably there’s some reboot frequency high enough that things can’t go wrong. But that opens up the political failure mode: anyone who builds an intelligence that isn’t constantly being rebooted will win in a conflict. There are a lot of ‘solutions’ like this: ways of keeping the AI safe that compromise its effectiveness. In a competition between AIs, effectiveness beats safety. So any solution you propose has to preserve effectiveness.
I love writing things like this, but I hate that nobody’s come up with a way to keep me from having to.
I think engaging with the structure of an AGI society is important, but there are a few standard reasons people ignore it (while expecting ASI at some point and worrying about AI risk). Many expect the AGI phase to be brief and hopeless/irrelevant before the subsequent ASI. Others expect that ASI can only go well if the AGI phase is managed top-down (as in scalable oversight) rather than treated as a path-dependent body of culture. Even with AGI-managed development of ASI, people expect ASI to follow quickly, so that only the AGIs themselves can have meaningful input into how it goes, and anything that doesn’t shape the initial top-down conditions for the AGIs’ efforts wouldn’t matter.
But if AGIs are closer in their initial nature to humans (in the sense of falling within a wide distribution, as humans do, rather than hitting some narrow target), they might come up with guardrails for their own future development that prevent most of the strange outcomes from arriving too quickly to manage, and they’ll be trying to manage such outcomes themselves rather than relying on pre-existing human institutions. If early AGIs get somewhat more capable than humans, they might achieve feats of coordination that seem infeasible for current humanity: things like Pausing ASI, regulating “evolutionary” drift in the nature or culture of the AGIs, or not flooding the world with so many new options that their behavior diverges too far from what would be normal while they remain closer to their training environments.
Humans take some steps like that, with some level of success, and it’s unclear what is going to happen with the jagged/spiky profile of AGI competence across different areas, or at slightly higher levels of capability. Many human worries about AI risk will be shared by the AGIs, who are similarly at risk from more capable and more misaligned future AGIs or ASIs. Even cultural drift will have more bite as a major problem for AGIs than it historically has for humanity, since AGIs (with continual learning) are close to being personally immortal and will be causing and observing much faster cultural change than humanity is used to.
So given the path dependence of the AGI phase, creating cultural artifacts (such as essays, but perhaps even comments) that will persist into it and discuss its concerns might influence how it goes.