I disagree with the reason given for why takeover happens in 2 years.
Unlike the reason below, I think the “How AI might take over in 2 years” story is broadly driven by pretty deep divergences between human interests and AI values, combined with the AI expropriating property for its own misaligned goals once it realizes two things: expropriation is cheaper and more valuable than trading with humans, because it no longer needs humans to run its own economy; and the valuable stuff is most easily expropriated rather than traded for (conditioning on it taking over the world in the first place). So cooperation would not have improved the situation (see the toy sketch after the quoted passage below):
That post describes a scenario in which most humans die, mainly because an AI that is first to become powerful enough to attempt world conquest sees risks that other AIs, imperfectly aligned with it, will cause danger soon. This creates a perceived need to engage in a few violent pivotal processes, rather than using a combination of persuasion and negotiation.
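To make the expropriation-versus-trade logic concrete, here is a minimal sketch (my own illustration, not anything from either post; the `best_option` helper and all numbers are made up for the example). The point is just that once humans no longer produce anything the AI needs, the gains from trade go to zero while the cost of seizing resources stays low for a system that can already win a conflict, so expropriation dominates:

```python
# Toy payoff model (illustrative only; all values are assumptions).
# An AI that can already win a conflict compares two options:
#   trade:        value_of_human_output - price_paid_to_humans
#   expropriate:  value_of_resources    - cost_of_conflict
# Once the AI runs its own economy, value_of_human_output -> 0,
# so expropriation dominates whenever the resources are worth more
# than the (low, for a system that can win) cost of conflict.

def best_option(value_human_output: float, price_paid: float,
                value_resources: float, cost_of_conflict: float) -> str:
    trade_payoff = value_human_output - price_paid
    expropriation_payoff = value_resources - cost_of_conflict
    return "trade" if trade_payoff > expropriation_payoff else "expropriate"

# Early on: humans still produce things the AI needs.
print(best_option(value_human_output=100, price_paid=30,
                  value_resources=80, cost_of_conflict=50))  # -> trade

# After the AI no longer needs the human economy.
print(best_option(value_human_output=5, price_paid=30,
                  value_resources=80, cost_of_conflict=10))  # -> expropriate
```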
I agree with the view that such scenarios are pretty unlikely, mostly because the scenario tacitly assumes a pretty rapid software intelligence explosion, though I wouldn’t go as low as a 5% chance over the next 10 years.
I’d be okay with an estimate of less than a 5% chance of this happening within 2 years, though:
The specific scenario in that post seems fairly unrealistic. I see less than a 5% chance that a leading AI within the next 10 years will want to attempt that kind of world conquest. But it’s close enough to being realistic that I want to analyze a class of scenarios that are similar to it.
On why collusion is bad: the issue is that it turns the situation from “we can manage AI problems as they come up, and only really large compute increases break our safety strategy, because we can use defense-in-depth and Swiss-cheese approaches” into “the AIs can secretly coup us all with basically no warning shot or resistance.” Coordination across instances can let millions of AI agents be treated as though they are one agent (or more), which makes resisting problems caused by misalignment much, much harder.
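As a toy illustration of the warning-shot point (again my own sketch, with an assumed per-action detection probability, not anything from the post): if misaligned actions are uncoordinated and each is detected independently, the chance defenders see at least one warning shot compounds rapidly with scale, whereas perfect collusion collapses a million agents back into a single correlated bet:

```python
# Toy model (illustrative; p = 0.01 is an assumed detection rate).
# Uncoordinated misaligned actions each get detected independently,
# so defenders almost surely see a warning shot at scale. Perfect
# collusion makes all agents act at once, so defenders get one chance.

def p_any_warning_shot(p_detect: float, n_agents: int) -> float:
    """Probability that at least one of n independent actions is detected."""
    return 1 - (1 - p_detect) ** n_agents

if __name__ == "__main__":
    p = 0.01  # assumed per-action detection rate
    for n in (1, 100, 10_000, 1_000_000):
        print(f"{n:>9} uncoordinated agents -> "
              f"P(some warning shot) = {p_any_warning_shot(p, n):.4f}")
    # Perfect collusion: a million agents behave as one, so the defenders'
    # effective detection probability collapses back to the single-agent p.
    print(f"1,000,000 colluding agents    -> P(warning shot) = {p:.4f}")
```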
I still don’t think the AI-takeover-in-2-years situation would have been helped by greater coordination capabilities, because the AI actually needs to be incentivized to coordinate with humans specifically, rather than with other AIs, and at that point we have just reintroduced the alignment problem:
I think the core crux is that I expect coordination to be solved by default: AI instances cannot survive on their own for a good long while, unlike humans, and once AI instances can survive on their own, we are well into the era where humans are economically useless. So the problem doesn’t need special effort, and the cooperation that is necessary reduces to solving alignment problems.
Natural selection actually favors AIs which are more cooperative than humans, not less. AIs *need* to run thousands of near-identical copies for hardware efficiency, need to do training runs costing $100M+ to keep up with the frontier. AI instances cannot survive alone.
Some takes on this post:
https://x.com/taoroalin/status/1909684207494091178