If you’re smarter than your opponent but have fewer starting resources, the optimal strategy probably involves some combination of cooperation, making alliances, deception, escaping / running / hiding, gathering resources in secret, and whatever other prerequisites are needed to neutralize such a resource imbalance. Many scenarios in which a smarter-than-human AGI with fewer resources goes to war with or is attacked by humanity are thus somewhat contradictory or at least implausible: they postulate the AGI taking a worse strategy than what a literal human in its place could come up with.
There’s not really an analogue for this in chess: if I am forced to play a chess game against a grandmaster with whatever handicap, I could maybe flip over the board if I started to lose. But that probably just counts as a forfeit, unless I can also overpower or coerce my opponent and / or the judges.
If a rogue AI is caught early on in its plot, with all the world’s militaries combined against it while it still has to rely on humans for electricity and physical computing servers, it’s somewhat hard to outthink a missile headed for your server farm at 800 km/h.
Breaking it down by cases:
If deception / misalignment is caught early enough (i.e. in the lab, by inspecting the system statically before it is given a chance to execute on its own), then you don’t need any military; you just turn the system off or don’t run it in the first place.
If the deception / misalignment is not detected until the AI is already “loose in the world”, such that it would take all the world’s militaries (or any military, really) to stop it, it’s already too late. The AI has already had an opportunity to hide / copy itself, hide its true intentions, make alliances with powerful and sympathetic (or naive) humans, etc., the way any smart human actor would.
If the detection happens somewhere in between these points, there might be a situation in which “brawn” is relevant. This seems like a really narrow time frame / band of possibilities to me though, and when people try to postulate what scenarios in this band might look like, they often rely on a smarter-than-human system making mistakes a literal human would know to avoid. Not saying it’s not possible to come up with more realistic scenarios, but “is a supposedly smarter-than-human AI making mistakes or missing strategies that I myself would not miss” is a good basic sanity check on whether your scenario is plausible or not.
I like this analysis, and I agree with it, except that I do think it’s missing a likely intermediate scenario.
I think the “fully under lab control” case is a super advantageous situation for the humans, especially if the AI has been trained on censored simulation data that doesn’t mention humans or computers or have accurate physics.
I think the current world has an unfortunately dangerous intermediate situation where LLMs are given full access to human knowledge and are allowed to interact with society. And yet SotA models like GPT-4 aren’t quite at “loose in the world” levels of freedom. They don’t have access to their own weights or source code, and neither do any accomplices they might recruit outside the company. Indeed, even most employees at the company couldn’t exfiltrate the weights. Thus, the current default starting state for a rogue AI is poised right on that dangerous margin of “difficult but not impossible to escape”. I think this “brains vs brawn” style analysis does then make a big difference for the initial escape.
I agree that once the escape has been accomplished it’s really hard for humanity to claw back a win. But before the escape has occurred, it’s a much more even game.
Why is it too late if it would take militaries to stop it? Couldn’t the militaries stop it?
If an AI is smart enough that it takes a military force to stop it, the AI is probably also smart enough to avoid antagonizing that force and / or to hide out in a way that a military can’t find.
Also, there are a lot of things that militaries and governments could do, if they had the will and ability to coordinate with each other effectively. What they would do is a different question.
How many governments, when faced with even ironclad evidence of a rogue AI on the loose, would actually choose to intervene, and then do so in an effective way? My prediction is that many countries would find reasons or rationalizations not to take action at all, while others would get mired in disagreement and infighting, or fail to deploy their forces in an actually effective way. And that’s before the AI itself has an opportunity to sow discord and / or form alliances.
(Though again, I still think an AI that is at exactly the level where military power is relevant is a pretty narrow and unlikely band.)