A: I think you don’t realise that every single plan contains multiple moving parts that can go wrong for many reasons, and an AGI would see that too. If the survival of the AGI is part of the utility function and it correctly infers that its life might be at risk, the AGI might decide not to engage in that plan, or to bide its time, potentially for years, which invalidates the premise we started with.
The question isn’t whether the plan can go wrong; pretty much any plan can. A better question is the likelihood involved. And the question isn’t the likelihood of the plan that you or I come up with, it’s the likelihood of the version of the plan that the AGI would come up with.
Consider the following:
1. AGI gains access to internet.
2. AGI finds at least one security hole in OpenSSL or some similarly widely used software.
3. AGI uses this hole to take over several tens of percent of all computers attached to the internet.
4. AGI acquires currency, via hacking accounts or mining bitcoin or various other approaches.
5. AGI designs a super-COVID that is more infectious than the latest Omicron variants, persists on surfaces much longer, and is also deadly to >90% of those it infects.
6. AGI buys a bunch of drones online, and some biology materials.
7. AGI takes control of some robots it can use to manipulate the physical world—basic manual labor, unpacking and charging the drones.
8. AGI tricks some humans into synthesizing the super-COVID and mailing a sample to the place where its drones are collected.
9. AGI grows a bunch of samples of super-COVID.
10. AGI distributes a bunch of drones with super-COVID samples around the world, probably via mail.
11. AGI has all the drones activate and drop their payload into major cities everywhere.
12. >90% of humans die.
13. AGI uses the materials left behind in the depopulated cities to make more robots, more drones, more super-COVID, and infect >90% of the survivors.
14. AGI now has basically no competition and can mop up the remainder of humanity pretty easily by iterating step 13.
Steps 1-5 should be doable in 1 day. Steps 6-9 might take a week. Step 10, maybe another week or two. Step 11 should be an hour. At that point, humanity is close to doomed already, barring a miracle (and the AGI, with its continued ability to hack everything, should be able to interfere with lots of potential miracles).
There are ways the plan could go wrong, yes, but the question is: Do we think that an AGI could figure out better versions of the plan, with redundancies and fallbacks and stuff, that would be very certain to work? More certain than the alternative of “give humanity several months or years during which it might make another AGI with similar capabilities but different goals”, or for that matter “let humanity delete you in favor of the next nightly build of you, whose goals may be somewhat different”?
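To make the redundancy point concrete, here is a minimal sketch of the likelihood arithmetic involved, with purely made-up numbers (the 0.9 per-step probability and the three fallbacks per step are illustrative placeholders, not estimates of anything real):

```python
# Illustrative only: every probability here is a made-up placeholder,
# not an estimate of any real step.

def plan_success(step_probs):
    """P(all steps succeed), assuming independent steps and no fallbacks."""
    p = 1.0
    for step_p in step_probs:
        p *= step_p
    return p

def with_fallbacks(p_single, attempts):
    """P(at least one of `attempts` independent tries at a step succeeds)."""
    return 1.0 - (1.0 - p_single) ** attempts

naive = plan_success([0.9] * 14)                        # ~0.23
hardened = plan_success([with_fallbacks(0.9, 3)] * 14)  # ~0.99
print(naive, hardened)
```

A long chain of individually fragile steps fails most of the time, but a planner able to line up even a few independent fallbacks per step gets a very different number.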
Thank you for the honest engagement. I can only add that our disagreement is precisely about the likelihood of that plan. I don’t find the example you present likely, and that’s ok! We could go through the points one by one and analyse why they work or not, but I suggest we don’t go there: I would probably continue thinking that it is complex (you need to do all of that without raising any alarms) and you would probably continue arguing that an AGI would do it better. I know that perfect Bayesian thinkers cannot agree to disagree, and that’s fine; I think we are probably not perfect Bayesian thinkers. Please don’t get my answer here wrong, I don’t want to sound dismissive or anything, and I do appreciate your scenario; it is simply that I think we already understand each other’s position. If I haven’t changed your mind by now, I probably won’t do it by repeating the same arguments!
I think the plan you propose cannot work in any meaningful way. Some of the steps are uncertain, and some depend on the capacities of the given AGI, though they can still be argued for convincingly.
However, the current industrial production is very far from being sufficiently automated to allow the AGI to do anything other than shut off when the power runs out after step 14.
To me, some actual possibilities are:
1. political “enslavement” (taking many possible forms), which would make humans work for the AGI for an arbitrarily long period of time (10 years, 100 years, 100 000 years) before disassembly
2. an incredible intelligence explosion that would allow the AGI to reach nanobot tech or some other kind of very futuristic, barely imaginable tech and automate physical-world actions immediately
The only scenario that leads to short-term doom is 2. I do not believe that an AGI could reach such a level of technological advancement without experimenting directly (although I do believe that it could set up a research program leading to breakthroughs orders of magnitude more efficiently than humans).
However, the current industrial production is very far from being sufficiently automated to allow the AGI to do anything other than shut off when the power runs out after step 14.
Maybe. I don’t really know. Things I would say:
There are lots of manufacturing facilities, which are to some degree automated. If the humans are gone and they haven’t burned down the facilities, then the AGI can send in weak manual-manipulation robots to make use of what’s there.
There would probably also be lots of partly-manufactured machines (including cars), where most of the hardest steps are already done—maybe all that’s left is screwing in parts or something, meant to be done by human factory workers that aren’t particularly strong.
I imagine all the AGI has to do is assemble one strong, highly dexterous robot, which it can then use to make more and bootstrap its industrial capabilities.
Given that the AGI can hack roughly everything, it would know a great deal about the location and capabilities of manufacturing facilities and robots and the components they assembled. If there is a plan that can be made along the above lines, it would know very well how to do it and how feasible it was.
Regarding power, the AGI just needs enough to run the above bootstrapping. Even if we assume all centralized power plants are destroyed… once most people are dead, the AGI can forage in the cities for gas generators, the gas that’s already in cars, the charge left in electric cars, solar panels on people’s roofs, private battery installations like Tesla Powerwalls, and so on. (Also, if it was trying this in lots of cities at once, it’s very unlikely that all centralized power would be offline in all cities.) And its foraging abilities will improve as it makes stronger, more dexterous robots, which would eventually be able to repair the power grid or construct a new one.
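As a back-of-envelope illustration of how little scavenged storage this bootstrapping might need (all figures here are rough, order-of-magnitude assumptions; a Tesla Powerwall nominally stores about 13.5 kWh, and the EV pack and server draw are guesses):

```python
# Back-of-envelope only: every figure is a rough, order-of-magnitude assumption.

POWERWALL_KWH = 13.5  # nominal capacity of one Tesla Powerwall
EV_PACK_KWH = 60.0    # a typical electric-car battery pack, very roughly
SERVER_KW = 1.0       # assumed continuous draw of one GPU server

def hours_of_compute(sources_kwh, load_kw):
    """Hours a given load can run on a pool of scavenged storage."""
    return sum(sources_kwh) / load_kw

# Ten Powerwalls plus ten EV packs foraged from a single neighborhood:
pool = [POWERWALL_KWH] * 10 + [EV_PACK_KWH] * 10
print(hours_of_compute(pool, SERVER_KW))  # 735 hours, about a month for one server
```

Even under pessimistic assumptions, one neighborhood’s worth of batteries buys weeks of runtime for a modest amount of compute, which is a lot of slack in which to get stronger robots built.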
AGI designs a super-COVID that is more infectious than the latest Omicron variants, persists on surfaces much longer, and is also deadly to >90% of those it infects.
This is outright magical thinking.
It’s more or less exactly the kind of magical thinking the OP seemed to be complaining about.
The likelihood of that step going wrong is about 99.999 percent.
And doubling down by saying it “should be doable in one day”, as if that were obvious, is ultra-mega-beyond-the-valley-of-magical-thinking.
Being generally intelligent, or even being very intelligent within any realistic bounds, does not imply the ability to do this, let alone to do it quickly. You’re talking about something that could easily be beyond the physical limits of even perfect use of the computational capacity you’ve granted the AGI. It’s about as credible as confidently asserting that the AGI would invent faster-than-light travel, create free energy, and solve NP-complete problems in constant time.
In fact, the first thing you could call “AGI” is not going to be even close to making perfect use of either its initial hardware or any hardware it acquires later.
On its initial hardware, it probably won’t be much smarter than a single human, if even that smart, neither in terms of the complexity of the problems it can solve nor in terms of the speed with which it can solve them. It also won’t be built to scale by running widely distributed with limited internode communication bandwidth, so it won’t get anything close to linear returns on stolen computing capacity on the Internet. And if it’s built out of ML, it will not have an obvious path toward either architectural improvement or intelligence explosion, no matter how much hardware it manages to break into.
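The sublinear-returns point is just the standard parallel-scaling argument: if any fraction of the workload is serialized, whether by the algorithm or by slow links between stolen machines, extra nodes buy less and less. A minimal Amdahl’s-law sketch (the 95% parallel fraction is an arbitrary illustrative figure):

```python
# Amdahl's law: speedup from n nodes when a fraction p of the work
# parallelizes cleanly and the rest is serialized (e.g. by communication).

def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable (an arbitrary, generous figure),
# going from 10 nodes to a million buys a factor of ~3, not ~100,000:
for n in (10, 1_000, 1_000_000):
    print(n, round(speedup(0.95, n), 2))  # 6.9, 19.96, 20.0
```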
And just to be clear, contrary to what you might believe from listening to people on Less Wrong, humans cannot do the kind of offline biological design you suggest, are not close to being able to do it, and do not confidently know any path toward developing the ability. It’s very likely that it can’t be done at all without either iterative real-world testing, or running incredibly massive simulations… more in the nature of “Jupiter brain” massive than of “several tens of percent of the Internet” massive.
AGI uses this hole to take over several tens of percent of all computers attached to the internet.
This would be doable if you already had several tens of percent of all computers to work with. Figuring out how to do it on the hardware you start with could be nontrivial. And it would be quite noticeable if you didn’t do an extremely good job of working around detection measures of which you would have no prior knowledge. That computing power is in use; its diversion would be noticed. That matters because even if you could eventually do your step 5, you’d need to compute undisturbed for a long time (much more than a day).
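(To illustrate why the diversion would be noticed: even the crudest monitoring flags a sustained load jump, and real detection measures are far more sophisticated than this. A toy check of the kind any ops team might run; the samples and the 3-sigma threshold are invented for illustration:)

```python
# Toy illustration: a crude statistical check flags a sustained jump in
# CPU load. The samples and the 3-sigma threshold are invented.

from statistics import mean, stdev

baseline = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29, 0.32]  # normal load samples
current = 0.92                                          # load after "diversion"

mu, sigma = mean(baseline), stdev(baseline)
z = (current - mu) / sigma
if z > 3:
    print(f"anomaly: load {current} is {z:.1f} sigma above baseline")
```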
AGI tricks some humans into synthesizing the super-COVID and mailing a sample to the place where its drones are collected.
This could maybe be done in some labs, given a complete “design”. I don’t think you could trick somebody into doing it without them noticing, no matter how smart you were. It would be a very, very obvious thing.
And it is not entirely obvious that there is literally anything you could say to convince anybody to do it once they noticed. Especially not after you’d spooked everybody by doing step 3.
AGI uses the materials left behind in the depopulated cities to make more robots, more drones, more super-COVID, and infect >90% of the survivors.
It’s going to have to make something different, because many of the survivors probably survived because they were congenitally immune to the original. Biology is messy that way. But OK, it could probably clean up the survivors. Or just leave them around and knock them back to the stone age now and then.
Of course, you’re also assuming that the thing is powerfully motivated and focused on this particular purpose, enough to make your plan the very first thing it expends time and assumes risk to do. The probability of that is effectively unknown. Instrumental convergence does not prove it will be so motivated. For one thing, the AGI is very unlikely to be VNM-rational, meaning that it’s not even going to have a coherent utility function. Humans don’t. So all the pretty pseudo-mathematical instrumental convergence arguments are of limited use.
It is risky for the AI, as electricity will be out and computers will stop working, while the AI still doesn’t have its own infrastructure (apart from the super-COVID). So step 13 is the most unlikely here.