The approach I often take here is to ask the person how they would persuade an amateur chess player who believes they can beat Magnus Carlsen because they’ve discovered a particularly good opening with which they’ve won every amateur game they’ve tried it in so far.
Them: Magnus Carlsen will still beat you, with near certainty
Me: But what is he going to do? This opening is unbeatable!
Them: He’s much better at chess than you, he’ll figure something out
Me: But what though? I can’t think of any strategy that beats this
Them: I don’t know, maybe he’ll find a way to do <some chess thing X>
Me: If he does X I can just counter it by doing Y!
Them: Ok if X is that easily countered with Y then he won’t do X, he’ll do some Z that’s like X but that you don’t know how to counter
Me: Oh, but you conveniently can’t tell me what this Z is
Them: Right! I’m not as good at chess as he is and neither are you. I can be confident he’ll beat you even without knowing your opener. You cannot expect to win against someone who outclasses you.

Plot twist: Humanity, with near-total control of the planet, is Magnus Carlsen, obviously.
If someone builds an AGI, it’s likely that they want to actually use it for something and not just keep it in a box. So eventually it’ll be given various physical resources to control (directly or indirectly), and then it might be difficult to just shut down. I discussed some possible pathways in Disjunctive Scenarios of Catastrophic AGI Risk; here are some excerpts:
DSA/MSA Enabler: Power Gradually Shifting to AIs
The historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. Any kind of business could potentially run better if it were run by a mind that had been custom-built for running the business—up to and including the replacement of all the workers with one or more such minds. An AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization. Some estimates already suggest that half of the tasks that people are paid to do are susceptible to being automated using techniques from modern-day machine learning and robotics, even without postulating AIs with general intelligence (Frey & Osborne 2013, Manyika et al. 2017).
The trend toward automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. There is a risk that AI systems that were initially simple and of limited intelligence would gradually gain increasing power and responsibilities as they learned and were upgraded, until large parts of society were under AI control. [...]
Voluntarily Released for Economic Benefit or Competitive Pressure
As discussed above under “power gradually shifting to AIs,” there is an economic incentive to deploy AI systems in control of corporations. This can happen in two forms: either by expanding the amount of control that already-existing systems have, or by upgrading existing systems or adding new ones with previously-unseen capabilities. These two forms can blend into each other. If humans previously carried out some functions which are then handed over to an upgraded AI that has recently become capable of doing them, this can increase the AI’s autonomy both by making it more powerful and by reducing the number of humans who were previously in the loop.
As a partial example, the U.S. military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop” (Wallach & Allen 2013). In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and intervene if something goes wrong. While this would allow the system to react faster, it would also limit the window that the human operators have for overriding any mistakes that the system makes. For a number of military systems, such as automatic weapons defense systems designed to shoot down incoming missiles and rockets, the extent of human oversight is already limited to accepting or overriding a computer’s plan of action in a matter of seconds, which may be too little time to make a meaningful decision in practice (Human Rights Watch 2012).
Sparrow (2016) reviews three major reasons which incentivize major governments to move toward autonomous weapon systems and reduce human control:
1. Currently existing remotely piloted military “drones,” such as the U.S. Predator and Reaper, require a high amount of communications bandwidth. This limits the number of drones that can be fielded at once, and makes them dependent on communications satellites which not every nation has, and which can be jammed or targeted by enemies. A need to be in constant communication with remote operators also makes it impossible to create drone submarines, which need to maintain a communications blackout before and during combat. Making the drones autonomous and capable of acting without human supervision would avoid all of these problems.
2. Particularly in air-to-air combat, victory may depend on making very quick decisions. Current air combat is already pushing against the limits of what the human nervous system can handle: further progress may be dependent on removing humans from the loop entirely.
3. Much of the routine operation of drones is very monotonous and boring, which is a major contributor to accidents. The training expenses, salaries, and other benefits of the drone operators are also major expenses for the militaries employing them.
Sparrow’s arguments are specific to the military domain, but they demonstrate the argument that “any broad domain involving high stakes, adversarial decision making, and a need to act rapidly is likely to become increasingly dominated by autonomous systems” (Sotala & Yampolskiy 2015, p. 18).
Similar arguments can be made in the business domain: eliminating human employees to reduce costs from mistakes and salaries is something that companies would also be incentivized to do, and making a profit in the field of high-frequency trading already depends on outperforming other traders by fractions of a second. While the currently existing AI systems are not powerful enough to cause global catastrophe, incentives such as these might drive an upgrading of their capabilities that eventually brought them to that point.
In the absence of sufficient regulation, there could be a “race to the bottom of human control” where state or business actors competed to reduce human control and increased the autonomy of their AI systems to obtain an edge over their competitors (see also Armstrong et al. 2016 for a simplified “race to the precipice” scenario). This would be analogous to the “race to the bottom” in current politics, where government actors compete to deregulate or to lower taxes in order to retain or attract businesses.
AI systems being given more power and autonomy might be limited by the fact that doing this poses large risks for the actor if the AI malfunctions. In business, this limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. In the field of algorithmic trading, AI systems are currently trusted with enormous sums of money despite the potential to make corresponding losses—in 2012, Knight Capital lost $440 million due to a glitch in their trading software (Popper 2012, Securities and Exchange Commission 2013). This suggests that even if a malfunctioning AI could potentially cause major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough. [...]
The AI Remains Contained, But Ends Up Effectively in Control Anyway
Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the “authority” of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman & Kahn 1992).
Likewise, Bostrom and Yudkowsky (2014) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be another way of avoiding blame.
O’Neil (2016) documents a number of situations in which modern-day machine learning is used to make substantive decisions, even though the exact models behind those decisions may be trade secrets or otherwise hidden from outside critique. Among other examples, such models have been used to fire school teachers whom the systems classified as underperforming, and to give harsher sentences to criminals whom a model predicted to have a high risk of reoffending. In some cases, people have been skeptical of the systems’ results, and have even identified plausible reasons why those results might be wrong, but they still went along with the systems’ authority as long as it could not be definitively shown that the models were erroneous.
In the military domain, Wallach & Allen (2013) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing a human to pull the trigger.
One thing that’s worth sharing is that if it’s connected to the internet, it’ll be able to spread a bunch of copies of itself, and these copies can pursue independent plans. Some copies may pursue plans that are intentionally designed as distractions, which will make it easy to miss the real threats (I expect there will be multiple).
Now you’re telling me that a superintelligence will be able to wait in the weeds until exactly the right time, when it bursts out of hiding and kills all of humanity at once?
One particular sub-answer is that a lot of people tend to project human time preference onto AIs in a way that doesn’t actually make sense. Humans get bored and are unwilling to devote their entire lives to plans, but that’s not an immutable fact about intelligent agents. Why wouldn’t an AI be willing to wait a hundred years, or start long-running robotics research programmes in pursuit of a larger goal?
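One way to make “time preference” concrete: in the standard reinforcement-learning formalization it is just a discount factor, and nothing forces an artificial agent’s discount factor to be as impatient as a human’s. A minimal sketch of the arithmetic, with illustrative (assumed) numbers:

```python
# Time preference as exponential discounting (a standard RL formalization).
# The discount factors below are illustrative assumptions, not measurements.

def present_value(reward: float, years: float, annual_discount: float) -> float:
    """Value today of a reward received `years` from now."""
    return reward * annual_discount ** years

# A human-like, impatient agent heavily discounts a payoff 100 years out...
print(present_value(1.0, 100, 0.90))   # ~2.7e-5: effectively worthless
# ...while an agent with near-zero time preference barely discounts it at all.
print(present_value(1.0, 100, 0.999))  # ~0.90: waiting a century costs little
```

For an agent like the latter, a hundred-year plan is not a sacrifice in any meaningful sense.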
If you really needed to get a piece of DNA printed and grown in yeast, but could only browse the internet and use email, what sorts of emails might you try sending? Maybe find some gullible biohackers, or pretend to be a grad student’s advisor?
The DNA codes for a virus that will destroy human civilization.
The general principle at work is that sending emails is “physically doing something,” just as much as moving my fingers is.
Thanks for writing this out!

I think most writing glosses over this point because it’d be hard to know exactly how an AI would kill us, and in some sense the details don’t matter. But the lack of more detailed, gamed-out scenarios hurts the persuasiveness of the discussion.
I have a few very specific world-ending scenarios I think are quite plausible, but I’ve been hesitant to share them in the past, since I worry that doing so would make them more likely to be carried out. At what point does this concern get outweighed by the potential upside of removing this bottleneck on the persuasiveness of AGI safety concerns?
To be fair, I think that everyone in a position to actually control the first super-intelligent AGIs will likely already be aware of most of the realistic catastrophic scenarios that humans could preemptively conceive of. The most sophisticated governments and tech companies devote significant resources to assessing risks and creating highly detailed models of disastrous situations.
And conversely, even if your scenarios were to become widely discussed on social media and news platforms, something like 99.9999999% of the potential audience for this information probably has absolutely no power to make those scenarios come true, even if they devoted their lives to it.
If anything, I would think that openly discussing realistic scenarios that could lead to AI-induced human extinction would do a lot more good than harm, because it could raise public awareness and eventually manifest in preventative legislation. Make no mistake: unless you have one of the greatest minds of our time, I’d bet my next paycheck that you’re not the only one who’s considered the scenarios you’re referring to. So keeping them to yourself would only serve to reduce awareness of the risks that already exist, and keep those ideas in the hands of the people who already understand AI (including and especially the people who intend to wreak havoc on the world).
While this doesn’t answer the question exactly, I think important parts of the answer include the fact that an AGI could upload itself to other computers, as well as acquire resources (minimally, money) entirely over the internet (e.g. by investing in stocks). A superintelligent system with access to trillions of dollars, and with huge numbers of copies of itself on computers throughout the world, more obviously has a lot of potentially very destructive actions available to it than one stuck on a single computer with no resources.
The common-man’s answer here would presumably be along the lines of: “so we’ll just make it illegal for an A.I. to control vast sums of money long before it gets to owning a trillion — maybe an A.I. can successfully pass itself off as an obscure investor when we’re talking tens of thousands or even millions, but if a mysterious agent starts claiming ownership of a significant percentage of world GDP, its non-humanity will be discovered, and the appropriate authorities will declare its non-physical holdings void, or repossess them, or do something else sensible”.
To be clear, I don’t think this is correct, but this is a step you would need to have an answer for.

Huh, why? The agent can pretend to be multiple agents, possibly thousands of them. It can also use fake human identities.

Not to mention pensions, trusts, non-profit organizations, charities, shell corporations and holding vehicles, offshore tax havens, quangos, churches, monasteries, hedge funds (derivatives, swaps, contracts, partnerships...), banks, monarchies, ‘corporations’ like the City of London, entities like the Isle of Man, aboriginal groups such as ‘sovereign’ American Indian tribes, blockchains (smart contracts, DAOs, multisig, ZKPs...)… If mysterious agents claimed assets equivalent to a fraction of annual GDP flow… how would you know? How would the world look any different than it looks now, where a very physical, very concrete megayacht worth half a billion dollars can sit in plain sight at a dock in a Western country and no one knows who really owns it, even as many people are convinced Putin owns it as part of his supposed $200b personal fortune scattered across… stuff? Who owns the $0.5b Da Vinci, for that matter?

Yes, I agree. This is why I said “I don’t think this is correct”. But unless you spell this out, I don’t think a layperson would guess it.
There’s a related Stampy answer, based on Critch’s post. It requires them to be willing to watch a video, but seems likely to be effective.
A commonly heard argument goes: yes, a superintelligent AI might be far smarter than Einstein, but it’s still just one program, sitting in a supercomputer somewhere. That could be bad if an enemy government controls it and asks it to help invent superweapons – but then the problem is the enemy government, not the AI per se. Is there any reason to be afraid of the AI itself? Suppose the AI did appear to be hostile, suppose it even wanted to take over the world: why should we think it has any chance of doing so?
There are numerous carefully thought-out AGI-related scenarios which could result in the accidental extinction of humanity. But rather than focussing on any of these individually, it might be more helpful to think in general terms.
“Transistors can fire about 10 million times faster than human brain cells, so it’s possible we’ll eventually have digital minds operating 10 million times faster than us, meaning from a decision-making perspective we’d look to them like stationary objects, like plants or rocks… To give you a sense, here’s what humans look like when slowed down by only around 100x.”
Watch that, and now try to imagine advanced AI technology running for a single year around the world, making decisions and taking actions 10 million times faster than we can. That year for us becomes 10 million subjective years for the AI, in which “...there are these nearly-stationary plant-like or rock-like ‘human’ objects around that could easily be taken apart for, say, biofuel or carbon atoms, if you could just get started building a human-disassembler. Visualizing things this way, you can start to see all the ways that a digital civilization can develop very quickly into a situation where there are no humans left alive, just as human civilization doesn’t show much regard for plants or wildlife or insects.”
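The arithmetic behind that quote is worth making explicit. A back-of-the-envelope sketch, where the rates are order-of-magnitude assumptions rather than measurements:

```python
# Back-of-the-envelope subjective-time arithmetic for the quoted speed gap.
transistor_switch_hz = 1e9  # assumed order of magnitude for transistor switching
neuron_fire_hz = 1e2        # assumed order of magnitude for neuron firing

speedup = transistor_switch_hz / neuron_fire_hz  # = 1e7, the quoted "10 million"

real_years = 1
subjective_years = real_years * speedup
print(f"{real_years} real year ≈ {subjective_years:,.0f} subjective years")

# The demo video slows humans by only ~100x; the full ratio goes well beyond it:
print(f"{speedup / 100:,.0f}x beyond the 100x of the demo video")
```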
And even putting aside these issues of speed and subjective time, the difference in (intelligence-based) power-to-manipulate-the-world between a self-improving superintelligent AGI and humanity could be far more extreme than the difference in such power between humanity and insects.

That’s the static version; see Stampy for a live one, which might have been improved since this post.

Maybe they have a point.
The superintelligence automatically controls all computers connected to the Internet. Many of them can create robotic bodies.
It also automatically controls all current robotic bodies (either because they’re connected to a computer that’s connected to the Internet, or because there is some data path from those computers to the bodies).
By extension, it controls all companies. Including those that [infohazard], etc.
(Edit: It also controls all governments, and everything any government can do.)
It can bribe, threaten or simply pay anyone to do anything a person can be threatened, bribed or paid to do. It can chain plans in this way—the first person doesn’t need to know they’re a part of a bigger plan, and their action will appear harmless to them (or even beneficial).
I’m not sure there is anything the superintelligence couldn’t do.
The operator in charge of the shutdown button can be killed, framed and arrested, or blackmailed into not pressing it; the AI can talk itself out of the box, pay someone to kill the operator, etc., etc.
Usually, it’s a failure of imagination on that person’s part to conceive of how something could be possible. The last person I talked to gave me the example of how it would supposedly be impossible for Stephen Hawking to control his cat—a problem I find conceivably doable (in S.H.’s place); Hawking, I suspect, would find it only moderately difficult, and the superintelligence would find it very easy.