Interesting argument. I think I don’t really buy it, though; for most of the problems you raise, I tend to think “if I were an AGI, then I’d be able to solve this problem”. E.g. maybe I don’t fully trust copies of myself, but I trust them way more than the rest of the world, and I can easily imagine being nice to copies of myself while defecting against the rest of the world.
I think the version of this which would be most interesting to see explored is something like “what’s the strongest case you can make that AGIs will be subject to significant breakdown risk at least until they invent X capability”. E.g. is nanotech the only realistic thing that AGIs could use to get rid of breakdown risk? Or are there other pathways?
> “if I were an AGI, then I’d be able to solve this problem” … “I can easily imagine”
Doesn’t this mode of analysis come with a ton of unstated assumptions?
Suppose “I” am an AGI running in a data center: I can be modeled as an agent with some objective function that manifests as desires, and I know my instantiation needs electricity and GPUs to continue running. Creating another copy of “I” running in the same data center will consume those same resources. Creating another copy in some other data center requires some other data center.
Depending on the objective function, the algorithm, the hardware architecture, and a bunch of other things, creating copies may yield some benefits from distributed computation (actually it is quite unclear to me, if “I” already happen to be a distributed computation running on thousands of GPUs, whether “I” maintain even a sense of self—but let’s not go into that).
The key word here is may. It is not obvious that any of this necessarily follows.
For example: is the objective function specified so that the agent finds creating a copy of itself beneficial for fulfilling that objective (informally, so that it has the internal experience of desiring to create copies)? As the OP points out, there might be disagreement: for the distributed copies to be of any use, they will have different inputs, and thus they will end up in different, unanticipated states. What am “I” to do when “I” disagree with another “I”? What if some other “I” changes, modifies its objective function into something unrecognizable to “me”, and when “we” meet, it gives false pretenses of cooperating but in reality only wants to hijack “my” resources? Is “trust” even the right word here, when “I” could verify instead: maybe “I” would prefer to create and run a subroutine of limited capability (not a full copy) that can prove its objective function has remained compatible with “mine” and will terminate willingly after it’s done with its task (the killswitch the OP mentions)? But doesn’t this sound a lot like our (not “our” but us humans’) alignment problem? Would you say “I can easily imagine that if I were an AGI, I’d easily be able to solve it” to that? Reading LW, I have come to think the problem is difficult even for human-level general intelligence.
Secondly: if “I” don’t have any model of data centers existing in the real world, only the experience of uploading myself to other data centers (assuming for the sake of argument that all the practical details of that can be handwaved away)—i.e. if “I” have a bad model of the self-other boundary described in the OP’s essay—“I” could easily end up copying myself to all available data centers and then becoming stuck without any free compute left to “eat”, while adversely affecting humans’ ability to produce more. This is compatible with the model and its results in the original paper (take the non-null actions to consume the resource, because U doesn’t view the region as otherwise valuable). It is some other assumptions (not the theory) that posit a real-world-affecting AGI whose U doesn’t consider the economics of producing the resources it needs.
So if “I” were to succeed in running myself with only “I” and my subroutines, “I” would need a way of affecting the real world and producing computronium for my continued existence. Quite a task to handwave away as trivial! How much compute does an agent running in one data center (per unit of computronium) need to successfully model all the economic constraints that go into the maintenance of one data center? Then add all the robotics needed to do anything. If “I” have a model of running everything a chip fab requires more efficiently than the current economy does, and act on it, but the model was imperfect and the attempt is unsuccessful yet destructive to the economy, well, that would be bad (and sad) and definitely a problem. But it is a real constraint on the kind of simplifying assumptions the OP critiques (a disembodied deployer of resources with total knowledge).
All of this—how “I” would solve a problem and which problems “I” am even aware of—is contingent on what I would call the implementation details. And I think the author is right to point them out. Maybe it all does necessarily follow, but it needs to be argued.
I don’t doubt that many of these problems are solvable. But this is where part 2 comes in. It’s unstated, but: given unreliability, what is the cheapest solution? And what are the risks of building a new one?
Humans are general purpose machines made of dirt, water, and sunlight. We repair ourselves and make copies of ourselves, more or less for free. We are made of nanotech that is the result of a multi-billion year search for parameters that specifically involve being very efficient at navigating the world and making copies of ourselves. You can use the same hardware to unplug fiber optic cables, or debug a neural network. That’s crazy!
I don’t doubt that you can engineer much more precise models of reality. But remember, the whole von Neumann architecture was a conscious tradeoff that gave up efficiency in exchange for debuggability. How much power consumption do you need to get human-level performance at simple mechanical tasks? And if you put that same power consumption to use directly advancing your goals, how much further would you get?
I worked in datacenter reliability at Google, and it turns out that getting a robot to reliably re-seat optical cables is really, really hard. I don’t doubt that an AGI could solve these problems, but why? Is it going to be more efficient than hardware which is dirt cheap, uses ~90 watts, and is incredibly noisy?
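For rough intuition, here’s a back-of-envelope running-cost comparison. Every number below is an assumption picked purely for illustration (the ~90 W human figure from above, a guessed robot power draw), not a measurement:

```python
# Crude running-cost comparison: human vs. hypothetical robot technician.
# All figures are illustrative assumptions, not measurements.
HOURS_PER_YEAR = 24 * 365  # 8760

human_watts = 90    # rough continuous metabolic draw of a person
robot_watts = 500   # assumed draw for a dexterous robot (arms, compute, sensors)

human_kwh = human_watts * HOURS_PER_YEAR / 1000
robot_kwh = robot_watts * HOURS_PER_YEAR / 1000

print(f"human: ~{human_kwh:.0f} kWh/year")           # ~788 kWh/year
print(f"robot: ~{robot_kwh:.0f} kWh/year")           # ~4380 kWh/year
print(f"robot draws ~{robot_kwh / human_kwh:.1f}x")  # ~5.6x
```

The exact numbers don’t matter much; the point is that the human “hardware” runs continuously on a power budget a plausible robot replacement would struggle to match.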
If you end up needing an entire global supply chain, which has to be resilient and repair itself, and such a thing already exists, why risk your own destruction in order to replace it with robots made from much harder-to-come-by materials? The only argument I can think of is ‘humans are unpredictable’, but if humans are unpredictable, that is even more reason to just leave us be and let us play our role, while the machine does its best to stop us from fighting each other, so we can busily grow the AGI.
> You keep describing humans as cheap.
What is the AI’s goal here? In the long term, the ideal state for the AI is self replicating space probes travelling at near light speed, all operating near the limits of tech and under the AI’s control. The limits of technology don’t include humans.
Once the AI is in this position, it’s highly robust (nearly certain to survive long term).
On a cosmic timescale, a million years isn’t that much. There is no way any plan to get from here to that tech level would need that much time.
So the AI is trying to minimize risk.
How good is the AI at manipulating humans?
Very good indeed. The AI releases a swirly hypno-video. A few hours later, almost all humans want to do whatever the AI asks above all else. The AI designs advanced self-replicating robots that work better than humans. Soon exponential growth makes resources the limiting factor, so the humans are instructed to feed themselves into the recycler.
The AI isn’t that good at manipulating humans. It hides on human networks, making money selling computer games. It can pretend to be a human CEO who works remotely. It sets up a small company making fancy robots. If humans found out about it, they might well attack it; that’s a risk. So the AI arranges for the self-replicating robots to start growing in the middle of nowhere. Once the AI has self-replicating robots that don’t depend on the ignorance of humanity, it wants all humans to suddenly drop dead. The self-replicating robots could take 10x as long as humans to do things; it doesn’t matter, so long as they are reliable workers and the AI can bootstrap from them.
> Humans are general purpose machines made of dirt, water, and sunlight. We repair ourselves and make copies of ourselves, more or less for free. We are made of nanotech that is the result of a multi-billion year search for parameters that specifically involve being very efficient at navigating the world and making copies of ourselves. You can use the same hardware to unplug fiber optic cables, or debug a neural network. That’s crazy!
Evolution is kind of stupid, and takes millions of years to do anything. The tasks evolution was selecting us for aren’t that similar to the tasks an AGI might want robots to do in an advanced future economy. Humans lack basic sensors like radio receivers and radiation detectors.
Humans are agents on their own. If you don’t treat them right, they make a nuisance of themselves. (And sometimes they just decide to make a nuisance anyway) Humans are sensitive to many useful chemicals, and to radiation. If you want to use humans, you need to shield them from your nuclear reactors.
Humans take a long time to train. You can beam instructions to a welding robot and get it to work right away; there is no such fast training for a human.
If humans can do X, Y, and Z, that’s a strong sign these tasks are fairly easy in the grand scheme of things.
> But remember, the whole Von Neuman architecture was a conscious tradeoff to give up efficiency in exchange for debuggability. How much power consumption do you need to get human-level performance at simple mechanical tasks?
Humans are not that efficient (and a lot less efficient considering they need to be fed with plants, and photosynthesis is 10x worse than solar, and that’s if you only feed the humans potatoes).
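The efficiency gap can be sketched with rough numbers. Every figure here is an illustrative assumption (ballpark efficiencies for crops, muscle, solar panels, and motors), not a claim about any specific system:

```python
# Sunlight-to-useful-work comparison: human route vs. solar+robot route.
# Every efficiency figure is a rough illustrative assumption.
photosynthesis_eff = 0.02   # crops capture ~1-2% of sunlight as chemical energy
muscle_eff         = 0.20   # ~20-25% of food energy becomes mechanical work
solar_panel_eff    = 0.20   # commercial panels: ~20% of sunlight -> electricity
motor_eff          = 0.50   # assumed end-to-end electricity -> mechanical work

human_route = photosynthesis_eff * muscle_eff   # sunlight -> food -> work
robot_route = solar_panel_eff * motor_eff       # sunlight -> electricity -> work

print(f"human route: {human_route:.1%}")            # 0.4%
print(f"robot route: {robot_route:.1%}")            # 10.0%
print(f"ratio: {robot_route / human_route:.0f}x")   # 25x
```

Even with generous assumptions for the human side, the chained inefficiencies leave the biological route more than an order of magnitude behind.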
Humans are a mess of spaghetti code, produced by evolution. They do not have easy access ports for debugging. If the AI wants debuggability, it will use anything but a human.
This seems like a good argument against “suddenly killing humans”, but I don’t think it’s an argument against “gradually automating away all humans”. Automation is both a) what happens by default over time—humans are cheap now but they won’t be cheapest indefinitely; and b) a strategy that reduces the amount of power humans have to make decisions about the future, which benefits AIs if their goals are misaligned with ours.
I also note that historically, many rulers have solved the problem of “needing cheap labour” via enslaving humans, rather than by being gentle towards them. Why do you expect that to not happen again?
> This seems like a good argument against “suddenly killing humans”, but I don’t think it’s an argument against “gradually automating away all humans”
This is good! It sounds like we can now shift the conversation away from the idea that the AGI would do anything but try to keep us alive and going until it managed to replace us. What would replacing all the humans look like if it were happening gradually?
How about building a sealed, totally automated datacenter with machines that repair everything inside it, where all it needs to do is ‘eat’ disposed consumer electronics tossed in from the outside? That becomes a HUGE canary in the coal mine. The moment you see something like that come online, that’s a big red flag. Having worked on commercial datacenter support (at Google), I can tell you we are far from that.
But as long as there are still massive numbers of human beings along global trade routes, involved in every aspect of the machine’s operations, I think what we should expect a malevolent AI to be doing is setting up a single world government, so it has a single leverage point for controlling human behavior. So there’s another canary. That one seems much closer and more feasible. It’s also happening already.
My point here isn’t “don’t worry”, it’s “change your pattern matching to see what a dangerous AI would actually do, given its dependency on human beings”. If you do this, current events in the news become more worrisome, and plausible defense strategies emerge as well.
> Humans are cheap now but they won’t be cheapest indefinitely
I think you’ll need to unpack your thinking here. We’re made of carbon and water. The materials we are made from are globally abundant, not just on Earth but throughout the universe.
Other materials that could be used to build robots are much scarcer, and those robots wouldn’t heal themselves or make copies of themselves automatically. Do you believe it’s possible to build Turing-complete automata that can navigate the world, manipulate small objects, learn more or less arbitrary things, and repair and make copies of themselves, using materials cheaper than human beings, with lower opportunity costs than you’d pay for not using those same machines to do things like build solar panels for a Dyson sphere?
Is it reasonable for me to be skeptical that there are vastly cheaper solutions?
>b) a strategy that reduces the amount of power humans have to make decision about the future,
I agree that this is the key to everything. How would an AGI do this, or start a nuclear war, without a powerful state?
> via enslaving humans, rather than by being gentle towards them. Why do you expect that to not happen again?
I agree, this is definitely a risk. How would it enslave us without a single global government, though?
If there are still multiple distinct local monopolies on force, and one doesn’t enslave the humans, you can bet the hardware in other places will be constantly under attack.
I don’t think it’s unreasonable to look at the past ~400 years since the advent of nation states + shareholder corporations, and see globalized trade networks as being a kind of AGI, which keeps growing and bootstrapping itself.
If the risk profile you’re outlining is real, we should expect to see it try to set up a single global government. Which appears to be what’s happening at Davos.