I think that’s mostly a really good summary. The major distinction I would try to make is that agenthood is primarily a way to actualize power, rather than a source of it.
If you had an agent that wasn’t strongly optimized in any sense other than being an agent, in that it had goals and wanted to achieve them, that wouldn’t make it dangerous, any more than your dog is dangerous for being an agent. Conversely, if you have something that’s strongly optimized in some more generic sense but isn’t an agent, that still puts you extremely close to a lot of danger. The article was trying to emphasize this by pointing to the most reductive form of agenthood I could see, in that none of the intrinsic power of the resulting system could reasonably be attributed to any intrinsic smartness of the agent component, even though the system was a powerful agent.
I think there’s some additional nuance here that makes a difference.
Most extremely optimized outputs are benign. Like suppose I’m trying to measure the length of a piece of wood at an extremely high level of precision. The capabilities needed to get an atomic-level measurement might be dangerous, but the actual output would be harmless: a number on paper.
It’s not that optimized outputs are dangerous, it’s that optimization is dangerous.
This is an unnatural use of “most”. Extremely optimized outputs will tend to be dangerous on their own, even when they are only optimized “for something”. It seems more natural to say that, for most features you know how to ask for something to be heavily optimized on, something extremely optimized for that feature will be dangerous.
I agree with that example, but I don’t see the distinction the same way. An optimised measurement of that sort is safe primarily because it lives in an extremely limited domain, one without much freedom for there to be a lot of optimality in it, in some informal sense.
In contrast, capabilities for getting very precise measurements of that sort exist in the space of things-you-can-do-in-reality, so there is lots of room for such capabilities to be either benign (an extremely accurate laboratory machine) or dangerous (the shortest program that, if executed, would have that measurement performed). I wouldn’t say the important distinction is whether an optimizing action, an optimiser, is involved, but that the domain is large enough that optimal results within it are dangerous in general.
For instance, the process of optimizing a simple value within a simple domain can be as simple as Newton–Raphson, and that’s safe because the domain is sufficiently restricted. By contrast, a sufficiently optimised book ends the world, a widget sufficiently optimised for manufacturability ends the world, and a baseball sufficiently optimised for speed ends the world.
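(To make the “simple domain” case concrete, here is a minimal sketch of the kind of optimization I mean by Newton–Raphson: everything it can ever output is a single number, so iterating on it as hard as you like stays harmless.)

```python
# Minimal sketch: Newton-Raphson root-finding as "optimization" confined to a
# tiny domain. The only thing it can ever produce is one float, so no amount
# of optimizing harder makes it dangerous.

def newton_raphson(f, df, x0, tol=1e-12, max_iter=100):
    """Iterate x <- x - f(x)/df(x) until the step size drops below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: sqrt(2) as the positive root of x^2 - 2.
print(newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0))
# -> 1.4142135623730951
```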
While I agree that there are many targets that are harmless to optimise for, like a dumpling optimised to be exactly 1 kg in mass, I still see a lot of these outputs as intrinsically dangerous. To me, the key danger of optimal strategies is that they are optimal within a sufficiently broad domain, and the key danger of optimisers is that they produce a lot of optimised outputs.
Ok. Let me try to draw out why optimized stuff is inherently dangerous. This might be a bit meandering.
I think it’s because humans live in an only mildly optimized world. There’s this huge, high-dimensional space of “the way the world can be”, with a bunch of parameters including the force of gravity, the percentage of oxygen in the air, the number of rabbits, the amount of sunlight that reaches the surface of the earth, the virulence of various viruses, etc. Human life is fragile; it depends on the world remaining within a relatively narrow “goldilocks” band for a huge number of those parameters.
Optimizing hard on anything, unless it is specifically for maintaining those goldilocks conditions, implies extremizing. Even if the optimization is not itself for an extreme value (e.g. one could be trying to maintain the oxygen percentage in the air at exactly 21.45600 percent), hitting a value that precisely means doing something substantially different from what the world would otherwise be doing. Hitting a value that precisely means that you have to extremize on some parameter. To get a highly optimized value you have to steer reality into a corner case that is far outside the bounds of the current distribution of outcomes on planet earth.
Indeed, if it isn’t far outside the current distribution of outcomes on planet earth, that suggests there’s a lot of room left for further optimization: the world is not already optimized on that parameter, and because the world is so high-dimensional it would be staggeringly, exponentially unlikely for the precisely optimized outcome to fall within the bounds of the current distribution. By default, you should expect the outcome of perfect optimization on any given parameter to look like a random draw from the state space of all possible ways that earth can be. So if the world looks pretty normal, you haven’t optimized very hard for anything.
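To put toy numbers on “exponentially unlikely” (the 10% band width and the independence assumption here are mine, purely for illustration): if a random draw lands inside the survivable band on each parameter with probability 0.1, the chance it does so on all d parameters at once is 0.1^d.

```python
import math

# Toy illustration with assumed numbers: suppose each of d parameters
# independently has a 10% chance of landing in its "goldilocks" band
# under a random draw from the space of possible worlds.
band_fraction = 0.1

for d in [1, 10, 100, 1000]:
    log10_p = d * math.log10(band_fraction)  # log10 of 0.1**d, avoids float underflow
    print(f"d = {d:4d}: P(every parameter stays in band) ~ 1e{log10_p:+.0f}")
```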
That sounds right to me. A key addendum might be that extremizing one value will often extremize one or more other related values, even ones that are normally only second-order relations. E.g. a baseball with extremized speed also extremizes the quantity of local radiation. So extremes often don’t stay localized to their domain.
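As a toy back-of-the-envelope for that baseball example (the 0.9c figure is just an arbitrary stand-in for “extremized speed”): the relativistic kinetic energy alone is already in the megaton range, before asking what form it gets released in.

```python
import math

# Toy back-of-the-envelope: kinetic energy of a standard ~145 g baseball
# at an (arbitrarily chosen) 0.9c, using KE = (gamma - 1) * m * c^2.
c = 299_792_458.0          # speed of light, m/s
m = 0.145                  # baseball mass, kg
v = 0.9 * c                # assumed "extremized" speed
gamma = 1 / math.sqrt(1 - (v / c) ** 2)

ke_joules = (gamma - 1) * m * c ** 2
ke_megatons = ke_joules / 4.184e15   # 1 megaton TNT = 4.184e15 J

print(f"KE ~ {ke_joules:.2e} J ~ {ke_megatons:.1f} megatons of TNT")
# -> roughly 4 megatons: the energy does not stay in the baseball's own domain.
```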