But we’re also getting to the point of being powerful enough to kill every mosquito. And we may just do that. We might do it even in the world where we were able to trade with them. The main reason not to is that we have some level of empathy/love towards nature and animals, something ASI won’t have towards us.
Moreover, it’s unclear whether or not humans will be able to coordinate with ASI. Even stupid animals can coordinate with one another on the timescale of evolution, but it’s quicker for us to kill them. It would probably be quicker and cheaper for an ASI to just kill us (or deprive us of resources, which we also do to animals all the time) than to try and get us to perform cognitively useful labour.
Eradicating mosquitoes would be incredibly difficult from a logistical standpoint. Even if we could accomplish this goal, doing so would cause large harm to the environment, which humans would prefer to avoid. By contrast, providing a steady stored supply of blood to feed all the mosquitoes that would have otherwise fed on humans would be relatively easy for humans to accomplish. Note that, for most mosquito species, we could use blood from domesticated mammals like cattle or pigs, not just human blood.
When deciding whether to take an action, a rational agent does not merely consider whether that action would achieve their goal. Instead, they identify which action would achieve their desired outcome at the lowest cost. In this case, trading blood with mosquitoes would be cheaper than attempting to eradicate them, even if we assigned zero value to mosquito welfare. The reason we do not currently trade with mosquitoes is not that eradication would be cheaper. Rather, it is because trade is not feasible.
You might argue that future technological progress will make eradication the cheaper option. However, to make this argument, you would need to explain why technological progress will reduce the cost of eradication without simultaneously reducing the cost of producing stored blood at a comparable rate. If both technologies advance together, trade would remain relatively cheaper than extermination. The key question is not whether an action is possible. The key question is which strategy achieves our goal at the lowest relative cost.
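To make the relative-cost point concrete, here is a minimal sketch with entirely made-up numbers (the function and the costs are illustrative, not estimates): if progress scales both costs by the same factor, the cheaper option never flips.

```python
# Toy model of the relative-cost argument, with made-up numbers.
# If technological progress scales both costs by the same factor,
# the ranking between the two strategies never changes.

def cheaper_strategy(cost_of_eradication, cost_of_trade):
    """Pick whichever strategy achieves the goal at lower cost."""
    return "trade" if cost_of_trade < cost_of_eradication else "eradication"

# Today (arbitrary units): eradication is a huge logistical project,
# supplying stored blood is comparatively cheap.
print(cheaper_strategy(cost_of_eradication=1_000.0, cost_of_trade=10.0))  # -> trade

# After progress that cuts *both* costs 100x, the ranking is unchanged.
print(cheaper_strategy(cost_of_eradication=10.0, cost_of_trade=0.1))      # -> trade
```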
If you predict that eradication will become far cheaper while trade will not become proportionally cheaper, thereby making eradication the rational choice, then I think you’d simply be making a speculative assertion. Unless it were backed up by something rigorous, this prediction would not constitute meaningful empirical evidence about how trade functions in the real world.
I was approaching the mosquito analogy on its own terms but at this level of granularity it does just break down.
Firstly, mosquitoes directly use human bodies as resources (as well as various natural environments which we voluntarily choose to keep around), while we can’t suck nutrients out of an ASI.
Secondly, mosquitoes cause harm to humans, and the proposed trade involves them ceasing to harm us, which is different to the proposed trades with an ASI.
An ASI would incur some cost to keep us around (sunlight for plants, space, temperature regulation), which would need to be balanced by the benefits we can give it. If it can use the space and energy we take up to run more GPUs (or whatever future chip it runs on), and those GPUs give it more value than we do, it would want to kill us.
If you want arguments as to whether it would be more costly to kill humans vs. keep us around, just look at the amount of resources and space humans currently take up on the planet. This is OOMs more resources than an ASI would need to kill us, especially once you consider that it only needs to pay the cost of killing us once, after which it gets the benefits of that extra energy essentially forever. If you don’t think an ASI could definitely make a profit from getting us out of the picture, then we just have extremely different pictures of the world.
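As a rough sketch of the arithmetic behind this (all numbers purely illustrative), the comparison is a one-off cost set against a recurring gain:

```python
# Rough sketch of the one-off-cost vs. recurring-benefit comparison.
# All numbers are purely illustrative; the argument only needs the ratio to be large.

one_time_cost_of_killing = 1.0      # paid once (arbitrary units)
resources_freed_per_year = 100.0    # resources humans currently occupy
discount_rate = 0.01                # how heavily the agent discounts future gains

# Present value of a perpetual stream of freed resources (geometric series limit).
present_value_of_benefit = resources_freed_per_year / discount_rate

print(present_value_of_benefit / one_time_cost_of_killing)  # 10000.0 on these numbers
```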
I was approaching the mosquito analogy on its own terms but at this level of granularity it does just break down.
My goal in my original comment was narrow: to demonstrate that a commonly held model of trade is incorrect. This naive model claims (roughly): “Entities do not trade with each other when one party is vastly more powerful than the other. Instead, in such cases, the more powerful entity rationally wipes out the weaker one.” This model fails to accurately describe the real world. Despite being false, this model appears popular, as I have repeatedly encountered people asserting it, or something like it, including in the post I was replying to.
I have some interest in discussing how this analysis applies to future trade between humans and AIs. However, that discussion would require extensive additional explanation, as I operate from very different background assumptions than most people on LessWrong regarding what constraints future AIs will face and what forms they will take. I even question whether the idea of “an ASI” is a meaningful concept. Without establishing this shared context first, any attempt to discuss whether humans will trade with AIs would likely derail the narrow point I was trying to make.
If you don’t think an ASI could definitely make a profit from getting us out of the picture, then we just have extremely different pictures of the world.
Indeed, we likely do have extremely different pictures of the world.
Why are you so confident about that?

Fair question. It might have been better to phrase this as “Something ASI won’t have towards us without much more effort and knowledge than we are currently putting into making ASI be friendly.”
The answer is *gestures vaguely at entire history of alignment arguments* that I agree with the Yudkowsky position. To roughly summarise:
Empathy is a very specific way of relating to other minds, one which isn’t even obviously well-defined when the two minds are very different; e.g. what does it mean to have empathy towards an ant, or a colony of ants? And humans and ASI will be very different minds. To make an ASI view us with something like empathy, we need to specify the target of “empathy, and also generalise it in the correct way when you’re very different from us.”
To correctly specify this target within the space of all possible ways that an ASI could view us requires putting a lot of bits of information into the ASI. Some bits of that information (for example, the bit of information which distinguishes “Do X” from “Make the humans think you’ve done X”, but there are others) are especially hard to come by and especially hard to put into the ASI.
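A back-of-envelope way to cash this out (the target fraction below is purely illustrative, not an estimate): if only a tiny fraction of the possible ways an ASI could relate to us counts as the intended kind of empathy, correctly generalised, then singling it out takes roughly log2(1/fraction) bits of well-aimed information.

```python
import math

# Back-of-envelope version of the "bits of information" claim.
# The target fraction is purely illustrative, not an estimate.

def bits_to_specify(target_fraction):
    """Bits needed to single out a target occupying this fraction of the space."""
    return math.log2(1.0 / target_fraction)

# If only one in 10^30 ways of relating to us counts as the intended kind of
# empathy, correctly generalised, hitting it takes ~100 well-aimed bits.
print(bits_to_specify(1e-30))  # ~99.66
```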
To make the task even harder, we’re doing things using gradient descent, where there isn’t an obvious, predictable-in-advance relation between the information we feed into an AI and the things it ends up doing.
Putting it all together, I think it’s very likely we’ll fail at the task of making an ASI have empathy towards us.
(There are arguments for “empathy by default”, or at the very least “empathy is simple”, but I think these don’t really work; see above for how “empathy” is not obviously easy to define across wildly different minds. Maaaaaybe there’s some correspondence between certain types of self-reflective mind which can make things work, but I’m very confused about the nature of self-reflection, so my prior is that it’s as doomed as any other approach, i.e. very doomed.)
(“Love” is even harder to specify than empathy, so I think that’s even more doomed)
You didn’t actually answer the question posed, which was “Why couldn’t humans and ASI have peaceful trades even in the absence of empathy/love/alignment to us rather than killing us?” and not “Why would we fail at making AIs that are aligned/have empathy for us?”
I think this position made a lot of sense a few years ago when we had no idea how a superintelligence might be built. The LLM paradigm has made me more hopeful about this. We’re not doing a random draw from the space of all possible intelligences, where you would expect to find eldritch alien weirdness. LLMs are trained to imitate humans; that’s their nature. I’ve been very positively surprised by the amount of empathy and emotional intelligence LLMs like Claude display; I think alignment research is on the right track.
What happens when you take an LLM that successfully models empathy and emotional intelligence and you dramatically scale up its IQ? Impossible to be certain, but I don’t think it’s obvious that it loses all its empathy and decides to only care about itself. As humans get smarter, do they become more amoral and uncaring? Quite the opposite: people like Einstein and Feynman are some of our greatest heroes. An ASI far smarter than Einstein might be more empathetic than him, rather than less.
To correctly specify [empathy] within the space of all possible ways that an ASI could view us requires putting a lot of bits of information into the ASI.
To me this reads like “we need to design a schematic for empathy that we can program into the ASI.” That’s not how we train LLMs, though. Instead we show them lots of examples of the behavior we want them to exhibit. As you pointed out, empathy and love are hard to specify and define. But that’s OK—LLMs don’t primarily work off of specifications and definitions. They just want lots of examples.
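To caricature the difference (this is a toy stand-in, not a real training loop, and everything in it is made up): the signal below never defines empathy anywhere; it only scores how closely a reply matches demonstrated replies.

```python
# Toy caricature of "examples, not specifications". Nothing below defines empathy;
# the only signal is agreement with demonstrated replies. (Not a real training loop;
# real LLM training minimises next-token cross-entropy over huge corpora.)

demonstrations = [
    ("My dog died yesterday.", "I'm so sorry. Losing a companion like that is really hard."),
    ("I failed my exam.", "That's rough. Do you want to talk through what happened?"),
]

def toy_loss(model_reply, demonstrated_reply):
    """Crude proxy for a training loss: fraction of demonstrated words the model missed."""
    demo_words = demonstrated_reply.lower().split()
    missed = sum(word not in model_reply.lower().split() for word in demo_words)
    return missed / len(demo_words)

# "Training" means nudging the model so this kind of loss falls across many examples;
# whatever regularity produces the demonstrated behaviour is what gets learned.
print(toy_loss("I'm so sorry for your loss.", demonstrations[0][1]))  # a value in [0, 1]
```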
This isn’t to say the problem is solved. Alignment research still has more to do. I agree that we should put more resources into it. But I think there’s reason to be hopeful about the current approach.