Note that this doesn’t undermine the post; its thesis only gets stronger if we assume that more alignment attempts, like romantic love or altruism, generalized, because that could well imply that control or alignment is actually quite easy to generalize, even when the intelligence of the aligner is far lower than that of the alignee.
This suggests that scalable oversight is either a non-problem, or a problem only at extreme levels of disparity, and that alignment does generalize quite far.
This, along with my belief that current alignment designers have far more tools in their toolkit than evolution had, makes me very optimistic that alignment will be solved before dangerous AI arrives.
Those are motivations, but they mostly have the type signature of “drives” rather than the type signature of “goals”.
I pursue interesting stuff because I’m curious. That doesn’t require me to even have a concept of curiosity; it could in principle be steering me without my awareness. My planning process might use curiosity, but it isn’t aligned with curiosity: we don’t (usually) make plans to maximize our curiosity. We just do what’s interesting.
In contrast, social status is a concept that humans learn, and it does look like the planning process is aligned with the status concept, in that (some) humans habitually make plans that are relatively well described as status maximizing.
To put it another way: our status motivations are not straightforward adaptation-execution. Status-seeking recruits our general intelligence in service of a learned concept, in much the way that we would want an AGI to be aligned with a concept like the Good or corrigibility.
Romantic love, again, is something people act on (including with their general intelligence), but their planning process is not in general aligned with maximizing romantic love. (I’m editorializing human nature here, but it looks to me like romantic love is mostly a strategy for achieving other goals.)
Altruism: it’s debatable whether most instances of maximizing altruistic impact are better described as status maximization. Regardless, altruism is an overriding strategic goal, recruiting general intelligence, for only a very small fraction of humans.
Why do you highlight status among a bazillion other things that generalized too, like romantic love, curiosity, and altruism?
…and eating, and breastfeeding…