Evolution did a surprisingly good job at aligning humans...to social status
[This post is a slightly edited tangent from my dialogue with John Wentworth here. I think the point is sufficiently interesting and important that I wanted to make it a top-level post, and not leave it buried in that dialogue, which is mostly on another topic.]
The conventional story is that natural selection failed extremely badly at aligning humans. One fact about humans that casts doubt on this story is that natural selection got the concept of “social status” into us, and it seems to have done a shockingly good job of aligning (many) humans to that concept.
Evolution somehow gave humans some kind of inductive bias (or something) such that our brains are reliably able to learn what it is to be “high status”, even though the concrete markers for status are as varied as human cultures.
And further, it successfully hooked up the motivation and planning systems to that “status” concept. Modern humans not only take actions that play for status in their local social environment; they sometimes successfully navigate (multi-decade) career trajectories and life paths, completely foreign to the ancestral environment, in order to become prestigious by the standards of their local culture.
And this is one of the major drivers of human behavior! As Robin Hanson argues, a huge portion of our activity is motivated by status-seeking and status-affiliation.
This is really impressive to me. It seems like natural selection didn’t do so hot at aligning humans to inclusive genetic fitness. But, all things considered, it did shockingly well at aligning humans to the goal of seeking, even maximizing, status.[1]
This seems like good news about alignment. The common story—that condoms prove evolution basically failed at alignment, because as soon as we developed the technological capability to route around evolution’s “goal” of maximizing the frequency of our alleles in the next generation, attaining only the proxy measure of sex, we did exactly that—doesn’t seem to apply to our status drive.
It looks to me like “status” generalized really well across the distributional shift of technological civilization. Humans still recognize it and optimize for it, regardless of whether the status markers are money or technical acumen or h-factor or military success.[2]
This makes me way less confident about the standard “evolution failed at alignment” story.
[1]
I guess that we can infer from this that having an intuitive “status” concept was much more strongly instrumental for attaining high inclusive genetic fitness in the ancestral environment than having an intuitive concept of “inclusive genetic fitness” itself. A human-level status-seeking agent with a sex drive does better by the standard of inclusive genetic fitness than a human-level IGF maximizer.
The other hypothesis, of course, is that the “status” concept was easier to encode in a human than the “inclusive genetic fitness” concept, for some reason.

[2]
I’m interested in whether others think that this is an illusion: that it only looks like the status target generalized because I’m drawing the target around where the arrow landed. That is, what we think of as “social status” is exactly the part of social status in the ancestral environment that did generalize across cultures.
I think this was a badly written post, and it appropriately got a lot of pushback.
Let me briefly try again, clarifying what I was trying to communicate.
Evolution did not succeed at aligning humans to the sole outer objective function of inclusive genetic fitness.
There are multiple possible reasons why evolution didn’t succeed, and presumably multiple stacked problems.
But one thing that I’ve sometimes heard claimed or implied is that evolution couldn’t possibly have succeeded at instilling inclusive genetic fitness as a goal, because individual humans don’t have inclusive genetic fitness as a concept.
Evolution could only have approximated that goal with a godshatter of adaptations that prefer various proxies for inclusive genetic fitness, where each proxy has to be close to the level of sensory evidence. E.g., evolution can shape humans to like the taste of sugar, or the feeling of orgasm, or to prefer sexy-looking people, or even to love their cousins (less than their brothers, but more than their more distant relatives). But, it’s claimed, evolution can’t shape humans to desire their own inclusive genetic fitness directly, because it can’t instill goals that aren’t at the level of sensory evidence.
And so it’s not surprising that the proxies would completely deviate from the “intended” target, as soon as conditions changed.
The claim is that evolution can’t produce an inclusive genetic fitness maximizer, not just that it happened not to.
However, this story is undercut by any example in which evolution was able to make an abstract concept (not just a bunch of sensory correlates of that concept in the ancestral environment) an optimization target that humans will apply their full creative intelligence to achieving.
Social status seems like one such example.
It’s an abstract concept that many humans have as an actual long-term optimization target (they’ll implement plans over years to increase their prestige; they don’t just have a myopic status-grabbing heuristic).
And humans seem to have a desire for social status itself, or at least not just for a collection of sensory-evidence-level proxy measures that correlated with it in the ancestral environment and break down entirely when the environment changes.
(If you doubt this, compare status-seeking behavior to male sexual preferences. In the latter case, it looks much more like evolution did instill a bunch of specific desires for close-to-sensory-level features that were proxies for fertility and health: big breasts, long legs, unwrinkled skin. Heterosexual men find those features desirable, and finding out that a particular sexy woman is actually infertile doesn’t change her desirability.
But in the case of status-seeking, I can’t write a list of near-sensory-level features that people desire independently of actual social prestige. The markers of status are enormously varied, by culture and subculture, and constantly changing. I bet that Steve Byrnes can point out a bunch of specific sensory evidence that the brain uses to construct the status concept (stuff like gaze length of conspecifics, or something?), but the human motivation system isn’t just optimizing for those physical proxy measures, or people wouldn’t be motivated to get prestige on internet forums where people have reputations but never see each other’s faces.)
This suggests that, at least in some circumstances, evolution actually can shape an organism to have a specific abstract concept as a long-term optimization target, and recruit the organism’s own intelligence to identify how that concept applies in many varied environments.
This is not to say that evolution succeeded at aligning humans. It didn’t. This also doesn’t imply that alignment is easy. Maybe it is, maybe it isn’t, but this argument doesn’t establish that.
But it is to say that the specific story for why evolution failed at aligning humans to inclusive genetic fitness that I believed in, say, 2020 is incorrect, or at least incomplete.
If it helps, my take is in Neuroscience of human social instincts: a sketch and its follow-up Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking.
Sensory evidence is definitely involved, but kinda indirectly. As I wrote in the latter: “The central situation where Approval Reward fires in my brain, is a situation where someone else (especially one of my friends or idols) feels a positive or negative feeling as they think about and interact with me.” I think it has to start with in-person interactions with other humans (and associated sensory evidence), but then there’s “generalization upstream of reward signals” such that rewards also get triggered in semantically similar situations, e.g. online interactions. And it’s intimately related to the fact that there’s a semantic overlap between “I am happy” and “you are happy”, via both involving a “happy” concept. It’s a trick that works for certain social things but can’t be applied to arbitrary concepts like inclusive genetic fitness.
I stand by my nitpick in other comment that you’re not using the word “concept” quite right. Or, hmm, maybe we can distinguish (A) “concept” = a latent variable in a specific human brain’s world-model, versus (B) “concept” = some platonic Natural Abstraction™ or whatever, whether or not any human is actually tracking it. Maybe I was confused because you’re using the (B) sense but I (mis)read it as the (A) sense? In AI alignment, we care especially about getting a concept in the (A) sense to be explicitly desired because that’s likelier to generalize out-of-distribution, e.g. via out-of-the-box plans. (Arguably.) There are indeed situations where the desires bestowed by Approval Reward come apart from social status as normally understood (cf. this section, plus the possibility that we’ll all get addicted to sycophantic digital friends upon future technological changes), and I wonder whether the whole question of “is Approval Reward exactly creating social status desire, or something that overlaps it but comes apart out-of-distribution?” might be a bit ill-defined via “painting the target around the arrow” in how we think about what social status even means.
(This is a narrow reply, not taking a stand on your larger points, and I wrote it quickly, sorry for errors.)
I think it’s true and valuable to say:
There is a natural abstraction of “human status” that covers play, work, love, friends, community and more.
Humans seek “human status” as an intrinsic motivation/reward/value, such that humans will seek status even at the cost of other goals and of generalized empowerment.
In humans, “human status” often functions as a long-term target (I think more satisficing than optimizing).
This proves that evolution is capable of creating intelligent agents that target abstract concepts as goals in ways that survive a massive training/deployment shift.
This is a small piece of evidence against one of many reasons that we are all going to die.
The top-voted objection is that this abstraction includes “human status” that commenters see as fake: internet forums, obscure hobbies, video games, and music fandom. I don’t find this compelling; it’s just pointing out that the drive is for “human status”, not “real-world status”, “high-class status”, “rationalist status”, or some other thing.
My best objection is that the order is reversed. Humans have genes that cause us to have various behaviors and seek various things, and then the natural abstraction of “human status” is something that we use to learn and describe what humans end up doing. If humans had ended up doing a slightly different thing, based on different genetics, and that resulted in different behaviors, then those behaviors would be what we called “human status”. There is another natural abstraction concept of “generic status” that abstracts all status-like concepts in all animals on Earth, and humans don’t target that. When I learned that grooming is a marker of status in some primates, that didn’t cause me to spend more time seeking opportunities to be groomed.
It would be more accurate to say that the natural abstraction of “human status” co-evolved with the genetics of humans, rather than them happening in either order. We see this with the natural abstraction of “Claude” co-evolving with the weights that make up various Claude models. I gave this +4 in the review. It would have been extremely valuable to post this twenty years ago, but today it seems obvious that we can grow artificial intelligent agents that target natural abstractions in the environment, including the Anthropic trick of making and targeting a natural abstraction at the same time.
Seems like you’re mushing together several loosely related things, including what we might call model-based motivation, explicit long-term planning, unified purpose, and precisely targeted goals.
Model-based motivation: being motivated to do something in a way that relies on your internal models of the world, not just on direct sensory rewards.
Explicit long-term planning: being aware of your goal, explicitly planning ways to achieve it, following those plans including over periods of months or years.
Unified purpose: a person’s motivations and actions in a domain fitting together coherently to work towards a single purpose, even across contexts.
Precisely targeted goals: having the goal precisely match something that can be specified on other grounds besides what we can empirically observe that people aim for (like “inclusive genetic fitness” which is picked out by theory).
The godshatter post is mainly about the last two—people have a collection of fragmented motivations which helped towards the selected-for purpose in the contexts where we evolved. Your argument here is mainly about the first two.
I think that the first two are pretty common, and are found in human romantic/reproductive goals, e.g. long-term planning around having kids, or motivations to improve one’s appearance in ways that you expect potential partners to find attractive. I think that the last two are pretty rare, including for status—most people have a collection of somewhat-status-related motivations (though perhaps a small fraction of people (sociopaths?) have status as a more unified goal), and I haven’t seen anyone specify the “status” target well enough to even check whether people’s motivations aim at that precise target.
Curious to see what Steven Byrnes would actually say here. I fed your comment and Byrnes’ two posts on social status to Opus 4.5, it thought for 3m 40s (!) and ended up arguing he’d disagree with your social status example:
(mods, let me know if this is slop and I’ll take it down)