Evolution Solved Alignment (what sharp left turn?)

Some people like to use the evolution of Homo sapiens as an argument by analogy concerning the apparent difficulty of aligning powerful optimization processes:

"And in the same stroke that its capabilities leap forward, its alignment properties are revealed to be shallow, and to fail to generalize."

The much-confused framing of this analogy has led to a protracted debate about its applicability.

The core issue is just misaligned mesa-optimization. We have a powerful optimization process optimizing world stuff according to some utility function. The concern is that a sufficiently powerful optimization process will (inevitably?) lead to internal takeover by a selfish mesa-optimizer unaligned to the outer utility function, resulting in a bad (low or zero utility) outcome.

In the AGI scenario, the outer utility function is CEV, or external human empowerment, or whatever (insert placeholder; the specifics aren’t actually relevant). The optimization process is the greater tech economy and AI/ML research industry. The fear is that this optimization process, even if outer aligned, could result in AGI systems unaligned to the outer objective (humanity’s goals), leading to doom (humanity’s extinction). Success here would be largenum utility, and doom/extinction is 0. So the claim is that mesa-optimization inner alignment failure leads to 0-utility outcomes: complete failure.

For the evolution of human intelligence, the optimizer is just evolution: biological natural selection. The utility function is something like fitness, e.g. gene replication count (of the human-defining genes)[1]. And by any reasonable measure, humans are obviously enormously successful. If we normalize so that a utility score of 1 represents a mild success (the expectation for a typical draw of a great ape species), then humans’ score is >4 OOM larger: completely off the charts.[2]
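As a rough sanity check on that >4 OOM figure, here is a minimal back-of-the-envelope sketch. The numbers are illustrative assumptions of mine, not from the argument itself: census population as a crude proxy for gene replication count, a typical non-human great ape species numbering a few hundred thousand, and humanity at roughly eight billion.

```python
import math

# Back-of-the-envelope check of the ">4 OOM" claim.
# Assumptions (illustrative only):
# - census population is a crude proxy for gene replication count
# - a typical great ape species numbers a few hundred thousand individuals
typical_great_ape_population = 3e5  # rough order: chimpanzees or gorillas
human_population = 8e9              # circa the 2020s

ratio = human_population / typical_great_ape_population
print(f"utility ratio: {ratio:,.0f} (~{math.log10(ratio):.1f} OOM)")
# utility ratio: 26,667 (~4.4 OOM)
```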

So the evolution of human intelligence is an interesting example: an example of alignment success. The powerful runaway recursive criticality that everyone feared actually resulted in an enormous, anomalously high positive utility return, at least in this historical example. Human success, if translated into the AGI scenario, corresponds to the positive singularity of our wildest dreams.

Did it have to turn out this way? No!

Due to observational selection effects, we naturally wouldn’t be here if mesa-optimization failure during brain evolution were too common across the multiverse.[3] But we could have found ourselves in a world with many archaeological examples of species achieving human-level general technocultural intelligence and then going extinct: not due to AGI, of course, but simply due to becoming too intelligent to reproduce. As far as I know, we find no such examples.

And that is exactly what we’d expect to see in the historical record if mesa-optimization inner misalignment were a common failure mode: intelligent dinosaurs that suddenly went extinct, ruins of proto-pachyderm cities, the traces of a long-forgotten underwater cetacean Atlantis, etc.

So evolution solved alignment in the only sense that actually matters: according to its own utility function, the evolution of human intelligence enormously increased utility, rather than imploding it to 0.

So back to the analogy—where did it go wrong?

The central analogy here, as Nate frames it, is that optimizing apes for inclusive genetic fitness (IGF) doesn’t make the resulting humans optimize mentally for IGF. Like, sure, the apes are eating because they have a hunger instinct and having sex because it feels good; but it’s not like they could be eating/fornicating due to explicit reasoning about how those activities lead to more IGF. They can’t yet perform the sort of abstract reasoning that would correctly justify those actions in terms of IGF. And then, when they start to generalize well in the way of humans, they predictably don’t suddenly start eating/fornicating because of abstract reasoning about IGF, even though they now could. Instead, they invent condoms[4].

Nate’s critique is an example of the naive engineer fallacy: critiquing a specific detail of evolution’s solution while failing to notice that all that matters is the score, and humans are near an all-time high-score success[5]. Evolution didn’t make humans just optimize mentally for IGF because that, by itself, probably would have been a stupid failure of a design, and evolution is a superhuman optimizer whose designs are subtle, mysterious, and often beyond human comprehension.

Instead, evolution created a solution with many layers and components: a defense in depth against mesa-optimization misalignment. And even though all of those components will inevitably fail in many individuals (even most!), that is completely irrelevant at the species level; indeed it is just part of the design of how evolution explores the state space.

And finally, if all else fails, evolution did in fact find some weird way to create humans who rather obviously consciously optimize for IGF! So if the other mechanisms had all started to fail too frequently, the genes responsible for that phenotype would inevitably have become more common.

On further reflection, much of premodern history already does look like at least some humans consciously optimizing for something like IGF: after all, “be fruitful and multiply” is hardly a new concept. What do you think was really driving the nobility of old, with all their talk of bloodlines and legacies? There is already some deeper drive to procreate at work in our psyche (to varying degrees); we are clearly not all mere byproducts of pleasure’s pursuit[6].

The central takeaway is that evolution adapted the brain’s alignment mechanisms/protections in tandem with our new mental capabilities, such that the sharp left turn led to an enormous runaway alignment success.


  1. Nitpick arguments about how you define this specifically are irrelevant and uninteresting. Homo sapiens is enormously successful! If you really think you know the true utility function of evolution, and humans are a failure according to that metric, you have simply deluded yourself. My argument here does not depend on the details of the evolutionary utility function.

  2. We are unarguably the most successful recent species, probably the most successful mammal species ever, and all that despite arriving in a geological blink of an eye. The dU/dt for Homo sapiens is probably the highest ever, so we are tracking to be the most successful species ever, if current trends continue (which of course is another story).

  3. Full consideration of the observational selection effects also leads to an argument for alignment success via the simulation argument, as future alignment success probably creates many historical sims, whereas failures do not.

  4. Condom analogs are at least 5,000 years old; there is ample evidence that contraception was understood and used in various ancient civilizations, and many premodern tribal peoples understood herbal methods, so humans have probably had this knowledge since the beginning, in one form or another. (Although memetic evolution would naturally apply optimization pressure against wide usage.)

  5. Be careful anytime you find yourself defining peak evolutionary fitness as anything other than the species currently smiling from atop a giant heap of utility.

  6. I say this as I am about to have a child myself, planned for reasons I cannot yet fully articulate.