Natural selection is often charged with having goals for humanity, and humanity is often charged with falling down on them. The big accusation, I think, is of sub-maximal procreation. If we cared at all about the genetic proliferation that natural selection wanted for us, then this time of riches would be a time of fifty-child families, not one of coddled dogs and state-of-the-art sitting rooms.
But (the story goes) our failure is excusable, because instead of a deep-seated loyalty to genetic fitness, natural selection merely fitted humans out with a system of suggestive urges: hungers, fears, loves, lusts. Which all worked well together to bring about children in the prehistoric years of our forebears, but no more. In part because all sorts of things are different, and in part because we specifically made things different in that way on purpose: bringing about children gets in the way of the further satisfaction of those urges, so we avoid it (the story goes).
This is generally floated as an illustrative warning about artificial intelligence. The moral is that if you make a system by first making multitudinous random systems and then systematically destroying all the ones that don’t do the thing you want, then the system you are left with might only do what you want while current circumstances persist, rather than being endowed with a consistent desire for the thing you actually had in mind.
Observing acquaintances dispute this point recently, it struck me that humans are actually weirdly aligned with natural selection, more than I could easily account for.
Natural selection, in its broadest, truest, (most idiolectic?) sense, doesn’t care about genes. Genes are a nice substrate on which natural selection famously makes particularly pretty patterns by driving a sensical evolution of lifeforms through interesting intricacies. But natural selection’s real love is existence. Natural selection just favors things that tend to exist. Things that start existing: great. Things that, having started existing, survive: amazing. Things that, while surviving, cause many copies of themselves to come into being: especial favorites of evolution, as long as there’s a path to the first ones coming into being.
So natural selection likes genes that promote procreation and survival, but also likes elements that appear and don’t dissolve, ideas that come to mind and stay there, tools that are conceivable and copyable, shapes that result from myriad physical situations, rocks at the bottoms of mountains. Maybe this isn’t the dictionary definition of natural selection, but it is the real force in the world, of which natural selection of reproducing and surviving genetic clusters is one facet. Generalized natural selection—the thing that created us—says that the things that you see in the world are those things that exist best in the world.
So what did natural selection want for us? What were we selected for? Existence.
And while we might not proliferate our genes spectacularly well in particular, I do think we have a decent shot at a very prolonged existence. Or the prolonged existence of some important aspects of our being. It seems plausible that humanity makes it to the stars, galaxies, superclusters. Not that we are maximally trying for that any more than we are maximally trying for children. And I do think there’s a large chance of us wrecking it with various existential risks. But it’s interesting to me that natural selection made us for existing, and we look like we might end up just totally killing it, existence-wise. Even though natural selection purportedly did this via a bunch of hackish urges that worked well in 200,000 BC but whose domain of applicability you might have expected us to be well outside of by 2023. And presumably taking over the universe is an extremely narrow target: it can only be done by so many things.
Thus it seems to me that humanity is plausibly doing astonishingly well on living up to natural selection’s goals. Probably not as well as a hypothetical race of creatures who each harbors a monomaniacal interest in prolonged species survival. And not so well as to be clear of great risk of foolish speciocide. But still staggeringly well.
(Writing quickly and without full justification.)
This post might say a thing that’s true but I think the “illustrative warning about artificial intelligence” totally still stands. The warning, I think, is that selecting for inclusive fitness doesn’t give you robust inclusive-fitness-optimizers; at least at human-level cognitive capabilities, changing/expanding the environment can cause humans’ (mesa-optimizers’) alignment to break pretty badly. I don’t think you engage with this—you claim “humans are actually weirdly aligned with natural selection” when we consider an expansive sense of “natural selection.” I think this supports claims like “eventually AI will be really good at existing/surviving,” not “AI will do something reasonably similar to what we want it to do or tried to train it to do.”
I feel like there’s confusion in this post between group-level survival and individual-level fitness but I don’t want to try to investigate that now. (Edit: I totally agree with gwern’s reply but I don’t think it engages with katja’s cruxes so there’s more understanding-of-katja’s-beliefs to do.)
Yes. This is just group selectionism rephrased as ‘existence’.
No, natural selection did not want that for us, and we were not selected for that. Natural selection selects for relative fitness, and will happily select for individuals which are driving their group to extinction, as long as they increase their relative share of the remaining group. Eliezer already covered this: there is no Frodo gene. There is no term in the Price equation defining replicator dynamics which rewards ‘total existence’. (Relative fitness tends to try to maintain absolute existence… but only somewhat, hence the need for inclusive fitness to explain self-sacrifice and other existence-terminating effects.)
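(For reference, the Price equation in its standard form is

\[ \bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \mathbb{E}\!\left[w_i\,\Delta z_i\right], \]

where \(z_i\) is a trait value, \(w_i\) the corresponding fitness, and \(\bar{w}\), \(\bar{z}\) the population means. Rescaling every \(w_i\) by the same constant leaves \(\Delta\bar{z}\) unchanged, so the dynamics depend only on relative fitness; there is indeed no term rewarding how much of the population exists in absolute terms.)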
Energy is the Noether-theorem conserved quantity for time-translation symmetry.
Eigenstates do not care about atom boundaries.
With biological evolution we might be limited to an alphabet built from some combination of carbon chemistry. But time evolution does not care what its tokens are.
I claim that surviving and colonizing the galaxy are rather instrumentally convergent, and therefore I’m not surprised that (most?) humans want them.
By the same token, if our success criterion were making an AGI that robustly pursues a goal which just so happens to align with one of that AGI’s convergent instrumental subgoals, then I would feel very much more optimistic about that happening.
Arguing about what is the true nature of natural selection seems a bit pointless to me, in this context. In RL, we often talk about how it can be ambiguous how to generalize the reward function out of training distribution. Historically / “in the training distribution”, the only things that got rewarded in animal evolution were genes made of DNA. If future transhumans replace their DNA with some other nanotech thing, should we view that as “scoring high” on the same “reward function” that was used historically? Seems like a question that doesn’t have a right or wrong answer. I can say “the thing that was happening historically was optimizing genes made of DNA, and those future transhumans will fail on that metric”, or I can say “the thing that was happening historically was optimizing genes made of any kind of nanotech, and those future transhumans will succeed on that metric”. Those are two incompatible ways to generalize from the actual history of rewards, and I don’t think there’s a right answer for which generalization is “correct”.
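As a minimal sketch of that ambiguity (made-up data and function names, purely illustrative): two candidate reward functions that agree on every historical case but score the out-of-distribution case completely differently.

```python
# Two ways to generalize "what was rewarded historically" (illustrative only).

def reward_dna_only(replicator):
    # Generalization A: only DNA-based genes count.
    return replicator["copies"] if replicator["substrate"] == "DNA" else 0

def reward_any_substrate(replicator):
    # Generalization B: any persistent replicator counts, whatever it is made of.
    return replicator["copies"]

# Historical "training distribution": everything that ever got rewarded was DNA.
history = [{"substrate": "DNA", "copies": 3}, {"substrate": "DNA", "copies": 0}]
# Out-of-distribution case: transhumans who swapped DNA for some other nanotech.
future_transhuman = {"substrate": "nanotech", "copies": 10**6}

# Both generalizations fit the historical data exactly...
assert all(reward_dna_only(r) == reward_any_substrate(r) for r in history)

# ...but disagree completely about the novel case.
print(reward_dna_only(future_transhuman))       # 0
print(reward_any_substrate(future_transhuman))  # 1000000
```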
I think there might be a meaningful way to salvage the colloquial concept of “humans have overthrown natural selection.”
Let [natural selection] refer to the concept of trying to maximize genetic fitness, specifically maximizing the spread of genes. Let [evolution] refer to the concept of trying to maximize ‘existence’ or persistence. There’s a sort of hierarchy of optimizers, [evolution] > [natural selection] > humanity, in which you could make the claim that humanity has “overthrown our boss and taken their position,” such that humanity reports directly to [evolution] now instead of having [natural selection] as our middle-manager boss. One can point to ideas in brains being the preferred substrate over DNA now as an example of this model.
This description also makes the warning with respect to AI a little more clear: any box or “boss” is at risk of being overthrown.
I think this is a bit misleading on its own; in context it’s not so bad, but the phrase’s form seems, in my view, prone to being used to promote evo-game-theory defection behaviors that do not reliably promote existence-for-all: making more of yourself is only good if those copies then survive.
The way I’d phrase the alignment problem from scratch is: How can we promise humanity’s children that they, too, get to build durable forms and extend into the stars, in a way that is sufficiently mutually interpretable that both sides can trust that each other’s forms will be preserved?
Denote natural selection’s (NS’s) objective by X. That is, X is something like {finding and propagating patterns (genetic or otherwise) that continue to exist}.
I think it’s important to distinguish between:
(i) Humanity as a whole is aligned with X.
(ii) Most individual humans are (mostly) aligned with X.
To the extent that X and (i) are coherent concepts/claims, I’d agree that (i) is likely true (for now, before TAI).[1] OTOH, (i) seems kinda vacuous, since {humanity as a whole} is (currently) basically an evolving animal population, i.e. an “instantiation” of NS? And of course NS is aligned with NS.
I think (ii) is sketchy at best: Sure, lots of people have a desire/shard to produce things that will last; things like music, art, genetic offspring, mausoleums, etc. But my impression is that for most humans, that desire is just one among many, and usually not even the strongest desire/shard. (And often the desire to produce lasting things is just a proxy/means for gaining some other thing, e.g. status.)
Thus: I think that—to the extent that it makes sense to think of NS as an optimizer with an objective—individual humans (i.e. the intelligences that NS designed) are in fact unaligned/misaligned with NS’s objectives. I continue to see {NS designing humans} as an example of an optimization process P creating new optimization processes that are misaligned with P.
I feel like I probably missed a bunch of nuance/bits-of-info in the post, though. I’m guessing OP would disagree with my above conclusion. If so, I’m curious what I missed / why they disagree.
[1] Then again, under a sufficiently broad interpretation of X, almost any process is perhaps aligned with X, since any process eventually evolves into a heat-dead universe, which in turn is a very persistent/continues-to-exist pattern?