The Evolution Argument Sucks
There is a common argument that AI development is dangerous, which goes something like this:
The “goal” of evolution is to make animals which replicate their genes as much as possible;
humans do not want to replicate their genes as much as possible;
we have some goal which we want AIs to accomplish, and we develop them in a similar way to the way evolution developed humans;
therefore, they will not share this goal, just as humans do not share the goal of evolution.
This argument sucks. It has serious, fundamental, and, to my thinking, irreparable flaws. This is not to say that its conclusions are incorrect; to some extent, I agree with the point which is typically being made. People should still avoid bad arguments.
When Should We Stop Using Analogies?
Consider the following argument:
Most wars have fewer than one million casualties.
Therefore, we should expect that the next war (operationalised in some way) which starts will end with fewer than one million casualties.
There are some problems with this. For instance, we might think that modern wars have more or fewer casualties than the reference class of all wars; we might think that some particular conflict is on the horizon which we have reason to believe will be larger than normal; we might think that there is going to be some change in warfare in the near future which makes the next war even further from a historically “typical war” than recent wars have been. In any case, none of these issues are insurmountable. We could change the class which we are drawing from to match what we believe to be a more representative class, or use this as a “prior” and include more information to adjust our estimate; a reasonable casualty estimate might well have an argument similar to this at its core.
Here is another argument:
Most wars have fewer than one million casualties.
Therefore, the Russo-Ukrainian war has had fewer than one million casualties.
This argument is absurd. Even if there is a great deal of uncertainty about casualty estimates, almost any direct evidence about the actual, specific war in question is going to totally overwhelm any considerations about which broad reference class that war may or may not belong to, to the extent that weighing such considerations at all is ridiculous. Likewise[1], it may have been the case that, in the year 2005, extremely broad analogies like the evolution analogy were a reasonable part of a best-guess estimate of how likely human-developed AIs were to do what we want. This is no longer the case. We have much more direct evidence about how well AIs, trained via any particular method we want to consider, generalise their training to off-distribution inputs; we have much more direct evidence about what goal-directed behaviour might or might not result.[2] Even if this evidence is of poor quality, it is of much higher quality than the analogy.
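To put the reference-class point in slightly more concrete terms, here is a toy Bayesian sketch (my own made-up numbers, purely illustrative): once even moderately precise direct evidence exists, it almost entirely determines the estimate, and the broad reference-class prior contributes nearly nothing.

```python
# Toy sketch with invented numbers: a diffuse reference-class prior vs. precise
# direct evidence, modelling log10(casualties) as Gaussian throughout.
prior_mean, prior_var = 4.0, 4.0   # "most wars": ~10^4 casualties, very uncertain
obs_mean, obs_var = 6.0, 0.1       # direct evidence about this war: ~10^6, fairly precise

# Conjugate normal-normal update: the posterior is a precision-weighted average.
post_precision = 1 / prior_var + 1 / obs_var
post_mean = (prior_mean / prior_var + obs_mean / obs_var) / post_precision
post_var = 1 / post_precision

print(f"posterior: mean {post_mean:.2f}, variance {post_var:.3f}")  # ~5.95, ~0.098
# The posterior sits almost on top of the direct evidence; the reference class
# barely moves it. With no direct evidence, the prior is all you have.
```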
… And It Was Never a Good Analogy
Evolution Does Not Have Goals
Moving on to issues with the argument itself, this is the obvious one: evolution is not a person, and it does not have goals. I think most people, implicitly or explicitly, realise that this is a problem, and so invite the reader to imagine that there is some goal which evolution is working to achieve. Fundamentally there is no issue here; you can imagine constructing an argument that goes something like:
Evolution is analogous to a “loss function” which we train a model on;
humans do not want to “minimize loss” with respect to this loss function;
therefore, AIs will not want to “minimize loss” with respect to whichever loss function they are trained on.
This implies that training according to a loss function which represents our desires for an AI is a poor way to get an AI to intrinsically want to fulfil those desires.
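For readers less familiar with the machine-learning half of the analogy, here is a minimal sketch (a toy example of my own, not anything anyone actually trains) of what "training according to a loss function" looks like. The loss only ever scores outputs during training; nothing in the final parameters represents "minimise the loss" as a goal.

```python
import numpy as np

# Toy setup: the "desired behaviour" is y = 2x + 1 on inputs drawn from [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)
y = 2.0 * x + 1.0

# A two-parameter "model", trained by gradient descent on mean squared error.
w, b = 0.0, 0.0
lr = 0.1
for step in range(5000):
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x)  # d(MSE)/dw
    grad_b = np.mean(2 * (pred - y))      # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches 2.0, 1.0
# The loss determines which parameters survive the update process, much as
# fitness determines which traits survive selection; but the trained object is
# just (w, b), and nowhere inside it is the loss represented as a "goal".
```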
Mendel, Not Crick
It is often claimed (e.g., in IABED) that the “goal” of genes is to cause as many instances of a particular molecule—the gene—to exist as possible. Bracketing all other questions, let’s first focus on a technical point: the “loss function” of evolution, insofar as it could be said to have such a thing, is not related to molecules in any way. I’m not completely sure where this misunderstanding actually came from, but I’m guessing it’s related to the slight misuse of the word “gene” in Dawkins’s The Selfish Gene. There are two meanings that the word “gene” can have. Wikipedia explains it as follows:
In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA.
Insofar as one can view evolution as a process similar to, for instance, gradient descent, the unit on which it operates is the Mendelian gene, not the molecular gene. This may seem like an academic distinction, but if it is not made, the process of evolution becomes quite mysterious. For instance, organisms generally do not produce any more copies of their genetic code than they need to function[3]. Why not? DNA represents an extremely small portion of the resource expenditure of many animals; it would be nearly costless to, say, duplicate the genome a few additional times per cell. This ought to, assuming that organisms are selected based on how much they proliferate their (molecular) genes, massively increase fitness; in fact it reduces it. Why? Because the thing which is the essence of “fitness” is the proliferation of the trait, not whatever happens to encode that trait[4]. This makes claims like:
It seems to us that most humans simply don’t care about genetic fitness at all, in the deep sense. We care about proxies, like friendship and love and family and children. We maybe even care about passing on some of our traits to the next generation. But genes, specifically?
quite confusing. “Passing on some of our traits to the next generation” is precisely what propagating one’s genes is! There is no way to pass on traits which is not materially identical, with regards to evolution, to having offspring which inherit traits via whatever substrate those traits happen to currently reside on—indeed, producing many copies of one’s molecular genes without producing new individuals which are carriers of one’s traits is a failure by the standards of natural selection.
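To make the genome-duplication point above concrete, here is a toy selection sketch (invented numbers, purely illustrative): a genotype that makes ten molecular copies of its genome per cell, at a small fitness cost, is simply outcompeted, no matter how many copies of the sequence each of its carriers holds.

```python
# Toy sketch, invented numbers: selection "counts" carriers of a heritable
# trait, not molecular copies of the DNA that encodes it.
copies_per_carrier = {"A": 1, "B": 10}   # molecular genome copies per organism
growth_rate = {"A": 1.05, "B": 1.00}     # B pays a small per-generation cost

pop = {"A": 1000.0, "B": 1000.0}
for _ in range(200):
    for g in pop:
        pop[g] *= growth_rate[g]

total_carriers = sum(pop.values())
total_copies = sum(pop[g] * copies_per_carrier[g] for g in pop)
print(f"A's share of carriers:         {pop['A'] / total_carriers:.4f}")
print(f"A's share of molecular copies: {pop['A'] * 1 / total_copies:.4f}")
# Both shares go to ~1: B's tenfold copy number does nothing to save its trait,
# because what proliferates (or fails to) is the trait, not the molecule.
```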
Humans Are Not “Misaligned”
It follows that, by any reasonable metric, humans are (somewhat) “aligned” to the task of propagating their genes. Many humans may not be, but most humans are. Most humans, intrinsically, want there to be a future in which there are many people who are morphologically and functionally similar to themselves; that is, most humans want their genes to be propagated. Perhaps most people do not want to optimise for genetic fitness, but if they did, it’s unclear that this would actually produce a species which was particularly fit; it seems as though our “misaligned” drives are precisely what led us to develop industrial civilisation, with the concomitant population explosion[5]. It does not seem clear to me that a version of the human organism which prioritised reproduction much more highly would actually end up with a greater population. There are certainly changes one could make that would make the current population higher in some counterfactual, but there’s no particular reason to believe that those changes would be the obvious ones, or the ones that would make humans “more aligned” to natural selection.
Now, one might say: that’s all well and good, but although things look good for our genes right now, the human population is going to collapse so far in the future, due to a lack of intrinsic reproductive drive, that things will end up much worse (again, for our genes) than if we had just stayed hunter-gatherers. This seems possible, but hardly a sure thing. The only ways I can imagine human extinction, for instance, are via events that are not anthropogenic (and therefore not relevant to the question of whether greater or lesser “alignment” to the goal of replication would be preferable for species propagation, since it wouldn’t matter either way), or via ASI itself (in which case the argument is circular). In particular, it seems no more certain that fertility decline will continue indefinitely than it once seemed certain that population growth would continue until carrying capacity was reached. Given that many people explicitly and intrinsically desire the propagation of their phenotype (or at least its perpetuation), it’s unclear that the majority of the “misalignment” is really the result of a difference between the goals humans hold as agents and the “goal” of evolution, rather than the result of non-goal-directed behaviour (e.g. very few humans would characterise themselves as fulfilling any consciously-held goal by spending hours watching short videos, even among those who do). It seems likely to me that, with respect to evolution, humans have substantially more “inner” “alignment” than they do “outer”.
We Do Not Train AIs Using Evolution
This is obvious, and I include it only for completeness. The mechanism by which we optimise for whichever function we train an AI with is not the same as the mechanism by which organisms were selected for fitness.
Evolution Does Not Produce Individual Brains
More to the point, the object under training is not analogous. The “product” of evolution is a phenotype, not an organism, whereas the product of a run of AI training is an actual AI. The fact that the goals an adult organism has are not the same as the “goal” its phenotype was constructed to fulfil (even supposing that such a goal exists) is unsurprising: the task of completely specifying, through genes alone, an organism that will reliably want a highly abstract goal as an adult, no matter its environment, is facially impossible! (If one misunderstands which kind of “gene” is relevant to the matter, then it becomes even more clearly impossible; how, exactly, would one encode in DNA the concept of a molecule, and the concept that some particular molecule ought to be propagated as much as possible?) Given that it is not possible for evolution to select some genome which ensures that every organism cares only about optimising for genetic fitness, the fact that it failed to do so tells us essentially nothing.
One could say that there exist genomes close to those of humans which produce adults substantially more likely to become propagation-maximisers, but this is non-obvious, and it is unclear exactly how much selection pressure was applied. It could be that slightly better human-similar genomes for this goal exist, but that even a genome as good at specifying goals as the one we have is extremely rare; this would imply that evolution is perfectly fine at selecting for the appropriate “inner optimisers”, it is just that the task it needs to solve to produce such a thing is much more difficult than the task of selecting, for instance, model weights. It is clear that, although there may not exist any genomes of reasonable size which satisfy the assignment, there do exist brains that do; so the fact that evolution failed to instil any particular abstract goal via the genome is extremely weak evidence for the claim that we will fail at the same task by training “brains”. It would be reasonable to conclude that we would fail to produce an AI that does what we want through only the specification of architecture and hyperparameters, but nobody is trying to do this.
…And So On
The weakest possible conclusion of the evolution argument, which one rarely sees stated on its own[6], is: if a thing with goals results from a process which picks things which fulfil some particular criterion, it is possible for those goals to be something other than optimally fulfilling that criterion. I do not have any objection to this weak form; the example of evolution does seem to show that this can occur. However, I think this is about as strong an argument as you can derive from evolution, and people seem to want to go much further.
Supposing that one wants to use it to argue for a stronger conclusion, one might claim that it is useful as a tool for communicating to other people how well AI actions will conform to our desires, even if it is not useful as evidence thereof. The issue is that either the analogy holds well enough to support the stronger conclusion (and it does not), or any understanding you impart with it is illusory; you will give people the impression that they understand how AI works or will work, but it will be a false impression. There seems to be a desire to have it both ways: the analogy is treated as an accurate tool for thinking about AI training (so one need not disclaim it or clarify later that it was a lie-to-children), yet insofar as it is not accurate, it is just an evocative metaphor without any explanatory force.
Of course, no matter how flawed the comparison to evolution is, there doesn’t seem to be any competing analogy which makes the same argument in a more defensible manner. And people love analogies. A friend, reading this post, told me (paraphrased): “give a better analogy, then, if this one isn’t good”. I have to admit, I have no better analogy. Maybe this is the best analogy possible. But if so, then we should not be using analogies at all; there is simply no other situation familiar to most people[7] similar enough to provide insight. Imagine explaining basic rocket physics to someone and telling them that they can’t think of the motion of bodies in space as they would think of the motion of objects in an atmosphere; if they respond “well, if it’s not like the movement of objects I’m familiar with, what is it like?”, the only appropriate response is: “nothing. You do not encounter anything like it in your daily life. You simply must learn it as it is, without reference to any other situation”.[8] This is unsatisfying, as it is unsatisfying to say “we have not encountered artificial intelligence of this kind before, nor anything similar”, but the alternative is to be wrong, and that is worse.
Further Reading
I’m obviously not the first person to talk about this, and I’ve elided some points which others have made clearly before; here are a couple of prior posts on the same topic:
I am aware that this is also an analogy. Mea culpa. ↩︎
For instance, in the year 2015 it seemed very likely to me, and I assume to many others, that any AI capable of any remotely intelligent behaviour would in fact have such a thing as goals, which would be easily observed; after all, among animals, intelligent behaviour and goal-directed behaviour seem universally to go hand in hand. Now that it has been conclusively demonstrated that it is possible for something to carry out a conversation as facially intelligent as most humans can without much or any goal-directed behaviour to speak of, the analogy to intelligent animals is much less relevant as a source of insight into what future AIs will look like. ↩︎
Indeed, there are sometimes complex mechanisms to remove cell nuclei from places where they are not needed; e.g. RBCs in humans, neurons in M. mymaripenne. ↩︎
…kind of. There exist things like transposable elements, viruses, and so on, on a spectrum between “unit of inheritance” and what one might more generally call a “molecular replicator”. ↩︎
One might object to the idea of referring to the species as a whole as “fit”, but note that we have already conceded this framework to begin with by talking about whether “humans” are “aligned” in aggregate; we could remove this abstraction by talking about individual humans, but, given the similar phenotypes among all living humans, this would ultimately result in the same conclusion. ↩︎
…not that many people understand natural selection very well, as is frequently demonstrated. ↩︎
Well, actually, there’s another appropriate response: “it’s like Kerbal Space Program, play that for fifty hours and then you’ll understand”. So, imagine someone having this conversation before KSP existed. ↩︎
Given that you immediately give an example where they’re not identical, maybe you wanted to say something a little more complicated than “these things are materially identical.”
Anyhow, good post just on the strength of the point about Mendelian genes vs. DNA. An organism that sprays its DNA everywhere is not the sort of thing natural selection optimizes for (except in very special cases where the environment helps the DNA cause more of the organism). That seems obvious, but the implications about traits not being molecular are non-obvious.
Totally don’t buy “But maybe we needed to not be optimizing in order to have the industrial revolution”—how on earth are we supposed to define such a thing, let alone measure it? Meanwhile our current degree of baby production is highly measurable, and we can clearly see that we’re doing way better than chance but way worse than the optimum. Whether this counts as “aligned” or “misaligned” seems to be a matter of interpretation. You can ask how I would feel about an AI that had a similar relationship to its training signal and I’d probably call it ‘inner misaligned’, but the analogy is bad at this.
Good point WRT that first line—I edited it to something more clunky but I think more accurate. Hopefully the intended meaning came across anyway.
WRT the second point—I agree that this is the weakest/most speculative argument in the post, although I still think it’s worth considering. Evolution obviously “had the ability” to make us much more baby-obsessed, or have a higher sex drive, and yet we do not. This indicates that there are tradeoffs to be made; a human with a higher reproductive drive is less fit in other ways. One of those ways is plausibly that a human with a lower reproductive drive gets more “other stuff” done—like maintaining a community, thinking about its environment, and so on—and that “other stuff” is very important for increasing the number of offspring which survive. And, indeed, we have a very important example of some “other stuff” which massively increased the total number of humans alive; it doesn’t seem absurd to suggest that it was no “mistake” for us to have the reproductive drive that we do, and that if God reached down into the world in the distant past and made the straightforward change of “increase the reproductive drive of humans”, this would in fact have made there be fewer humans in the year 2026.
Now, this is all very tangential with regards to the actual analogy being made; it’s unclear what if anything this has to do with AI, in large part due to the many other disanalogies between evolution and AI training. But insofar as all we are doing is judging the capacity of the human species to “fulfill the goal of evolution”, it’s relevant that our drives are what they are in large part because having them that way does “fulfill the goal”, even in part because the drive does not perfectly match the goal.
I agree with a lot in this post, but it still seems unfair to call something “the evolution argument”. I mean, maybe Eliezer and people use that terminology, in which case they are making an error, or at least being imprecise, and I retract my accusation; I haven’t gone back and checked.
How I’d phrase it is: There is an abstract argument, which makes no reference to evolution, which is just that optimizing a high dimensional thing to achieve low loss doesn’t tell you how that loss is achieved, and you consequently cannot make strong inferences about how that object will behave OOD after training (without studying it further and collecting more information).
Evolution is a piece of evidence for the argument, maybe the central example of this playing out. But it’s not the only piece of evidence, and it’s not itself an argument.
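As a toy sketch of that abstract point (made-up data, purely illustrative): two models can reach essentially the same low loss on the training distribution and still behave completely differently far outside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training distribution: y = x on [0, 1], with a little noise.
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = x_train + rng.normal(0.0, 0.05, size=50)

# Two "models", both optimised to low loss on the same data.
linear = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
wiggly = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

def train_mse(model):
    return float(np.mean((model(x_train) - y_train) ** 2))

print("train loss:", train_mse(linear), train_mse(wiggly))  # both small, similar
print("prediction at x = 5:", linear(5.0), wiggly(5.0))
# The linear fit extrapolates to roughly 5; the degree-9 fit, with comparable
# training loss, typically lands somewhere else entirely. Low loss
# on-distribution pins down very little about behaviour OOD.
```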
A better, more mechanistically relevant analogy is within-lifetime human reward circuitry (outer) and learned human values (inner). However, it doesn’t yield the same conclusions (which I think is good). I think it’s more relevant due to greater similarity in mechanism to LLMs (locally randomly initialized networks updated by a local update rule using predictive and reinforcement learning, also trained on a lot of language data), but still not quite as relevant as actual LLM experiments.
I agree that we should stop with the analogies. Gather evidence to learn how it actually works. Let go of these old arguments that we don’t need anymore.
True and important, but if anything I think the importance in this particular community is often overstated rather than unappreciated. I suspect the analogy itself is downstream of a flaw in human languages, which are very agent-centric in their grammatical assumptions. They didn’t evolve to describe impersonal forces like evolution, and trying to do so without such analogies is often very cumbersome in ways that obfuscate the reality more than they enlighten.
A lot of good points in this section as well. To the “Who cares?” question, the answer is, “We do, until and unless we know how to use other methods that do sufficiently reliably encode the goals we (should) care about into the AIs we create.”