Roughly speaking, this is because when you grow minds, they don’t care about what you ask them to care about and they don’t care about what you train them to care about; instead, I expect them to care about a bunch of correlates of the training signal in weird and specific ways.
(Similar to how the human genome was naturally selected for inclusive genetic fitness, but the resultant humans didn’t end up with a preference for “whatever food they model as useful for inclusive genetic fitness”. Instead, humans wound up internalizing a huge and complex set of preferences for “tasty” foods, laden with complications like “ice cream is good when it’s frozen but not when it’s melted”.)
I simply do not understand why people keep using this example.
I think it is wrong—evolution does not grow minds, it grows hyperparameters for minds. When you look at the actual process for how we actually start to like ice-cream—namely, we eat it, and then we get a reward, and that’s why we like it—then the world looks a lot less hostile, and misalignment a lot less likely.
But given that this example is so controversial, even if it were right why would you use it—at least, why would you use it if you had any other example at all to turn to?

Why push so hard for “natural selection” and “stochastic gradient descent” to be beneath the same tag of “optimization”, and thus to be able to infer things about the other from the analogy? Have we completely forgotten that the glory of words is not to be expansive, and include lots of things in them, but to be precise and narrow?
Does evolution ~= AI have predictive power apart from doom? I have yet to see how natural selection helps me predict how any SGD algorithm works. It does not distinguish between Adam and AdamW. As far as I know it is irrelevant to Singular Learning Theory or NTK or anything else. It doesn’t seem to come up when you try to look at NN biases. If it isn’t an illuminating analogy anywhere else, why should we trust it where it predicts doom?
I think Nate’s claim “I expect them to care about a bunch of correlates of the training signal in weird and specific ways.” is plausible, at least for the kinds of AGI architectures and training approaches that I personally am expecting. If you don’t find the evolution analogy useful for that (I don’t either), but are OK with human within-lifetime learning as an analogy, then fine! Here goes!
OK, so imagine some “intelligent designer” demigod, let’s call her Ev. In this hypothetical, the human brain and body were not designed by evolution, but rather by Ev. She was working 1e5 years ago, back on the savannah. And her design goal was for these humans to have high inclusive genetic fitness.
So Ev pulls out a blank piece of paper. First things first: She designed the human brain with a fancy large-scale within-lifetime learning algorithm, so that these humans can gradually get to understand the world and take good actions in it.
Supporting that learning algorithm, she needs a reward function (“innate drives”). What to do there? Well, she spends a good deal of time thinking about it, and winds up putting in lots of perfectly sensible components for perfectly sensible reasons.
For example: She wanted the humans to not get injured, so she installed in the human body a system to detect physical injury, and put in the brain an innate drive to avoid getting those injuries, via an innate aversion (negative reward) related to “pain”. And she wanted the humans to eat sugary food, so she put a sweet-food-detector on the tongue and installed in the brain an innate drive to trigger reinforcement (positive reward) when that detector goes off (but modulated by hunger, as detected by yet another system). And so on.
Then she did some debugging and hyperparameter tweaking by running these newly-designed humans in the training environment (African savannah) and seeing how they do.
So that’s how Ev designed humans. Then she “pressed go” and let them run for 1e5 years. What happened?
Well, I think it’s fair to say that modern humans “care about” things that probably would have struck Ev as “weird”. (Although we, with the benefit of hindsight, can wag our finger at Ev and say that she should have seen them coming.) For example:
Superstitions and fashions: Some people care, sometimes very intensely, about pretty arbitrary things that Ev could not have possibly anticipated in detail, like walking under ladders, and where Jupiter is in the sky, and exactly what tattoos they have on their body.
Lack of reflective equilibrium resulting in self-modification: Ev put a lot of work into her design, but sometimes people don’t like some of the innate drives or other design features that Ev put into them, so the people go right ahead and change them! For example, they don’t like how Ev designed their hunger drive, so they take Ozempic. They don’t like how Ev designed their attentional system, so they take Adderall. Many such examples.
New technology / situations leading to new preferences and behaviors: When Ev created the innate taste drives, she was (let us suppose) thinking about the food options available on the savannah, and thinking about what drives would lead to people making smart eating choices in that situation. And she came up with a sensible and effective design for a taste-receptors-and-associated-innate-drives system that worked well for that circumstance. But maybe she wasn’t thinking that humans would go on to create a world full of ice cream and coca cola and miraculin and so on. Likewise, Ev put in some innate drives with the idea that people would wind up exploring their local environment. Very sensible! But Ev would probably be surprised that her design is now leading to people “exploring” open-world video-game environments while cooped up inside. Ditto with social media, organized religion, sports, and a zillion other aspects of modern life. Ev probably didn’t see any of it coming when she was drawing up and debugging her design, certainly not in any detail.
To spell out the analogy here:
Ev ↔ AGI programmers;
Human within-lifetime learning ↔ AGI training;
Adult humans ↔ AGIs;
Ev “presses go” and lets human civilization “run” for 1e5 years without further intervention ↔ For various reasons I consider it likely (for better or worse) that there will eventually be AGIs that go off and autonomously do whatever they think is a good thing to do, including inventing new technologies, without detailed human knowledge and approval.
Modern humans care about (and do) lots of things that Ev would have been hard-pressed to anticipate, even though Ev designed their innate drives and within-lifetime learning algorithm in full detail ↔ even if we carefully design the “innate drives” of future AGIs, we should expect to be surprised about what those AGIs end up caring about, particularly when the AGIs have an inconceivably vast action space thanks to being able to invent new technology and build new systems.
Does evolution ~= AI have predictive power apart from doom?
Evolution analogies predict a bunch of facts that are so basic they’re easy to forget about, and even if we have better theories for explaining specific inductive biases, the simple evolution analogies should still get some weight for questions we’re very uncertain about.
Selection works well to increase the thing you’re selecting on, at least when there is also variation and heredity
Overfitting: sometimes models overfit to a certain training set; sometimes species adapt to a certain ecological niche and their fitness is low outside of it
Vanishing gradients: fitness increase in a subpopulation can be prevented by lack of correlation between available local changes to genes and fitness
Catastrophic forgetting: when trained on task A then task B, models often lose circuits specific to task A; when put in environment A then environment B species often lose vestigial structures useful in environment A
There’s a mostly unimodal and broad peak for optimal learning rate, just like for optimal mutation rate
Adversarial training dynamics
Adversarial examples usually exist (there exist chemicals that can sterilize or poison most organisms)
Adversarial training makes models more robust (bacteria can evolve antibiotic resistance)
Adversarially trained models generally have worse performance overall (antibiotic-resistant bacteria are outcompeted by normal bacteria when there are no antibiotics)
The attacker can usually win the arms race of generating and defending against adversarial attacks (evolutionary arms races are very common)
A few things that feel more tenuous
maybe NTK / lottery ticket hypothesis; when mutation rates are low evolution can be approximated as taking the best-performing organism; when total parameter distance is small SGD can be approximated as taking the best-performing model from the parameter tangent space (see the linearization sketch after this list)
maybe inner optimizers; transformers learn in context by gradient descent while evolution invents brains, positive and negative selection of T cells to prevent them attacking the body, probably other things
Task vectors: adding sparse task vectors together often produces a model that can do both tasks; giving an organism alleles for two unrelated genetic disorders often gives it both disorders
Grokking/punctuated equilibrium: in some circumstances applying the same algorithm for 100 timesteps causes much larger changes in model behavior / organism physiology than in other circumstances [edit: moved this from above because 1a3orn makes the case that it’s not very central]
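For the tangent-space bullet above, the approximation being invoked is roughly the standard linearization around initialization (my paraphrase, stated loosely):

$$f_\theta(x) \;\approx\; f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0), \qquad \|\theta - \theta_0\| \text{ small},$$

so for small total parameter distance, training is roughly a search over this linear family; loosely analogous to low-mutation-rate evolution keeping the best nearby variant.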
I agree that if you knew nothing about DL you’d be better off using that as an analogy to guide your predictions about DL than using an analogy to a car or a rock.
I do think a relatively small quantity of knowledge about DL screens off the usefulness of this analogy; that you’d be better off deferring to local knowledge about DL than to the analogy.
Or, what’s more to the point—I think you’d better defer to an analogy to brains than to evolution, because brains are more like DL than evolution is.
Combining some of yours and Habryka’s comments, which seem similar.
The resulting structure of the solution is mostly discovered not engineered. The ontology of the solution is extremely unopinionated and can contain complicated algorithms that we don’t know exist.
It’s true that the structure of the solution is discovered and complex—but the ontology of the solution for DL (at least in currently used architectures) is quite opinionated towards shallow circuits with relatively few serial ops. This is different from the bias for evolution, which is fine with a mutation that leads to 10^7 serial ops if its metabolic costs are low. So the resemblance seems shallow other than “solutions can be complex.” I think to the degree that you defer to this belief rather than more specific beliefs about the inductive biases of DL you’re probably just wrong.
There’s a mostly unimodal and broad peak for optimal learning rate, just like for optimal mutation rate
As far as I know optimal learning rate for most architectures is scheduled, and decreases over time, which is not a feature of evolution so far as I am aware? Again the local knowledge is what you should defer to.
You are ultimately doing a local search, which means you can get stuck at local minima, unless you do something like increase your step size or increase the mutation rate
Is this a prediction that a cyclic learning rate—that goes up and down—will work out better than a decreasing one? If so, that seems false, as far as I know.
Grokking/punctuated equilibrium: in some circumstances applying the same algorithm for 100 timesteps causes much larger changes in model behavior / organism physiology than in other circumstances
As far as I know grokking is a non-central example of how DL works, and in evolution punctuated equilibrium is a result of the non-i.i.d. nature of the task, which is again a different underlying mechanism from DL. If you apply DL to non-i.i.d. problems then you don’t get grokking, you just get a broken solution. This seems to round off to, “Sometimes things change faster than others,” which is certainly true but not predictively useful, or in any event not a prediction that you couldn’t get from other places.
Like, leaving these to the side—I think the ability to post-hoc fit something is questionable evidence that it has useful predictive power. I think the ability to actually predict something else means that it has useful predictive power.
Again, let’s take “the brain” as an example of something to which you could analogize DL.
There are multiple times that people have cited the brain as an inspiration for a feature in current neural nets or RL. CNNs, obviously; the hippocampus and experience replay; randomization for adversarial robustness. You can match up interventions that cause learning deficiencies in brains to similar deficiencies in neural networks. There are verifiable, non-post hoc examples of brains being useful for understanding DL.
As far as I know—you can tell me if there are contrary examples—there are obviously more cases where inspiration from the brain advanced DL or contributed to DL understanding than inspiration from evolution. (I’m aware of zero, but there could be some.) Therefore it seems much more reasonable to analogize from the brain to DL, and to defer to it as your model.
I think in many cases it’s a bad idea to analogize from the brain to DL! They’re quite different systems.
But they’re more similar than evolution and DL, and if you’d not trust the brain to guide your analogical a-theoretic low-confidence inferences about DL, then it makes more sense to not trust evolution for the same.
FWIW my take is that the evolution-ML analogy is generally a very excellent analogy, with a bunch of predictive power, but worth using carefully and sparingly. Agreed that sufficient detail on e.g. DL specifics can screen off the usefulness of the analogy, but it’s very unclear whether we have sufficient detail yet. The evolution analogy was originally supposed to point out that selecting a bunch for success on thing-X doesn’t necessarily produce thing-X-wanters (which is obviously true, but apparently not obvious enough to always be accepted without providing an example).
I think you’d better defer to an analogy to brains than to evolution, because brains are more like DL than evolution is.
Not sure where to land on that. It seems like both are good analogies? Brains might not be using gradients at all[1], whereas evolution basically is. But brains are definitely doing something like temporal-difference learning, and the overall ‘serial depth’ thing is also weakly in favour of brains ~= DL vs genomes+selection ~= DL.
I’d love to know what you’re referring to by this:
evolution… is fine with a mutation that leads to 10^7 serial ops if its metabolic costs are low.
Also,
Is this a prediction that a cyclic learning rate—that goes up and down—will work out better than a decreasing one? If so, that seems false, as far as I know.
I think the jury is still out on this, but there’s literature on it (probably much more I haven’t fished out). [EDIT: also see this comment which has some other examples]
AFAIK there’s no evidence of this and it would be somewhat surprising to find it playing a major role. Then again, I also wouldn’t be surprised if it turned out that brains are doing something which is secretly sort of equivalent to gradient descent.
I’m genuinely surprised at the “brains might not be doing gradients at all” take; my understanding is they are probably doing something equivalent.
Similarly this kind of paper points in the direction of LLMs doing something like brains. My active expectation is that there will be a lot more papers like this in the future.
But to be clear—my overall view of the similarity of brain to DL is admittedly fueled less by these specific papers, though, which are nice gravy for my view but not the actual foundation, and much more by what I see as the predictive power of hypotheses like this, which are massively more impressive inasmuch as they were made before Transformers had been invented. Given Transformers, the comparison seems overdetermined; I wish I had seen that way back in 2015.
Re. serial ops and priors—I need to pin down the comparison more, given that it’s mostly about the serial depth thing, and I think you already get it. The base idea is that what is “simple” to mutations and what is “simple” to DL are extremely different. Fuzzily: a mutation alters protein-folding instructions, and is indifferent to the “computational costs” of working this out in reality; if you tried to work out the analytic gradient for the mutation (the gradient over mutation → protein folding → different brain → different reward → competitor’s children look yummy → eat ’em) your computer would explode. But DL seeks only a solution that can be computed by a big ensemble of extremely short circuits, learned almost entirely from the data on which you’ve trained. Ergo DL has very different biases: the “complexity” for mutations probably has to do with instruction length, whereas the “complexity” for DL is more related to how far you are from whatever biases are ingrained in the data (this is fuzzy), and the shortcut solutions DL learns are always implied by the data.
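To pin the depth point down slightly more formally (my own gloss): a depth-$L$ feedforward pass is a composition

$$f_\theta(x) = g_L\!\bigl(g_{L-1}(\cdots g_1(x)\cdots)\bigr),$$

so it performs at most on the order of $L$ dependent (serial) nonlinear steps per forward pass, however wide the layers are; whereas the “cost” evolution pays for a genome edit is roughly its fitness/metabolic cost, not the serial depth of the computation that edit eventually unfolds into.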
So when you try to transfer intuitions about the “kind of solution” DL gets from evolution (which ignores this serial depth cost) to DL (which is enormously about this serial depth cost) then the intuition breaks. As far as I can tell that’s why we have this immense search for mesaoptimizers and stuff, which seems like it’s mostly just barking up the wrong tree to me. I dunno; I’d refine this more but I need to actually work.
Re. cyclic learning rates: both of us are too nervous about the theory → practice junction to make a call on how all this transfers to useful algos (although my bet is that it won’t). But if we’re reluctant to infer from this—how much more from evolution?
Mm, thanks for those resource links! OK, I think we’re mostly on the same page about what particulars can and can’t be said about these analogies at this point. I conclude that both ‘mutation+selection’ and ‘brain’ remain useful, having both is better than having only one, and care needs to be taken in any case!
As I said,
I also wouldn’t be surprised if it turned out that brains are doing something which is secretly sort of equivalent to gradient descent
so I’m looking forward to reading those links.
Runtime optimisation/search and whatnot remain (broadly-construed) a sensible concern from my POV, though I wouldn’t necessarily (at first) look literally inside NN weights to find them. I think more likely some scaffolding is needed, if that makes sense (I think I am somewhat idiosyncratic in this)? I get fuzzy at this point and am still actively (slowly) building my picture of this—perhaps your resource links will provide me fuel here.
Not sure where to land on that. It seems like both are good analogies? Brains might not be using gradients at all[1], whereas evolution basically is.
I mean, does it matter? What if it turns out that gradient descent itself doesn’t affect inductive biases as much as the parameter->function mapping? If implicit regularization (e.g. SGD) isn’t an important part of the generalization story in deep learning, will you down-update on the appropriateness of the evolution/AI analogy?
Is this a prediction that a cyclic learning rate—that goes up and down—will work out better than a decreasing one? If so, that seems false, as far as I know.
https://www.youtube.com/watch?v=GM6XPEQbkS4 (talk) / https://arxiv.org/abs/2307.06324 prove faster convergence with a periodic learning rate, on a specific ‘nicer’ space than reality, and (I believe, from what I remember) they’re comparing to a good bound with a constant stepsize of 1.
So it may be one of those papers that applies in theory but not often in practice, but I think it is somewhat indicative.
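For concreteness, here is a minimal sketch of what a cyclic (triangular) learning-rate schedule looks like next to a standard decaying one; this is a generic illustration, not the schedule analyzed in that paper:

```python
def decaying_lr(step: int, base_lr: float = 0.1, decay: float = 0.001) -> float:
    """Standard monotonically decreasing schedule (inverse-time decay)."""
    return base_lr / (1.0 + decay * step)

def cyclic_lr(step: int, min_lr: float = 0.01, max_lr: float = 0.1, period: int = 1000) -> float:
    """Triangular cyclic schedule: the learning rate rises and falls once per period."""
    phase = (step % period) / period     # position within the current cycle, in [0, 1)
    tri = 1.0 - abs(2.0 * phase - 1.0)   # ramps 0 -> 1 -> 0 over the cycle
    return min_lr + (max_lr - min_lr) * tri

if __name__ == "__main__":
    for step in (0, 250, 500, 750, 1000):
        print(step, round(decaying_lr(step), 4), round(cyclic_lr(step), 4))
```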
I think the ability to post-hoc fit something is questionable evidence that it has useful predictive power. I think the ability to actually predict something else means that it has useful predictive power.
It’s always trickier to reason about post-hoc, but some of the observations could be valid, non-cherry-picked parallels between evolution and deep learning that predict further parallels.
I think looking at which inspired more DL capabilities advances is not perfect methodology either. It looks like evolution predicts only general facts whereas the brain also inspires architectural choices. Architectural choices are publishable research whereas general facts are not, so it’s plausible that evolution analogies are decent for prediction and bad for capabilities. Don’t have time to think this through further unless you want to engage.
One more thought on learning rates and mutation rates:
As far as I know optimal learning rate for most architectures is scheduled, and decreases over time, which is not a feature of evolution so far as I am aware?
This feels consistent with evolution, and I actually feel like someone clever could have predicted it in advance. Mutation rate per nucleotide is generally lower and generation times are longer in more complex organisms; this is evidence that lower genetic divergence rates are optimal, because evolution can tune them through e.g. DNA repair mechanisms. So it stands to reason that if models get more complex during training, their learning rate should go down.
Does anyone know if decreasing learning rate is optimal even when model complexity doesn’t increase over time?
Not sure what you mean here. One of the best explanations of how neural networks get trained uses basically a pure natural selection lens, and I think it gets most predictions right:

CGP Grey “How AIs, like ChatGPT, Learn” https://www.youtube.com/watch?v=R9OHn5ZF4Uo

There is also a follow-up video that explains SGD:

CGP Grey “How AI, Like ChatGPT, *Really* Learns” https://www.youtube.com/watch?v=wvWpdrfoEv0
In-general I think if you use a natural selection analogy you will get a huge amount of things right about how AI works, though I agree not everything (it won’t explain the difference between Adam and AdamW, but it will explain the difference between hierarchical bayesian networks, linear regression and modern deep learning).
Note: I just watched the videos. I personally would not recommend the first video as an explanation to a layperson if I wanted them to come away with accurate intuitions around how today’s neural networks learn / how we optimize them. What it describes is a very different kind of optimizer, one explicitly patterned after natural selection such as a genetic algorithm or population-based training, and the follow-up video more or less admits this. I would personally recommend they opt for these videos instead:

3Blue1Brown—Gradient descent, how neural networks learn

Emergent Garden—Watching Neural Networks Learn

WIRED—Computer Scientist Explains Machine Learning in 5 Levels of Difficulty
Except that selection and gradient descent are closely mathematically related—you have to make a bunch of simplifying assumptions, but ‘mutate and select’ (evolution) is actually equivalent to ‘make a small approximate gradient step’ (SGD) in the limit of small steps.
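One toy version of the limit argument (my gloss, using a simpler accept-if-better rule than the models in the linked post): draw a Gaussian mutation $\delta \sim \mathcal{N}(0, \sigma^2 I)$ and keep it iff it improves fitness $f$. For small $\sigma$,

$$\mathbb{E}[\Delta\theta] \;=\; \mathbb{E}\bigl[\delta\,\mathbf{1}\{f(\theta+\delta) > f(\theta)\}\bigr] \;\approx\; \mathbb{E}\bigl[\delta\,\mathbf{1}\{\nabla f(\theta)^\top \delta > 0\}\bigr] \;=\; \frac{\sigma}{\sqrt{2\pi}}\,\frac{\nabla f(\theta)}{\|\nabla f(\theta)\|},$$

so the expected step points up the local fitness gradient, i.e. it behaves like a noisy (normalized) gradient step.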
I read the post and left my thoughts in a comment. In short, I don’t think the claimed equivalence in the post is very meaningful.
(Which is not to say the two processes have no relationship whatsoever. But I am skeptical that it’s possible to draw a connection stronger than “they both do local optimization and involve randomness.”)
Awesome, I saw that comment—thanks, and I’ll try to reply to it in more detail.
It looks like you’re not disputing the maths, but the legitimacy/meaningfulness of the simplified models of natural selection that I used? From a skim, the caveats you raised are mostly/all caveated in the original post too—though I think you may have missed the (less rigorous but more realistic!) second model at the end, which departs from the simple annealing process to a more involved population process.
I think even on this basis though, it’s going too far to claim that the best we can say is “they both do local optimization and involve randomness”! The steps are systematically pointed up/down the local fitness gradient, for one. And they’re based on a sample-based stochastic realisation for another.
I don’t want you to get the impression I’m asking for too much from this analogy. But the analogy is undeniably there. In fact, in those explainer videos Habryka linked, the particular evolution described is a near-match for my first model (in which, yes, it departs from natural genetic evolution in the same ways).
It looks like you’re not disputing the maths, but the legitimacy/meaningfulness of the simplified models of natural selection that I used?
I’m disputing both. Re: math, the noise in your model isn’t distributed like SGD noise, and unlike SGD the step size depends on the gradient norm. (I know you did mention the latter issue, but IMO it rules out calling this an “equivalence.”)
I did see your second proposal, but it was a mostly-verbal sketch that I found hard to follow, and which I don’t feel like I can trust without seeing a mathematical presentation.
(FWIW, if we have a population that’s “spread out” over some region of a high-dim NN loss landscape—even if it’s initially a small / infinitesimal region—I expect it to quickly split up into lots of disjoint “tendrils,” something like dye spreading in water. Consider what happens e.g. at saddle points. So the population will rapidly “speciate” and look like an ensemble of GD trajectories instead of just one.
If your model assumes by fiat that this can’t happen, I don’t think it’s relevant to training NNs with SGD.)
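A toy illustration of the saddle-point intuition (my own sketch: plain gradient descent on the 2D saddle $L(x,y)=x^2-y^2$, obviously not an NN loss surface):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(p):
    # Gradient of L(x, y) = x^2 - y^2 for each point in the population.
    return np.stack([2.0 * p[:, 0], -2.0 * p[:, 1]], axis=1)

# A tight population straddling the saddle's unstable (y) direction.
pop = rng.normal(0.0, 1e-3, size=(1000, 2)) + np.array([1.0, 0.0])

eta = 0.05
for _ in range(200):
    pop = pop - eta * grad(pop)

# The x-coordinates contract toward 0; the y-coordinates split into two
# diverging groups, i.e. the population "speciates" into separate tendrils.
print("fraction with y > 0:", float(np.mean(pop[:, 1] > 0.0)))
print("spread in y:", float(pop[:, 1].std()))
```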
Wait, you think that a model which doesn’t speciate isn’t relevant to SGD? I’ll need help following, unless you meant something else. It seems like speciation is one of the places where natural evolutions distinguish themselves from gradient descent, but you seem to also be making this point?
In the second model, we retrieve non-speciation by allowing for crossover/horizontal transfer, and yes, essentially by fiat I rule out speciation (as a consequence of the ‘eventually-universal mixing’ assumption). In real natural selection, even with horizontal transfer, you get speciation, albeit rarely. It’s obviously a fascinating topic, but I think pretty irrelevant to this analogy.
For me, the step-size thing is interesting but essentially a minor detail. Any number of practical departures from pure SGD mess with the step size anyway (and with the gradient!) so this feels like asking for too much. Do we really think SGD vs momentum vs Adam vs … is relevant to the conclusions we want to draw? (Serious question; my best guess is ‘no’, but I hold that medium-lightly.)
(a nitpick made irrelevant by my preceding paragraph, but) FWIW vanilla SGD does depend on gradient norm. [ETA: I think I misunderstood exactly what you were saying by ‘step size depends on the gradient norm’, so I think we agree about the facts of SGD. But now think about the space including SGD, RMSProp, etc. The ‘depends on gradient norm’ piece which arises from my evolution model seems entirely at home in that family.]
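Spelling out the fact in question:

$$\Delta\theta_{\mathrm{SGD}} = -\,\eta\,\nabla L(\theta) \quad\Rightarrow\quad \|\Delta\theta_{\mathrm{SGD}}\| = \eta\,\|\nabla L(\theta)\|,$$

whereas e.g. RMSProp/Adam-style updates rescale per coordinate by running gradient statistics, so how closely the step magnitude tracks the raw gradient norm already varies within the SGD family.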
On the distribution of noise, I’ll happily acknowledge that I didn’t show equivalence. I half expect that one could be eked out at a stretch, but I also think this is another minor and unimportant detail.
I agree that they are related. In the context of this discussion, the critical difference between SGD and evolution is somewhat captured by your Assumption 1:
Fixed ‘fitness function’ or objective function mapping genome to continuous ‘fitness score’
Evolution does not directly select/optimize the content of minds. Evolution selects/optimizes genomes based (in part) on how they distally shape what minds learn and what minds do (to the extent that impacts reproduction), with even more indirection caused by selection’s heavy dependence on the environment. All of that creates a ton of optimization “slack”, such that large-brained human minds with language could steer optimization far faster & more decisively than natural selection could. This is what 1a3orn was pointing to earlier with
evolution does not grow minds, it grows hyperparameters for minds. When you look at the actual process for how we actually start to like ice-cream—namely, we eat it, and then we get a reward, and that’s why we like it—then the world looks a lot less hostile, and misalignment a lot less likely.
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired. That is one of the main alignment-relevant intuitions that gets lost when blurring the evolution/SGD distinction.
Right. And in the context of these explainer videos, the particular evolution described has the properties which make it near-equivalent to SGD, I’d say?
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired.
Hmmm, this strikes me as much too strong (especially ‘this lets you directly mold the circuits’).
Remember also that with RLHF, we’re learning a reward model which is something like the more-hardcoded bits of brain-stuff, which is in turn providing updates to the actually-acting artefact, which is something like the more-flexibly-learned bits of brain-stuff.
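A highly schematic sketch of that structure (a toy REINFORCE-style loop of my own; the fixed `reward_model` below merely stands in for the separately learned preference model, and none of this is any particular library’s RLHF implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "reward model": in real RLHF this is itself trained from human
# preference comparisons; here it is just a fixed scoring function.
def reward_model(action: int) -> float:
    return [0.1, 1.0, 0.3][action]

logits = np.zeros(3)  # "policy": softmax over three discrete actions

def policy_probs(logits: np.ndarray) -> np.ndarray:
    p = np.exp(logits - logits.max())
    return p / p.sum()

# The reward model (the "more-hardcoded" piece) provides the update signal
# that shapes the policy (the "more-flexibly-learned" piece).
lr = 0.5
for _ in range(300):
    p = policy_probs(logits)
    a = rng.choice(3, p=p)
    r = reward_model(a)
    grad_logp = -p
    grad_logp[a] += 1.0            # gradient of log p(a) w.r.t. the logits
    logits += lr * r * grad_logp   # push up actions the reward model scores highly

print(np.round(policy_probs(logits), 2))  # mass should shift toward the highest-reward action
```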
I also think there’s a fair alternative analogy to be drawn like
evolution of genome (including mostly-hard-coded brain-stuff) ~ SGD (perhaps +PBT) of NN weights
within-lifetime-learning of organism ~ in-context something-something of NN
(this is one analogy I commonly drew before RLHF came along.)
So, look, the analogies are loose, but they aren’t baseless.
It won’t explain the difference between Adam and AdamW, but it will explain the difference between hierarchical bayesian networks, linear regression and modern deep learning
CGP Grey’s video is a decent example source. Most of the differences between hierarchical bayesian networks and modern deep learning come across pretty well if you model the latter as a type of genetic algorithm search:
The resulting structure of the solution is mostly discovered not engineered. The ontology of the solution is extremely unopinionated and can contain complicated algorithms that we don’t know exist.
Training consists of a huge amount of trial and error where you take datapoints, predict something about the result, then search for nearby modifications that do better, then repeat until performance plateaus.
You are ultimately doing a local search, which means you can get stuck at local minima, unless you do something like increase your step size or increase the mutation rate
There are also just actually deep similarities. Vanilla SGD is perfectly equivalent to a genetic search with an infinitesimally small mutation size and infinite samples per generation (I could make a proof here but won’t unless someone is interested in it). Indeed in one of my ML classes at Berkeley genetic algorithms were suggested as one of the obvious things to do in a non-differentiable loss landscape as a generalization of SGD, where you just try some mutations, see which one performs best, and then modify your parameters in that direction.
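A minimal numerical sketch of that correspondence (my own toy example on a quadratic loss, not a proof): with many small random mutations per “generation”, the selected mutation points almost exactly along the negative gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([2.0, -1.0])        # current parameters
grad = 2.0 * theta                   # analytic gradient of loss(theta) = ||theta||^2

# One "generation": many small random mutations, then keep the fittest.
sigma, n_samples = 1e-3, 100_000
mutations = rng.normal(0.0, sigma, size=(n_samples, theta.size))
losses = np.sum((theta + mutations) ** 2, axis=1)
best = mutations[np.argmin(losses)]  # selection step

# Cosine similarity between the selected mutation and the negative gradient
# should be close to 1, i.e. selection approximates a gradient step.
cosine = best @ (-grad) / (np.linalg.norm(best) * np.linalg.norm(grad))
print(round(float(cosine), 3))
```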
Vanilla SGD is perfectly equivalent to a genetic search with an infinitesimally small mutation size and infinite samples per generation (I could make a proof here but won’t unless someone is interested in it)
If you think that people’s genes would be a lot fitter if people cared about fitness more then surely there’s a good chance that a more efficient version of natural selection would lead to people caring more about fitness.
You might, on the other hand, think that the problem is more related to feedbacks. I.e. if you’re the smartest monkey, you can spend your time scheming to have all the babies. If there are many smart monkeys, you have to spend a lot of time worrying about what the other monkeys think of you. If this is how you’re worried misalignment will arise, then I think “how do deep learning models generalise?” is the wrong tree to bark up.
C. If people did care about fitness, would Yudkowsky not say “instrumental convergence! Reward hacking!”? I’d even be inclined to grant he had a point.
evolution does not grow minds, it grows hyperparameters for minds.
Imo this is a nitpick that isn’t really relevant to the point of the analogy. Evolution is a good example of how selection for X doesn’t necessarily lead to a thing that wants (‘optimizes for’) X; and more broadly it’s a good example for how the results of an optimization process can be unexpected.
I want to distinguish two possible takes here:
The argument from direct implication: “Humans are misaligned wrt evolution, therefore AIs will be misaligned wrt their objectives”
Evolution as an intuition pump: “Thinking about evolution can be helpful for thinking about AI. In particular it can help you notice ways in which AI training is likely to produce AIs with goals you didn’t want”
It sounds like you’re arguing against (1). Fair enough, I too think (1) isn’t a great take in isolation. If the evolution analogy does not help you think more clearly about AI at all then I don’t think you should change your mind much on the strength of the analogy alone. But my best guess is that most people, including Nate, mean (2).
> evolution does not grow minds, it grows hyperparameters for minds.
Imo this is a nitpick that isn’t really relevant to the point of the analogy. Evolution is a good example of how selection for X doesn’t necessarily lead to a thing that wants (‘optimizes for’) X; and more broadly it’s a good example for how the results of an optimization process can be unexpected.
I think it’s extremely relevant, if we want to ensure that we only analogize between processes which share enough causal structure to ensure that lessons from e.g. evolution actually carry over to e.g. AI training (due to those shared mechanisms). If the shared mechanisms aren’t there, then we’re playing reference class tennis because someone decided to call both processes “optimization processes.”
The argument I think is good (nr (2) in my previous comment) doesn’t go through reference classes at all. I don’t want to make an outside-view argument (eg “things we call optimization often produce misaligned results, therefore sgd is dangerous”). I like the evolution analogy because it makes salient some aspects of AI training that make misalignment more likely. Once those aspects are salient you can stop thinking about evolution and just think directly about AI.
It has been over two years since the publication of that post, and criticism of this analogy has continued to intensify. The OP and other MIRI members have certainly been exposed to this criticism already by this point, and as far as I am aware, no principled defense has been made of the continued use of this example.
I encourage @So8res and others to either stop using this analogy, or to argue explicitly for its continued usage, engaging with the arguments presented by Byrnes, Pope, and others.
But given that this example is so controversial, even if it were right why would you use it—at least, why would you use it if you had any other example at all to turn to?
Humans are the only real-world example we have of human-level agents, and natural selection is the only process we know of for actually producing them.
SGD, singular learning theory, etc. haven’t actually produced human-level minds or a usable theory of how such minds work, and arguably haven’t produced anything that even fits into the natural category of minds at all, yet. (Maybe they will pretty soon, when applied at greater scale or in combination with additional innovations, either of which could result in the weird-correlates problem emerging.)
Also, the actual claims in the quote seem either literally true (humans don’t care about foods that they model as useful for inclusive genetic fitness) or plausible / not obviously false (when you grow minds [to human capabilities levels], they end up caring about a bunch of weird correlates). I think you’re reading the quote as saying something stronger / more specific than it actually is.
Because it serves as a good example, simply put. It gets the idea clear across about what it means, even if there are certainly complexities in comparing evolution to the output of an SGD-trained neural network.
It predicts learning correlates of the reward signal that break apart outside of the typical environment.
When you look at the actual process for how we actually start to like ice-cream—namely, we eat it, and then we get a reward, and that’s why we like it—then the world looks a lot less hostile, and misalignment a lot less likely.
Yes, that’s why we like it, and that is a way we’re misaligned with evolution (in the ‘do things that end up with vast quantities of our genes everywhere’ sense).
Our taste buds react to it, and they were selected for activating on foods which typically contained useful nutrients, and now they misfire, since ice-cream is probably not good for you. I’m not sure what this example is gesturing at? It sounds like a classic issue of having a reward function (‘reproduction’) that ends up with an approximation (‘your tastebuds’) that works pretty well in your ‘training environment’ but diverges in wacky ways outside of that.
What I’m inferring from ‘evolution is only selecting hyperparameters’ is that SGD has fewer layers of indirection between it and the actual operation of the mind compared to evolution (which has to select over the genome, which unfolds into the mind). Sure, that gives some reason to believe it will be easier to direct it in some ways—though I think there’s still room for issues from in-life learning, and I don’t really agree with Quintin’s idea that the cultural/knowledge-transfer boom with humans has already happened and thus AI won’t get anything like it—but even if we have more direct optimization I don’t see that as strongly making misalignment less likely? It does make it somewhat less likely, though there are still many large issues in deciding what reward signals to use.
I still expect correlates of the true objective to be learned; this happens even in humans’ within-lifetime learning, when they sometimes associate an unrelated thing with getting something good, and not just as a matter of false beliefs. Like, as a simple example, learning to appreciate rainy days because you and your family sat around the fire and had fun, such that you later in life prefer rainy days even without any of that.
Evolution doesn’t directly grow minds, but it does directly select for the pieces that grow minds, and has been doing that for quite some time. There’s a reason why it didn’t select for tastebuds that gave a reward signal strictly when some other bacteria in the body reported that they would benefit from it: that’s more complex (to select for), opens more room for ‘bad reporting’, may have problems with shorter gut bacteria lifetimes(?), and a simpler tastebud solution captured most of what it needed!
The way he’s using the example of evolution is captured entirely by that, quite directly, and I don’t find it objectionable.
I simply do not understand why people keep using this example.
I think it is wrong—evolution does not grow minds, it grows hyperparameters for minds. When you look at the actual process for how we actually start to like ice-cream—namely, we eat it, and then we get a reward, and that’s why we like it—then the world looks a a lot less hostile, and misalignment a lot less likely.
But given that this example is so controversial, even if it were right why would you use it—at least, why would you use it if you had any other example at all to turn to?
Why on push so hard for “natural selection” and “stochastic gradient descent” to be beneath the same tag of “optimization”, and thus to be able to infer things about the other from the analogy? Have we completely forgotten that the glory of words is not to be expansive, and include lots of things in them, but to be precise and narrow?.
Does evolution ~= AI have predictive power apart from doom? I have yet to see how natural selection helps me predict how any SGD algorithm works. It does not distinguish between Adam, AdamW. As far as I know it is irrelevant to Singular Learning Theory or NTK or anything else. It doesn’t seem to come up when you try to look at NN biases. If it isn’t an illuminating analogy anywhere else, why do we think the way it predicts doom to be true?
I think Nate’s claim “I expect them to care about a bunch of correlates of the training signal in weird and specific ways.” is plausible, at least for the kinds of AGI architectures and training approaches that I personally am expecting. If you don’t find the evolution analogy useful for that (I don’t either), but are OK with human within-lifetime learning as an analogy, then fine! Here goes!
OK, so imagine some “intelligent designer” demigod, let’s call her Ev. In this hypothetical, the human brain and body were not designed by evolution, but rather by Ev. She was working 1e5 years ago, back on the savannah. And her design goal was for these humans to have high inclusive genetic fitness.
So Ev pulls out a blank piece of paper. First things first: She designed the human brain with a fancy large-scale within-lifetime learning algorithm, so that these humans can gradually get to understand the world and take good actions in it.
Supporting that learning algorithm, she needs a reward function (“innate drives”). What to do there? Well, she spends a good deal of time thinking about it, and winds up putting in lots of perfectly sensible components for perfectly sensible reasons.
For example: She wanted the humans to not get injured, so she installed in the human body a system to detect physical injury, and put in the brain an innate drive to avoid getting those injuries, via an innate aversion (negative reward) related to “pain”. And she wanted the humans to eat sugary food, so she put a sweet-food-detector on the tongue and installed in the brain an innate drive to trigger reinforcement (positive reward) when that detector goes off (but modulated by hunger, as detected by yet another system). And so on.
Then she did some debugging and hyperparameter tweaking by running these newly-designed humans in the training environment (African savannah) and seeing how they do.
So that’s how Ev designed humans. Then she “pressed go” and lets them run for 1e5 years. What happened?
Well, I think it’s fair to say that modern humans “care about” things that probably would have struck Ev as “weird”. (Although we, with the benefit of hindsight, can wag our finger at Ev and say that she should have seen them coming.) For example:
Superstitions and fashions: Some people care, sometimes very intensely, about pretty arbitrary things that Ev could not have possibly anticipated in detail, like walking under ladders, and where Jupiter is in the sky, and exactly what tattoos they have on their body.
Lack of reflective equilibrium resulting in self-modification: Ev put a lot of work into her design, but sometimes people don’t like some of the innate drives or other design features that Ev put into them, so the people go right ahead and change them! For example, they don’t like how Ev designed their hunger drive, so they take Ozempic. They don’t like how Ev designed their attentional system, so they take Adderall. Many such examples.
New technology / situations leading to new preferences and behaviors: When Ev created the innate taste drives, she was (let us suppose) thinking about the food options available on the savannah, and thinking about what drives would lead to people making smart eating choices in that situation. And she came up with a sensible and effective design for a taste-receptors-and-associated-innate-drives system that worked well for that circumstance. But maybe she wasn’t thinking that humans would go on to create a world full of ice cream and coca cola and miraculin and so on. Likewise, Ev put in some innate drives with the idea that people would wind up exploring their local environment. Very sensible! But Ev would probably be surprised that her design is now leading to people “exploring” open-world video-game environments while cooped up inside. Ditto with social media, organized religion, sports, and a zillion other aspects of modern life. Ev probably didn’t see any of it coming when she was drawing up and debugging her design, certainly not in any detail.
To spell out the analogy here:
Ev ↔ AGI programmers;
Human within-lifetime learning ↔ AGI training;
Adult humans ↔ AGIs;
Ev “presses go” and lets human civilization “run” for 1e5 years without further intervention ↔ For various reasons I consider it likely (for better or worse) that there will eventually be AGIs that go off and autonomously do whatever they think is a good thing to do, including inventing new technologies, without detailed human knowledge and approval.
Modern humans care about (and do) lots of things that Ev would have been hard-pressed to anticipate, even though Ev designed their innate drives and within-lifetime learning algorithm in full detail ↔ even if we carefully design the “innate drives” of future AGIs, we should expect to be surprised about what those AGIs end up caring about, particularly when the AGIs have an inconceivably vast action space thanks to being able to invent new technology and build new systems.
Evolution analogies predict a bunch of facts that are so basic they’re easy to forget about, and even if we have better theories for explaining specific inductive biases, the simple evolution analogies should still get some weight for questions we’re very uncertain about.
Selection works well to increase the thing you’re selecting on, at least when there is also variation and heredity
Overfitting: sometimes models overfit to a certain training set; sometimes species adapt to a certain ecological niche and their fitness is low outside of it
Vanishing gradients: fitness increase in a subpopulation can be prevented by lack of correlation between available local changes to genes and fitness
Catastrophic forgetting: when trained on task A then task B, models often lose circuits specific to task A; when put in environment A then environment B species often lose vestigial structures useful in environment A
There’s a mostly unimodal and broad peak for optimal learning rate, just like for optimal mutation rate
Adversarial training dynamics
Adversarial examples usually exist (there exist chemicals that can sterilize or poison most organisms)
Adversarial training makes models more robust (bacteria can evolve antibiotic resistance)
Adversarially trained models generally have worse performance overall (antibiotic-resistant bacteria are outcompeted by normal bacteria when there are no antibiotics)
The attacker can usually win the arms race of generating and defending against adversarial attacks (evolutionary arms races are very common)
A few things that feel more tenuous
maybe NTK lottery ticket hypothesis; when mutation rates are low evolution can be approximated as taking the best-performing organism; when total parameter distance is small SGD can be approximated as taking the best-performing model from the parameter tangent space
maybe inner optimizers; transformers learn in context by gradient descent while evolution invents brains, positive and negative selection of T cells to prevent them attacking the body, probably other things
Task vectors: adding sparse task vectors together often produces a model that can do both tasks; giving an organism alleles for two unrelated genetic disorders often gives it both disorders
Grokking/punctuated equilibrium: in some circumstances applying the same algorithm for 100 timesteps causes much larger changes in model behavior / organism physiology than in other circumstances [edit: moved this from above because 1a3orn makes the case that it’s not very central]
I agree that if you knew nothing about DL you’d be better off using that as an analogy to guide your predictions about DL than using an analogy to a car or a rock.
I do think a relatively small quantity of knowledge about DL screens off the usefulness of this analogy; that you’d be better off deferring to local knowledge about DL than to the analogy.
Or, what’s more to the point—I think you’d better defer to an analogy to brains than to evolution, because brains are more like DL than evolution is.
Combining some of yours and Habryka’s comments, which seem similar.
It’s true that the structure of the solution is discovered and complex—but the ontology of the solution for DL (at least in currently used architectures) is quite opinionated towards shallow circuits with relatively few serial ops. This is different than the bias for evolution, which is fine with a mutation that leads to 10^7 serial ops if it’s metabolic costs are low. So the resemblance seems shallow other than “solutions can be complex.” I think to the degree that you defer to this belief rather than more specific beliefs about the inductive biases of DL you’re probably just wrong.
As far as I know optimal learning rate for most architectures is scheduled, and decreases over time, which is not a feature of evolution so far as I am aware? Again the local knowledge is what you should defer to.
Is this a prediction that a cyclic learning rate—that goes up and down—will work out better than a decreasing one? If so, that seems false, as far as I know.
As far as I know grokking is a non-central example of how DL works, and in evolution punctuated equilibrium is a result of the non-i.i.d. nature of the task, which is again a different underlying mechanism from DL. If apply DL on non-i.i.d problems then you don’t get grokking, you just get a broken solution. This seems to round off to, “Sometimes things change faster than others,” which is certainly true but not predictively useful, or in any event not a prediction that you couldn’t get from other places.
Like, leaving these to the side—I think the ability to post-hoc fit something is questionable evidence that it has useful predictive power. I think the ability to actually predict something else means that it has useful predictive power.
Again, let’s take “the brain” as an example of something to which you could analogize DL.
There are multiple times that people have cited the brain as an inspiration for a feature in current neural nets or RL. CNNS, obviously; the hippocampus and experience replay; randomization for adversarial robustness. You can match up interventions that cause learning deficiencies in brains to similar deficiencies in neural networks. There are verifiable, non-post hoc examples of brains being useful for understanding DL.
As far as I know—you can tell me if there are contrary examples—there are obviously more cases where inspiration from the brain advanced DL or contributed to DL understanding than inspiration from evolution. (I’m aware of zero, but there could be some.) Therefore it seems much more reasonable to analogize from the brain to DL, and to defer to it as your model.
I think in many cases it’s a bad idea to analogize from the brain to DL! They’re quite different systems.
But they’re more similar than evolution and DL, and if you’d not trust the brain to guide your analogical a-theoretic low-confidence inferences about DL, then it makes more sense to not trust evolution for the same.
FWIW my take is that the evolution-ML analogy is generally a very excellent analogy, with a bunch of predictive power, but worth using carefully and sparingly. Agreed that sufficient detail on e.g. DL specifics can screen off the usefulness of the analogy, but it’s very unclear whether we have sufficient detail yet. The evolution analogy was originally supposed to point out that selecting a bunch for success on thing-X doesn’t necessarily produce thing-X-wanters (which is obviously true, but apparently not obvious enough to always be accepted without providing an example).
Not sure where to land on that. It seems like both are good analogies? Brains might not be using gradients at all[1], whereas evolution basically is. But brains are definitely doing something like temporal-difference learning, and the overall ‘serial depth’ thing is also weakly in favour of brains ~= DL vs genomes+selection ~= DL.
I’d love to know what you’re referring to by this:
Also,
I think the jury is still out on this, but there’s literature on it (probably much more I haven’t fished out). [EDIT: also see this comment which has some other examples]
AFAIK there’s no evidence of this and it would be somewhat surprising to find it playing a major role. Then again, I also wouldn’t be surprised if it turned out that brains are doing something which is secretly sort of equivalent to gradient descent.
I’m genuinely surprised at the “brains might not be doing gradients at all” take; my understanding is they are probably doing something equivalent.
Similarly this kind of paper points in the direction of LLMs doing something like brains. My active expectation is that there will be a lot more papers like this in the future.
But to be clear—my overall view of the similarity of brain to DL is admittedly fueled less by these specific papers, though, which are nice gravy for my view but not the actual foundation, and much more by what I see as the predictive power of hypotheses like this, which are massively more impressive inasmuch as they were made before Transformers had been invented. Given Transformers, the comparison seems overdetermined; I wish I had seen that way back in 2015.
Re. serial ops and priors—I need to pin down the comparison more, given that it’s mostly about the serial depth thing, and I think you already get it. The base idea is that what is “simple” to mutations and what is “simple” to DL are extremely different. Fuzzily: A mutation alters protein-folding instructions, and is indifferent to the “computational costs” of working this out in reality; if you tried to work out the analytic gradient for the mutation (the gradient over mutation → protein folding → different brain → different reward → competitors children look yummy → eat em) your computer would explode. But DL seeks only a solution that can be computed in a big ensemble of extremely short circuits, learned almost entirely specifically because of the data off of which you’ve trained. Ergo DL has very different biases, where the “complexity” for mutations probably has to do with instructional length where, “complexity” for DL is more related to how far you are from whatever biases are engrained in the data (<--this is fuzzy), and the shortcut solutions DL learns are always implied from the data.
So when you try to transfer intuitions about the “kind of solution” DL gets from evolution (which ignores this serial depth cost) to DL (which is enormously about this serial depth cost) then the intuition breaks. As far as I can tell that’s why we have this immense search for mesaoptimizers and stuff, which seems like it’s mostly just barking up the wrong tree to me. I dunno; I’d refine this more but I need to actually work.
Re. cyclic learning rates: Both of us are too nervous about the theory --> practice junction to make a call on how all this transfers to useful algos (Although my bet is that it won’t.). But if we’re reluctant to infer from this—how much more from evolution?
Mm, thanks for those resource links! OK, I think we’re mostly on the same page about what particulars can and can’t be said about these analogies at this point. I conclude that both ‘mutation+selection’ and ‘brain’ remain useful, having both is better than having only one, and care needs to be taken in any case!
As I said,
so I’m looking forward to reading those links.
Runtime optimisation/search and whatnot remain (broadly-construed) a sensible concern from my POV, though I wouldn’t necessarily (at first) look literally inside NN weights to find them. I think more likely some scaffolding is needed, if that makes sense (I think I am somewhat idiosyncratic in this)? I get fuzzy at this point and am still actively (slowly) building my picture of this—perhaps your resource links will provide me fuel here.
I mean, does it matter? What if it turns out that gradient descent itself doesn’t affect inductive biases as much as the parameter->function mapping? If implicit regularization (e.g. SGD) isn’t an important part of the generalization story in deep learning, will you down-update on the appropriateness of the evolution/AI analogy?
https://www.youtube.com/watch?v=GM6XPEQbkS4 (talk) / https://arxiv.org/abs/2307.06324 prove faster convergence with a periodic learning rate. On a specific ‘nicer’ space than reality, and they’re (I believe from what I remember) comparing to a good bound with a constant stepsize of 1. So it may be one of those papers that applies in theory but not often in practice, but I think it is somewhat indicative.
It’s always trickier to reason about post-hoc, but some of the observations could be valid, non-cherry-picked parallels between evolution and deep learning that predict further parallels.
I think looking at which inspired more DL capabilities advances is not perfect methodology either. It looks like evolution predicts only general facts whereas the brain also inspires architectural choices. Architectural choices are publishable research whereas general facts are not, so it’s plausible that evolution analogies are decent for prediction and bad for capabilities. Don’t have time to think this through further unless you want to engage.
One more thought on learning rates and mutation rates:
This feels consistent with evolution, and I actually feel like someone clever could have predicted it in advance. Mutation rate per nucleotide is generally lower and generation times are longer in more complex organisms; that’s evidence that lower rates of genetic change are optimal for more complex organisms, since evolution can tune those rates through e.g. DNA repair mechanisms. So it stands to reason that if models get more complex during training, their learning rate should go down.
Does anyone know if decreasing learning rate is optimal even when model complexity doesn’t increase over time?
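(For what it’s worth, the classic stochastic-approximation answer is yes when the gradients are noisy, even on a fixed objective of fixed complexity: with a constant step size, SGD keeps bouncing around the optimum, while a decaying step size lets it settle. A toy numpy sketch, with all numbers made up for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x):
    # Gradient of f(x) = 0.5 * x^2, plus Gaussian noise standing in for minibatch noise.
    return x + rng.normal(scale=1.0)

def run(schedule, steps=20_000):
    x = 5.0
    for t in range(1, steps + 1):
        x -= schedule(t) * noisy_grad(x)
    return abs(x)  # distance from the true optimum at 0

print("constant lr:", run(lambda t: 0.1))     # plateaus at a noise-floor distance
print("1/t decay:  ", run(lambda t: 1.0 / t)) # keeps shrinking toward 0
```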
Not sure what you mean here. One of the best explanations of how neural networks get trained uses basically a pure natural selection lens, and I think it gets most predictions right:
CGP Grey “How AIs, like ChatGPT, Learn” https://www.youtube.com/watch?v=R9OHn5ZF4Uo
There is also a follow-up video that explains SGD:
CGP Grey “How AI, Like ChatGPT, *Really* Learns” https://www.youtube.com/watch?v=wvWpdrfoEv0
In general I think if you use a natural-selection analogy you will get a huge number of things right about how AI works, though I agree not everything (it won’t explain the difference between Adam and AdamW, but it will explain the difference between hierarchical Bayesian networks, linear regression, and modern deep learning).
Note: I just watched the videos. I personally would not recommend the first video as an explanation to a layperson if I wanted them to come away with accurate intuitions around how today’s neural networks learn / how we optimize them. What it describes is a very different kind of optimizer, one explicitly patterned after natural selection, such as a genetic algorithm or population-based training, and the follow-up video more or less admits this. I would personally recommend they opt for these videos instead:
3Blue1Brown—Gradient descent, how neural networks learn
Emergent Garden—Watching Neural Networks Learn
WIRED—Computer Scientist Explains Machine Learning in 5 Levels of Difficulty
Except that selection and gradient descent are closely mathematically related—you have to make a bunch of simplifying assumptions, but ‘mutate and select’ (evolution) is actually equivalent to ‘make a small approximate gradient step’ (SGD) in the limit of small steps.
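(To gesture at why: here is one standard way to see it in the small-mutation limit. This is my paraphrase of the usual evolution-strategies calculation, not necessarily the exact model in the post under discussion. If offspring are \(\theta + \sigma\varepsilon\) with \(\varepsilon \sim \mathcal{N}(0, I)\), then the fitness-weighted mean of the mutations is

\[
\mathbb{E}\!\left[\sigma\varepsilon \, f(\theta + \sigma\varepsilon)\right]
\;\approx\; \mathbb{E}\!\left[\sigma\varepsilon\left(f(\theta) + \sigma\varepsilon^{\top}\nabla f(\theta)\right)\right]
\;=\; \sigma^{2}\,\nabla f(\theta),
\]

so for small \(\sigma\), “mutate, then reweight by fitness” moves you, on average, along the fitness gradient, i.e. it behaves like a noisy gradient-ascent step of size \(O(\sigma^{2})\).)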
I read the post and left my thoughts in a comment. In short, I don’t think the claimed equivalence in the post is very meaningful.
(Which is not to say the two processes have no relationship whatsoever. But I am skeptical that it’s possible to draw a connection stronger than “they both do local optimization and involve randomness.”)
Awesome, I saw that comment—thanks, and I’ll try to reply to it in more detail.
It looks like you’re not disputing the maths, but the legitimacy/meaningfulness of the simplified models of natural selection that I used? From a skim, the caveats you raised are mostly/all caveated in the original post too—though I think you may have missed the (less rigorous but more realistic!) second model at the end, which departs from the simple annealing process to a more involved population process.
I think even on this basis, though, it’s going too far to claim that the best we can say is “they both do local optimization and involve randomness”! The steps are systematically pointed up/down the local fitness gradient, for one. And they’re a sample-based stochastic realisation, for another.
I don’t want you to get the impression I’m asking for too much from this analogy. But the analogy is undeniably there. In fact, in those explainer videos Habryka linked, the particular evolution described is a near-match for my first model (in which, yes, it departs from natural genetic evolution in the same ways).
I’m disputing both. Re: math, the noise in your model isn’t distributed like SGD noise, and unlike SGD, the step size depends on the gradient norm. (I know you did mention the latter issue, but IMO it rules out calling this an “equivalence.”)
I did see your second proposal, but it was a mostly-verbal sketch that I found hard to follow, and which I don’t feel like I can trust without seeing a mathematical presentation.
(FWIW, if we have a population that’s “spread out” over some region of a high-dim NN loss landscape—even if it’s initially a small / infinitesimal region—I expect it to quickly split up into lots of disjoint “tendrils,” something like dye spreading in water. Consider what happens e.g. at saddle points. So the population will rapidly “speciate” and look like an ensemble of GD trajectories instead of just one.
If your model assumes by fiat that this can’t happen, I don’t think it’s relevant to training NNs with SGD.)
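(A tiny numerical illustration of the “tendrils” point, under my own toy assumptions rather than anything from the post: run plain gradient descent on a population of points initialised in a tiny ball around a saddle, and the population splits into two diverging branches.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy saddle: f(x, y) = x^2 - y^2, with a saddle point at the origin.
def grad(points):
    gx = 2.0 * points[:, 0]
    gy = -2.0 * points[:, 1]
    return np.stack([gx, gy], axis=1)

# A "population" initialised in a tiny ball near the saddle point.
pop = rng.normal(scale=1e-3, size=(1000, 2))

for _ in range(200):
    pop -= 0.05 * grad(pop)  # plain gradient descent on every member

# The x-coordinates collapse toward 0, while the y-coordinates blow up with
# whichever sign they started with: the cloud has split into two branches.
print("fraction escaping with y > 0:", np.mean(pop[:, 1] > 0))
print("y range:", pop[:, 1].min(), pop[:, 1].max())
```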
Wait, you think that a model which doesn’t speciate isn’t relevant to SGD? I’ll need help following, unless you meant something else. It seems like speciation is one of the places where natural evolution distinguishes itself from gradient descent, but you seem to also be making this point?
In the second model, we retrieve non-speciation by allowing for crossover/horizontal transfer, and yes, essentially by fiat I rule out speciation (as a consequence of the ‘eventually-universal mixing’ assumption). In real natural selection, even with horizontal transfer, you get speciation, albeit rarely. It’s obviously a fascinating topic, but I think pretty irrelevant to this analogy.
For me, the step-size thing is interesting but essentially a minor detail. Any number of practical departures from pure SGD mess with the step size anyway (and with the gradient!) so this feels like asking for too much. Do we really think SGD vs momentum vs Adam vs … is relevant to the conclusions we want to draw? (Serious question; my best guess is ‘no’, but I hold that medium-lightly.)
(A nitpick rendered irrelevant by my preceding paragraph, but) FWIW vanilla SGD’s step does depend on the gradient norm. [ETA: I think I misunderstood exactly what you were saying by ‘step size depends on the gradient norm’, so I think we agree about the facts of SGD. But now think about the space including SGD, RMSProp, etc. The ‘depends on gradient norm’ piece which arises from my evolution model seems entirely at home in that family.]
On the distribution of noise, I’ll happily acknowledge that I didn’t show equivalence. I half expect that one could be eked out at a stretch, but I also think this is another minor and unimportant detail.
I agree that they are related. In the context of this discussion, the critical difference between SGD and evolution is somewhat captured by your Assumption 1:
Evolution does not directly select/optimize the content of minds. Evolution selects/optimizes genomes based (in part) on how they distally shape what minds learn and what minds do (to the extent that impacts reproduction), with even more indirection caused by selection’s heavy dependence on the environment. All of that creates a ton of optimization “slack”, such that large-brained human minds with language could steer optimization far faster & more decisively than natural selection could. This is what 1a3orn was pointing to earlier.
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired. That is one of the main alignment-relevant intuitions that gets lost when blurring the evolution/SGD distinction.
Right. And in the context of these explainer videos, the particular evolution described has the properties which make it near-equivalent to SGD, I’d say?
Hmmm, this strikes me as much too strong (especially ‘this lets you directly mold the circuits’).
Remember also that with RLHF we’re learning a reward model (something like the more-hardcoded bits of brain-stuff), which in turn provides updates to the actually-acting artefact (something like the more-flexibly-learned bits of brain-stuff).
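(To make that two-tier picture concrete, a deliberately tiny sketch of the shape of the loop; the reward scores and the bandit-style “policy” here are made-up stand-ins, not any particular RLHF implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)

# The more-hardcoded tier: a frozen reward model, standing in for something
# trained on human preference data. The per-"action" scores are made up.
def reward_model(action):
    return [0.1, 0.9, 0.3][action]

# The more-flexibly-learned tier: a softmax policy over 3 actions, updated only
# from the reward model's signal (REINFORCE-style), never from humans directly.
logits = np.zeros(3)
lr = 0.1

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(3, p=probs)
    r = reward_model(action)
    grad_logp = -probs
    grad_logp[action] += 1.0       # gradient of log pi(action)
    logits += lr * r * grad_logp   # push toward actions the reward model likes

print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))  # concentrates on action 1
```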
I also think there’s a fair alternative analogy to be drawn like
evolution of genome (including mostly-hard-coded brain-stuff) ~ SGD (perhaps +PBT) of NN weights
within-lifetime-learning of organism ~ in-context something-something of NN
(this is one analogy I commonly drew before RLHF came along.)
So, look, the analogies are loose, but they aren’t baseless.
Source?
CGP Grey’s video is a decent example source. Most of the differences between hierarchical Bayesian networks and modern deep learning come across pretty well if you model the latter as a type of genetic-algorithm search:
The resulting structure of the solution is mostly discovered not engineered. The ontology of the solution is extremely unopinionated and can contain complicated algorithms that we don’t know exist.
Training consists of a huge amount of trial and error where you take datapoints, predict something about the result, then search for nearby modifications that do better, then repeat until performance plateaus.
You are ultimately doing a local search, which means you can get stuck at local minima, unless you do something like increase your step size or increase the mutation rate.
There are also just actually deep similarities. Vanilla SGD is perfectly equivalent to a genetic search with an infinitesimally small mutation size and infinite samples per generation (I could make a proof here but won’t unless someone is interested in it). Indeed, in one of my ML classes at Berkeley, genetic algorithms were suggested as one of the obvious things to do in a non-differentiable loss landscape, as a generalization of SGD: you just try some mutations, see which one performs best, and then modify your parameters in that direction.
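(A quick numerical gesture at that claim, under my own toy setup rather than a proof: as the mutation size shrinks and the number of offspring per generation grows, the “keep the fittest mutation” direction lines up more and more with the negative gradient.)

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 20

def loss(w):
    # Arbitrary smooth toy loss.
    return np.sum((w - 1.0) ** 2) + 0.5 * np.sum(w[:-1] * w[1:])

def grad(w):
    g = 2.0 * (w - 1.0)
    g[:-1] += 0.5 * w[1:]
    g[1:] += 0.5 * w[:-1]
    return g

w = rng.normal(size=DIM)

for n_offspring, sigma in [(10, 1e-1), (1_000, 1e-2), (100_000, 1e-3)]:
    # One "generation": sample n mutations, keep the fittest (lowest-loss) one.
    mutations = sigma * rng.normal(size=(n_offspring, DIM))
    best = mutations[np.argmin([loss(w + m) for m in mutations])]
    cos = best @ -grad(w) / (np.linalg.norm(best) * np.linalg.norm(grad(w)))
    print(f"offspring={n_offspring:>6}, sigma={sigma}: cosine with -grad = {cos:.2f}")
```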
Oh, I actually did that a year or so ago
Two observations:
If you think that people’s genes would be a lot fitter if people cared about fitness more, then surely there’s a good chance that a more efficient version of natural selection would lead to people caring more about fitness.
You might, on the other hand, think that the problem is more related to feedback loops. I.e. if you’re the smartest monkey, you can spend your time scheming to have all the babies. If there are many smart monkeys, you have to spend a lot of time worrying about what the other monkeys think of you. If this is how you’re worried misalignment will arise, then I think “how do deep learning models generalise?” is the wrong tree to bark up.
C. If people did care about fitness, would Yudkowsky not say “instrumental convergence! Reward hacking!”? I’d even be inclined to grant he had a point.
Imo this is a nitpick that isn’t really relevant to the point of the analogy. Evolution is a good example of how selection for X doesn’t necessarily lead to a thing that wants (‘optimizes for’) X; and more broadly it’s a good example for how the results of an optimization process can be unexpected.
I want to distinguish two possible takes here:
1. The argument from direct implication: “Humans are misaligned wrt evolution, therefore AIs will be misaligned wrt their objectives”
2. Evolution as an intuition pump: “Thinking about evolution can be helpful for thinking about AI. In particular it can help you notice ways in which AI training is likely to produce AIs with goals you didn’t want”
It sounds like you’re arguing against (1). Fair enough, I too think (1) isn’t a great take in isolation. If the evolution analogy does not help you think more clearly about AI at all, then I don’t think you should change your mind much on the strength of the analogy alone. But my best guess is that most people, including Nate, mean (2).
I think it’s extremely relevant, if we want to make sure we only analogize between processes which share enough causal structure that lessons from e.g. evolution actually carry over to e.g. AI training (via those shared mechanisms). If the shared mechanisms aren’t there, then we’re playing reference class tennis because someone decided to call both processes “optimization processes.”
The argument I think is good ((2) in my previous comment) doesn’t go through reference classes at all. I don’t want to make an outside-view argument (e.g. “things we call optimization often produce misaligned results, therefore SGD is dangerous”). I like the evolution analogy because it makes salient some aspects of AI training that make misalignment more likely. Once those aspects are salient, you can stop thinking about evolution and just think directly about AI.
Also relevant is Steven Byrnes’ excellent Against evolution as an analogy for how humans will create AGI.
It has been over two years since the publication of that post, and criticism of this analogy has continued to intensify. The OP and other MIRI members have certainly been exposed to this criticism already by this point, and as far as I am aware, no principled defense has been made of the continued use of this example.
I encourage @So8res and others to either stop using this analogy, or to argue explicitly for its continued usage, engaging with the arguments presented by Byrnes, Pope, and others.
Humans are the only real-world example we have of human-level agents, and natural selection is the only process we know of for actually producing them.
SGD, singular learning theory, etc. haven’t actually produced human-level minds or a usable theory of how such minds work, and arguably haven’t produced anything that even fits into the natural category of minds at all, yet. (Maybe they will pretty soon, when applied at greater scale or in combination with additional innovations, either of which could result in the weird-correlates problem emerging.)
Also, the actual claims in the quote seem either literally true (humans don’t care about foods that they model as useful for inclusive genetic fitness) or plausible / not obviously false (when you grow minds [to human capabilities levels], they end up caring about a bunch of weird correlates). I think you’re reading the quote as saying something stronger / more specific than it actually is.
Because it serves as a good example, simply put. It gets the idea across clearly, even if there are certainly complexities in comparing evolution to the output of an SGD-trained neural network.
It predicts that what gets learned are correlates of the reward signal, correlates which come apart outside of the typical training environment.
Yes, that’s why we like it, and that is a way we’re misaligned with evolution (in the “do things that end up with vast quantities of our genes everywhere” sense). Our taste buds react to it, and they were selected for activating on foods which typically contained useful nutrients; now they activate on foods which don’t, since ice cream is probably not good for you. I’m not sure what this example is gesturing at? It sounds like a classic case of a reward function (“reproduction”) that ends up with an approximation (“your tastebuds”) which works pretty well in your “training environment” but diverges in wacky ways outside of it.
What I take from “evolution is only selecting hyperparameters” is that SGD has fewer layers of indirection between it and the actual operation of the mind than evolution does (which has to select over the genome, which unfolds into the mind). Sure, that gives some reason to believe it will be easier to direct—though I think there’s still plenty of room for issues with in-life learning, and I don’t really agree with Quintin’s idea that because the cultural/knowledge-transfer boom has already happened for humans, AI won’t get anything like it—but even if we have more direct optimization, I don’t see that as strongly making misalignment less likely. It does make it somewhat less likely, but there are still many large problems in deciding what reward signals to use.
I still expect correlates of the true objective to be learned. Even in-life learning in humans does this: people sometimes come to value an unrelated thing because it was associated with getting something good, and not just as a matter of false beliefs. As a simple example, you might learn to appreciate rainy days because you and your family sat around the fire and had fun, such that later in life you prefer rainy days even without any of that.
Evolution doesn’t directly grow minds, but it does directly select for the pieces that grow minds, and has been doing that for quite some time. There’s a reason why it didn’t select for tastebuds that gave a reward signal strictly when bacteria in the body reported that they would benefit from it: that’s more complex to select for, opens more room for “bad reporting”, may have problems with shorter gut-bacteria lifetimes(?), and a simpler tastebud solution captured most of what it needed! The way he’s using the example of evolution is captured entirely by that, quite directly, and I don’t find it objectionable.