Against evolution as an analogy for how humans will create AGI


When we do Deep Reinforcement Learning to make a PacMan-playing AI (for example), there are two algorithms at play: (1) the “inner” algorithm (a.k.a. “policy”, a.k.a. “trained model”) is a PacMan-playing algorithm, which looks at the pixels and outputs a sequence of moves, and (2) the “outer” algorithm is a learning algorithm, probably involving gradient descent, which edits the “inner” PacMan-playing algorithm in a way that tends to improve it over time.

Likewise, when the human brain evolved, there were two algorithms at play: (1) the “inner” algorithm is the brain algorithm, which analyzes sensory inputs, outputs motor commands, etc., and (2) the “outer” algorithm is Evolution By Natural Selection, which edits the brain algorithm (via the genome) in a way that tends to increase the organism’s inclusive genetic fitness.

There’s an obvious parallel here: Deep RL involves a two-layer structure, and evolution involves a two-layer structure.
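The Deep RL half of this two-layer structure can be sketched in a few lines. This is only a toy illustration, not a real PacMan agent: the "game" is a made-up one-step task, the names `inner_policy`, `episode_return`, and `outer_learning_algorithm` are hypothetical, and random hill climbing stands in for gradient descent to keep the sketch dependency-free.

```python
import random


def inner_policy(params, observation):
    """Inner algorithm: maps an observation to an action (0 or 1)."""
    w, b = params
    return 1 if w * observation + b > 0 else 0


def episode_return(params, rng, steps=200):
    """Toy stand-in for a game: reward 1 when the action matches sign(observation)."""
    total = 0
    for _ in range(steps):
        obs = rng.uniform(-1.0, 1.0)
        target = 1 if obs > 0 else 0
        total += int(inner_policy(params, obs) == target)
    return total / steps


def outer_learning_algorithm(iterations=300, seed=0):
    """Outer algorithm: edits the inner policy's parameters to improve its return.

    Assumption: random hill climbing is used here in place of gradient descent.
    """
    rng = random.Random(seed)
    params = (-1.0, 0.5)  # deliberately bad starting policy
    # Re-seeding the evaluation RNG gives every candidate the same episodes.
    best = episode_return(params, random.Random(seed + 1))
    history = [best]
    for _ in range(iterations):
        candidate = (params[0] + rng.gauss(0, 0.3), params[1] + rng.gauss(0, 0.3))
        score = episode_return(candidate, random.Random(seed + 1))
        if score >= best:  # keep the edit only if it does not hurt
            params, best = candidate, score
        history.append(best)
    return params, history


trained_params, history = outer_learning_algorithm()
print("return before:", history[0], "after:", history[-1])
```

Once training ends, `inner_policy(trained_params, obs)` runs on its own; the outer loop is no longer involved.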

So there is a strong temptation to run with this analogy, and push it all the way to AGI. According to this school of thought, maybe the way we will eventually build AGI is by doing gradient descent (or some other optimization algorithm) and then the inner algorithm (a.k.a. trained model) will be an AGI algorithm—just as evolution designed the generally-intelligent human brain algorithm.

I think this analogy /​ development model is pretty often invoked by people thinking about AGI safety. Maybe the most explicit discussion is the one in Risks From Learned Optimization last year.

But I want to argue that the development of AGI is unlikely to happen that way.

Defining “The Evolution Analogy for AGI Development”: Three ingredients

I want to be specific about what I’m arguing against here. So I define the Evolution Analogy For AGI Development as having all 3 of the following pieces:

  • “Outer + Inner”: The analogy says that we humans will write and run an “outer algorithm” (e.g. gradient descent), which runs an automated search process to find an “inner algorithm” (a.k.a. “trained model”).

    • …Analogous to how evolution is an automated search that discovered the human brain algorithm.

    • (If you’ve read Risks From Learned Optimization you can mentally substitute the words “base & mesa” for “outer & inner” respectively; I’m using different words because I’m thinking of “base & mesa” as something very specific, and I want to talk more broadly.)

  • “Outer As Lead Designer”: The analogy says that the outer algorithm is doing the bulk of the “real design work”. So I am not talking about something like a hyperparameter search or neural architecture search, where the outer algorithm is merely adjusting a handful of legible adjustable parameters in the human-written inner algorithm code. Instead, I’m talking about a situation where the outer algorithm is really doing the hard work of figuring out fundamentally what the inner algorithm is and how it works, and meanwhile humans stare at the result and scratch their head and say “What on earth is this thing doing?” For example, the inner algorithm could internally have an RL submodule doing tree search, yet the humans have no idea that there’s any RL going on, or any tree search going on, or indeed have any idea how this thing is learning anything at all in the first place.

    • …Analogous to how evolution designed the human brain algorithm 100% from scratch.

  • “Inner As AGI”: The analogy says that “The AGI” is identified as the inner algorithm, not the inner and outer algorithm working together. In other words, if I ask the AGI a question, I don’t need the outer algorithm to be running in the course of answering that question.

    • …Analogous to how, if you ask a human a question, they don’t reply “Ooh, that’s a hard question, hang on, let me procreate a few generations and then maybe my descendants will be able to help you!”

I want to argue that AGI will not be developed in a way where all three of these ingredients are present.

…On the other hand, I am happy to argue that AGI will be developed in a way that involves only two of these three ingredients!

  • What if the “Outer As Lead Designer” criterion does not apply? Then (as mentioned above) we’re talking about things like automated hyperparameter search or neural architecture search, which edit a handful of adjustable parameters (number of layers, learning rate, etc.) within a human-designed algorithm. Well, I consider it totally plausible that those kinds of search processes will be part of AGI development. Learning algorithms (and planning algorithms, etc.) inevitably have adjustable parameters that navigate tradeoffs in the design space. And sometimes the best way to navigate those tradeoffs is to just try running the thing! Try lots of different settings and find what works best empirically—i.e., wrap it in an outer-loop optimization algorithm. Again, I don’t count this as a victory for the evolution analogy, because the inner algorithm is still primarily designed by a human, and is legible to humans.

  • What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm. I definitely see that as plausible—indeed likely—and this is how I think of within-lifetime human learning. Much more on this in a bit—see “Intelligence via online learning” below. If this is in fact the path to AGI, then we wind up with a different biological analogy…

A biological analogy I like much better: The “genome = code” analogy

Human intelligence | Artificial intelligence
Human genome | GitHub repository with all the PyTorch code for training and running the PacMan-playing agent
Within-lifetime learning | Training the PacMan-playing agent
How an adult human thinks and acts | Trained PacMan-playing agent

Note that evolution is not in this picture: its role has been usurped by the engineers who wrote the PyTorch code. This is intelligent design, not evolution!

A motivating question: Two visions for how brain-like AGI would come to be

I’m trying to make a general argument in this post, but here is a concrete example to keep in mind.

As discussed here, I see the brain as having a “neocortex subsystem” that runs a particular learning algorithm—one which takes in sensory inputs and reward inputs, constructs a predictive world-model, and takes foresighted actions that tend to lead to high rewards. Then there is a different subsystem that (among many other things) calculates those rewards I just mentioned.

The learning algorithm is what the “neocortex subsystem” does. The “learned content” includes things like “tires are black”, “I love the thrill of discovery”, problem-solving strategies, what I had for breakfast, how to skateboard, etc. Rewards include things like “pain is bad”, “sweet taste is good”, “being confused is bad”, “being popular is good”, etc. “Other stuff” includes getting goosebumps when you’re cold, regulating your heart-rate, etc.

Maybe we’ll make an AGI that has some resemblance to this system. Actually, I should be more specific: Maybe we’ll make an AGI with this general structure, in which the learning algorithm component has some principles in common with the human brain’s learning algorithm component. (The “reward calculator” and “other stuff” will obviously not be human-brain-like in any detail—unless we’re doing whole-brain-emulation which is a different topic—since AGIs don’t need to regulate their heart rate, or to have instinctive reactions to big hairy spiders, etc.)

Now, assuming that we make an AGI that has some resemblance to this system, consider two scenarios for how that happens:

  • The scenario I don’t consider likely is where an automated search discovers both the neocortex subsystem learning algorithm and the reward calculator, tangled together into a big black box, with the programmer having no idea of how that algorithm is structured or what it’s doing.

  • The scenario I do consider likely is where humans design something like the neocortex subsystem learning algorithm by itself, using “the usual engineering approach” (see below): trying to figure out how the learning algorithm is supposed to work, writing code, testing and iterating, etc. And then, in this scenario, the humans probably by-and-large ignore the rest of the brain, including the reward calculation, and they just insert their own reward function (and/or other “steering systems”), starting with whatever is easy and obvious, and proceeding by trial-and-error or whatever, as they try to get this cool new learning algorithm to do the things they want it to do. (Much more on this scenario in a forthcoming post.)

In the remainder of the post I’ll go over three reasons suggesting that the first scenario would be much less likely than the second scenario. First I’ll offer a couple outside-view arguments. Second I’ll work through the various possibilities for how the training and episode lengths would work. Third I’ll argue that the tangled-together black box in the first scenario would run with a horrific (I expect many orders of magnitude) performance penalty compared to the second scenario, due to neither the programmers nor the compiler toolchain having visibility into the black box. (I’m talking here about a run-time performance penalty. So this is on top of the computational costs of the original automated search that designed the black box.)

(I also have a very-inside-view argument—that the second scenario is already happening and well on its way to completion—but I won’t get into that, it’s more speculative and outside the scope of this post.)

Anyway, comparing these two scenarios, I have no idea which of them would make it easier or harder to develop Safe And Beneficial AGI. (There are very difficult inner alignment problems in both cases—more on which in a forthcoming post.) But they are different scenarios, and I want us to be putting more effort into planning for whichever one is likelier to happen! I could be wrong here. Let’s figure it out!

1. A couple outside-view arguments

Now that we’re done with the background section, we’re on to the first of my three arguments against the evolution analogy: invoking a couple outside views.

Outside view #1: How biomimetics has always worked

Here’s a typical example. Evolution has made wing-flapping animals. Human engineers wanted to make a wing-flapping flying machine. What those engineers did not do was imitate evolution by, say, running many generations of automated search over body plans and behaviors in a real or simulated environment and rewarding the ones that flew better. What they did do was to take, let’s call it, “the usual engineering approach”. That involves some combination of (1) trying to understand how wing-flapping animals fly, (2) trying to understand aerodynamics and the principles of flight more generally, (3) taking advantage of any available tools and techniques, (4) trial-and-error, (5) hypothesis-driven testing and iteration, etc. etc.

By the same token, Evolution has made human-level intelligence. Human engineers want to make human-level-intelligent machines. Just like the paragraph above, I expect them to take “the usual engineering approach”. That involves some combination of (1) trying to understand how human intelligence works, (2) trying to understand the nature of intelligence and intelligent algorithms more generally, (3) taking advantage of any available tools and techniques, (4) trial-and-error, (5) hypothesis-driven testing and iteration, etc. etc.

More examples along the same lines: (A) When people first started to build robots, they were inspired by human and animal locomotion, and they hooked up actuators and hinges etc. to make moving machines. There was no evolution-like outer-loop automated search process involved. (B) The Wright Brothers were inspired by, and stole ideas from, soaring birds. There was no evolution-like outer-loop automated search process involved. (C) “Artificial photosynthesis” is an active field of research trying to develop systems that turn sunlight directly into chemical fuels. None of the ongoing research threads, to my knowledge, involve an evolution-like outer-loop automated search process (except for very narrow questions, like what molecule to put in a particular spot within the human-designed overall architecture). You get the idea.

Outside view #2: How learning algorithms have always been developed

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.

(Update: Admittedly, you can say “the GPT-3 trained model (inner algorithm) is a learning algorithm”, in the sense that it has 96 layers, and it sorta “learns” things in earlier layers and “applies that knowledge” in later layers. And that was developed by an automated search. I don’t count that because I don’t think this type of “learning algorithm” is exactly the right type of learning algorithm that will be sufficient for AGI by itself; see discussion of GPT-3 in a later section, and also elaboration in my comment here.)

In general, which algorithms are a good fit for automated design (= design by learning algorithm), and which algorithms are a good fit for human design?

When we wanted to label images using a computer, we invented a learning algorithm (ConvNet + SGD) that looks at a bunch of images and gradually learns how to label images. By the same token, when we want to do human-level cognitive tasks with a computer, I claim that we’ll invent a learning algorithm that reads books and watches movies and interacts and whatever else, and gradually learns how to do human-level cognitive tasks.

Why not go one level up and invent a learning algorithm that will invent a learning algorithm that will gradually learn how to do human-level cognitive tasks? (Or a learning algorithm that will invent a learning algorithm that will invent a learning algorithm that will…)

Indeed, on what principled grounds can I say “Learning algorithms are a good way to develop image classification algorithms”, but also say “Learning algorithms are a bad way to develop learning algorithms?”

My answer is that, generically, automated design (= design by learning algorithm) is the best way to build algorithms that are (1) not computationally intensive to run (so we can easily run them millions of times), and (2) horrifically complicated (so that human design is intractable).

So image classification algorithms are a perfect fit for automated design (= design by learning algorithm). They’re easy to run—we can run a ConvNet image classifier model thousands of times a second, no problem. And they’re horrifically complicated, because they need logic that captures the horrific object-level complexity of the world, like the shape and coloration of trucks.

Whereas learning algorithms themselves are a terrible fit for automated design. They are famously computationally expensive—people often run learning algorithms for weeks straight on a heavy-duty supercomputer cluster. And they are not horrifically complicated. They fundamentally work by simple, general principles—things like gradient descent, and “if you’ve seen something, it’s likely that you’ll see it again”, and “things are often composed of other things”, and “things tend to be localized in time and space”, etc.

So in all respects, learning algorithms seem to be a natural fit for human design and a bad fit for automated design, while image classifiers are the reverse.

Possible objections to the learning-algorithm-outside-view argument

Objection: Learning a learning algorithm is not unheard of—it’s a thing! Humans do it when they take a course on study strategies. Machines do it in meta-learning ML papers.

Response: For the human example, yes, humans can learn meta-cognitive strategies which in turn impact future learning. But learning algorithms always involve an interaction between the algorithm itself and what-has-been-learned-so-far. Even gradient descent takes a different step depending on the current state of the model-in-training. See the “Inner As AGI” criterion near the top for why this is different from the thing I’m arguing against.

For meta-learning in ML, see the “Outer as lead designer” criterion near the top. I’m not a meta-learning expert, but my understanding is that meta-learning papers are not engaged in the radical project of designing a learning algorithm from scratch—where we just have no idea what the learning algorithm’s operating principles are. Rather, the meta-learning work I’ve seen is in the same category as hyperparameter search and neural architecture search, in that we take a human-designed learning algorithm, in which there are some adjustable parameters, and the meta-learning techniques are about using learning algorithms to adjust those parameters. Maybe there are exceptions, but if so, those efforts have not led to state-of-the-art results, tellingly. (At least, not that I know of.) For example, if you read the AlphaStar paper, you see a rather complicated learning algorithm—it involved supervised learning, pointer networks, TD(λ), V-trace, UPGO, and various other components—but every aspect of that learning algorithm was written by humans, except maybe for the values of some adjustable parameters.
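To make the "adjusting parameters of a human-designed learning algorithm" category concrete, here is a minimal sketch. It is purely illustrative: the learner is plain gradient descent on a made-up quadratic loss, the function names are hypothetical, and the outer loop is a simple grid search rather than any particular meta-learning method.

```python
def human_designed_learner(learning_rate, steps=50):
    """Human-written inner algorithm: gradient descent on f(x) = (x - 3)^2.

    Every operating principle here is legible; only learning_rate is left open.
    """
    x = 0.0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)
        x -= learning_rate * grad
    return (x - 3.0) ** 2  # final loss


def outer_hyperparameter_search(candidates):
    """Outer loop: picks a value for one legible knob and designs nothing else."""
    return min(candidates, key=human_designed_learner)


best_lr = outer_hyperparameter_search([0.001, 0.01, 0.1, 0.5, 1.1])
print("best learning rate:", best_lr)
```

The outer loop here selects among a handful of settings; it could not discover a different learning rule, because the search space only contains values of `learning_rate`.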

Objection: If you can make AGI by combining a legible learning algorithm with a legible reward function, why haven’t AI researchers done so yet? Why did Evolution take billions of years to make a technological civilization?

Response: I think we don’t have AGI today for the same reason we didn’t have GPT-3 in 2015: In 2015, nobody had invented Transformers yet, let alone scaled them up. Some learning algorithms are better than others; I think that Transformers were an advance over previous learning algorithms, and by the same token I expect that yet-to-be-invented learning algorithms will be an advance over Transformers.

Incidentally, I think GPT-3 is great evidence that human-legible learning algorithms are up to the task of directly learning and using a common-sense world-model. I’m not saying that GPT-3 is necessarily directly on the path to AGI; instead I’m saying, How can you look at GPT-3 (a simple learning algorithm with a ridiculously simple objective) and then say, “Nope! AGI is way beyond what human-legible learning algorithms can do! We need a totally different path!”?

As for evolution, an AGI-capable learning algorithm can reach AGI but certainly doesn’t have to; it depends on the reward function and hyperparameters (including model size, i.e. size of the neocortex /​ pallium), and environment. One aspect of the environment is a culture full of ideas, which was a massive chicken-and-egg problem for early humans—early humans had no incentive to share ideas if no one was listening, and early humans had no incentive to absorb ideas if no one was saying them. AGI programmers do not face that problem.

Objection: Reasoning is special. Where does the capacity to reason come from, if not a separate outer-loop learning algorithm?

Response: I don’t think reasoning is special. See System 2 as working-memory augmented System 1 reasoning. I think an RL algorithm can learn to do a chain of reasoning in the same way as it learns to do a sequence of actions.

2. Split into cases based on how the algorithm comes to understand the world

To proceed further, I need to be a bit more specific.

There’s a certain capability, where an algorithm takes unstructured input data (e.g. sensory inputs) and uses it to build and expand a common-sense model of the world, rich with concepts that build on other concepts in a huge and ever-expanding web of knowledge. This capability is part of what we expect and demand from an AGI. We want to be able to ask it a very difficult question, on a topic it hasn’t considered before—maybe a topic nobody has ever thought about before!—and have the AGI develop an understanding of the domain, and the relevant considerations, and create a web of new concepts for thinking about that domain, and so on.

Let’s assume, following the evolution analogy, that there’s an outer algorithm that performs an automated search for an inner algorithm. The two cases are: (A) The inner algorithm (once trained) can do this knowledge-building thing by itself, without any real-time intervention from the outer algorithm; or (B) it can’t, but the inner and outer algorithm working together do have this capability (as in online learning, within-lifetime human learning, etc.—and here there isn’t necessarily an outer-vs-inner distinction in the first place). I’ll subdivide (A) into four subcases, and end up with 5 cases total.

Just as a teaser:

  • If we exactly reproduce the process of evolution of the human brain, with evolution as the outer layer and the human brain as the inner layer, then we’re in Case 2 below.

  • If you believe AGI will be developed along the lines of the “genome = code” analogy I endorsed above, then we’re in Case 1 below.

  • If there’s any scenario where the evolution analogy would work well, I think it would probably be Case 5 below. I’ll argue that Case 5 is unlikely to happen, but I suppose it’s not impossible.

OK, now let’s go through the cases.

Case 1: “Intelligence Via Online Learning”—The inner algorithm cannot build an ever-expanding web of knowledge & understanding by itself, but it can do so in conjunction with the outer algorithm

As mentioned near the top, I’m defining “evolution analogy” to exclude this case, because humans can acquire new understanding without needing to wait many centuries to create new generations of humans that can be further selected by evolution.

But within-lifetime human learning is in this category. We have an outer algorithm (our innate learning algorithm) which does an automated search for an inner algorithm (set of knowledge, ideas, habits of thought, etc.). But the inner algorithm by itself is not sufficient for intelligence; the outer algorithm is actively editing it, every second. After all, in order to solve a problem—or even carry on a conversation!—you’re constantly updating your database of knowledge and ideas in order to keep track of what’s going on. Your inner algorithm by itself would be like an amnesiac! (Admittedly, even splitting things up into outer /​ inner is kinda unhelpful here.)

Anyway, I think this kind of system is a very plausible model for what AGI will look like.

Let’s call this case “intelligence via online learning”. Online learning is when a learning algorithm comes across data sequentially, and learns from each new datapoint, forever, both in training and deployment.

Now, there’s a boring version of “online learning is relevant for AGIs” (see e.g. here), which I’m not talking about. It goes like this: “Of course AGI will probably use online learning. I mean, we have all these nice unsupervised learning techniques—one is predictive learning (a.k.a. “self-supervised learning”), another is TD learning, another is amplification (and related things like chunking, memoization, etc.), and so on. You can keep using these techniques in deployment, and then your AGI will keep getting more capable. So why not do that? You might as well!”

That’s not wrong, but that’s also not what I’m talking about. I’m talking about the case where I ask my AGI a question, it chugs along from time t=0 to t=10 and then gives an answer, and where the online-learning that it did during time 0<t<5 is absolutely critical for the further processing that happens during time 5<t<10.

This is how human learning works, but definitely not how, say, GPT-3 works. It’s easy to forget just how different they are! Consider these two scenarios:

  1. During training, the AGI comes across two contradictory expectations (e.g. “demand curves usually slope down” & “many studies find that minimum wage does not cause unemployment”). The AGI updates its internal models to a more nuanced and sophisticated understanding that can reconcile those two things. Going forward, it can build on that new knowledge.

  2. During deployment, the exact same thing happens, with the exact same result.

In the Intelligence-Via-Online-Learning paradigm (for example, human learning), there’s no distinction; both of these are the same algorithm doing the same thing. Specifically, there is no algorithmic distinction between “figuring things out in the course of learning something” vs “figuring things out to solve a new problem”.

Whereas in the evolution-analogy paradigm, these two cases would be handled by two totally different algorithmic processes: “outer algorithm editing the inner algorithm” during training and “inner algorithm running on its own” during deployment. We have to solve the same problem twice! (And not just any problem … this is kinda the core problem of AGI!) Solving the problem twice seems harder and less likely than solving it once, for reasons I’ll flesh out more in a later section.
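As a toy illustration of the difference, consider an episode in which something learned early on is needed later in the same episode. The sketch below is hypothetical throughout (the `ToyAgent` class, the dict-based "knowledge", and the event format are all made up), and it deliberately ignores within-episode mechanisms such as a context window.

```python
class ToyAgent:
    """Hypothetical agent whose 'knowledge' is a plain dict."""

    def __init__(self, knowledge=None, learn_in_deployment=True):
        self.knowledge = dict(knowledge or {})
        self.learn_in_deployment = learn_in_deployment

    def observe(self, key, value):
        # The learning update. With learn_in_deployment=False the knowledge
        # stays frozen, as if the learning algorithm stopped after training.
        if self.learn_in_deployment:
            self.knowledge[key] = value

    def answer(self, key):
        return self.knowledge.get(key, "unknown")


def run_episode(agent, events):
    """Process a stream of ('observe', key, value) and ('ask', key, None) events."""
    answers = []
    for kind, key, value in events:
        if kind == "observe":
            agent.observe(key, value)
        else:
            answers.append(agent.answer(key))
    return answers


episode = [
    ("observe", "new_concept", "worked out at t=2"),
    ("ask", "new_concept", None),  # asked at t=7, relies on what happened at t=2
]
online = run_episode(ToyAgent(learn_in_deployment=True), episode)
frozen = run_episode(ToyAgent(learn_in_deployment=False), episode)
print(online, frozen)
```

In the online-learning agent the same `observe` update runs in training and in deployment, so there is only one knowledge-building mechanism to get right.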

Cases 2-5: After training, the inner algorithm by itself (i.e. without the outer algorithm’s involvement) can build an ever-expanding web of knowledge & understanding

Cases 2-3: The inner algorithm, by itself, builds an ever-expanding web of knowledge & understanding from scratch

As discussed above, I put the evolution-of-a-human-brain example squarely in this category: I think that all of a human’s “web of knowledge and understanding” is learned within a lifetime, although there are innate biases to look for some types of patterns rather than others (analogous to how a ConvNet will more easily learn localized, spatially-invariant patterns, but it still has to learn them). If it’s not “all” of a human’s knowledge that’s learned within a lifetime, then it’s at least “almost all”—the entire genome is <1GB (only a fraction of which can possibly encode “knowledge”), while there are >100 trillion synapses in the neocortex.
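The size comparison can be checked with back-of-the-envelope arithmetic. The figures below are assumptions for illustration: roughly 3.1 billion base pairs at 2 bits each for the genome, and a deliberately stingy 1 bit per synapse, which is not a claim about how much a real synapse stores.

```python
BASE_PAIRS = 3.1e9      # assumption: approximate length of the human genome
BITS_PER_BASE = 2       # four possible bases, so 2 bits each, uncompressed
SYNAPSES = 1e14         # the lower bound quoted above: >100 trillion
BITS_PER_SYNAPSE = 1    # assumption: deliberately low, for illustration only

genome_bytes = BASE_PAIRS * BITS_PER_BASE / 8
synapse_bytes = SYNAPSES * BITS_PER_SYNAPSE / 8
ratio = synapse_bytes / genome_bytes

print(f"genome:   {genome_bytes / 1e9:.2f} GB")
print(f"synapses: {synapse_bytes / 1e12:.1f} TB")
print(f"ratio:    {ratio:,.0f}x")
```

Even under these conservative assumptions, the synapses have room for over ten thousand times more information than the whole genome could specify.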

Case 2: Outer algorithm starts the inner algorithm from scratch, lets it run all the way to AGI-level performance, then edits the algorithm and restarts it from scratch

Assuming we use the simple, most-evolution-like approach, each episode (= run of the inner algorithm) has to be long enough to build a common-sense world-model from scratch.

How long are those episodes in wall-clock time? I admit, there is no law of physics that says that a machine can’t learn a human-level common-sense world-model, from scratch, within 1 millisecond. But given that it takes many years for a human brain to do so (despite that brain having, maybe, supercomputer-equivalent computing power), and given that the early versions of an AGI algorithm would presumably be just barely working at all, I think it’s a reasonably safe bet that it would take at least weeks or months of wall-clock time per episode, and I would not be at all surprised if it took more than a year.

If that’s right, then developing this AGI algorithm will not look like evolution or gradient descent. It would look like a run-and-debug loop, or a manual hyperparameter search. It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently for the outer algorithm to edit the inner algorithm, one excruciatingly-slow step at a time. I think the programmers would inspect the results of each episode, generate hypotheses for how to improve the algorithm, run small tests, etc.

In fact, with such a slow inner algorithm, there’s really no other choice. On human technological development timescales, the outer algorithm is not going to get many bits of information—probably not enough to design, from scratch, a new learning algorithm that the programmers would never have thought of. (By contrast, for example, the AlphaStar outer algorithm leveraged many gigabytes of information— agent steps—to design the inner algorithm.) Instead, if there is an outer algorithm at all, it would merely be tuning hyperparameters within a highly constrained space of human-designed learning algorithms, which is the best you can do with only dozens of bits of information.
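A rough sketch of why the outer algorithm ends up with so little information in this case. All numbers are hypothetical: the function names are made up, episodes are assumed to run one after another, and each episode is assumed to give the outer algorithm about one bit (a better-or-worse comparison).

```python
def sequential_episodes(development_years, episode_months):
    """How many full episodes fit, back to back, in the development period."""
    return int(development_years * 12 // episode_months)


def selection_bits(episodes, bits_per_episode=1.0):
    """Upper bound on design information, assuming a fixed yield per episode."""
    return episodes * bits_per_episode


for months in (3, 12):
    n = sequential_episodes(development_years=20, episode_months=months)
    print(f"{months}-month episodes over 20 years: {n} episodes, "
          f"about {selection_bits(n):.0f} bits")
```

Running episodes in parallel would raise these numbers, but under the stated assumptions the total stays in the range of dozens of bits, far too little to specify an unfamiliar learning algorithm from scratch.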

Case 3: While the inner algorithm can build up knowledge from scratch, during development we try to preserve the “knowledge” data structure where possible, carrying it over from one version of the inner algorithm to the next

Back to the other possibility. Maybe we won’t restart the inner algorithm from scratch every time we edit it, since it’s so expensive to do so. Instead, maybe once in a while we’ll restart the algorithm from scratch (“re-initialize to random weights” or something analogous), but most of the time, we’ll take whatever data structure holds the AI’s world-knowledge, and preserve it between one version of the inner algorithm and its successor. Doing that is perfectly fine and plausible, but again, the result doesn’t look like evolution; it looks like a hyperparameter search within a highly-constrained class of human-designed algorithms. Why? Because the world-knowledge data structure—a huge part of how an AGI works!—needs to be designed by humans and inserted into the AGI architecture in a modular way, for this approach to be possible at all.

Cases 4-5: The inner algorithm cannot start from scratch; it needs to start with a base of preexisting knowledge & understanding. But then it can expand that knowledge arbitrarily far by itself.

Case 4: The inner algorithm’s starting knowledge base is directly built by humans.

Well, just as in Case 3 above, this case does not look like evolution, it looks like a hyperparameter search within a highly-constrained class of human-designed algorithms, because humans are (by assumption) intelligently designing the types of data structures that will house the AGI’s knowledge, and that immediately and severely constrains how the AGI works.

Case 5: The inner algorithm’s starting knowledge base is built by the outer algorithm.

Just to make sure we’re on the same page here, the scenario we’re talking about right now is that the outer algorithm builds an inner algorithm which has both a bunch of knowledge and understanding about the world and a way to open-endedly expand that knowledge. So when you turn on the inner algorithm, it already has a good common-sense understanding of the world, and then you give the inner algorithm a new textbook, or a new problem to solve, and let the algorithm run for an hour, and at the end it will come out knowing a lot more than it started. For example, GPT-3 has at least part of that—the outer algorithm built an inner algorithm which has a bunch of knowledge and understanding about the world. The inner algorithm can do some amount of figuring things out, although I would say that it cannot open-endedly expand its knowledge without the involvement of the outer algorithm (i.e., fine-tuning on new information), if for no other reason than the finite context window.

As mentioned above, I do not think that there’s any precedent in nature for a Case-5 algorithm—I think that humans and other animals start life with various instincts and capabilities (some very impressive!), but literally zero “knowledge” in the usual sense of that term (i.e. an interlinking web of concepts that relate to each other and build on each other and enable predictions and planning). But of course a Case-5-type inner algorithm is not fundamentally impossible. As an existence proof, consider the algorithm that goes: “Start with this snapshot of an adult brain, and run it forward in time”.

And again, since we’re searching for an evolution analogy (and not just a low-dimensional hyperparameter search), the assumption is that the inner algorithm builds new knowledge using principles that the programmer does not understand.

There are a couple of reasons that I’m skeptical that this will happen.

First, there’s a training problem. Let’s say we give our inner algorithm the task of “read this biology textbook and answer the quiz questions”. There are two ways that the inner algorithm could succeed:

  1. After training, the inner algorithm could start up in a state where it already understands the contents of the textbook.

  2. The inner algorithm could successfully learn the contents of the textbook within the episode.

By assumption, here in Case 5, we want both these things to happen. But they seem to be competing: The more that the inner algorithm knows at startup, the less incentive it has to learn. Well, it’s easy enough to incentivize understanding without incentivizing learning: just make the inner algorithm answer the quiz questions without having access to the textbook. (That’s the GPT-3 approach.) But how do you incentivize learning without incentivizing understanding? Whatever learning task you give the inner algorithm, the task is always made easier by starting with a better understanding of the world, right?

Evolution solved that problem by being in Case 2, not Case 5. As above, the genome encodes little if any of a human’s world-knowledge. So insofar as the human brain has an incentive to wind up understanding the world, it has to learn. You could say that there’s regularization (a.k.a. “Information funnel”) in the human brain algorithm—the genome can’t initialize the brain with terabytes of information. We could, by the same token, use regularization to force the inner algorithm here to learn stuff instead of already knowing it. But again, we’re talking about Case 5, so we need the inner algorithm to turn on already knowing terabytes of information about the world. So what do you do? I have a hard time seeing how it would work, although there could be strategies I’m not thinking of.

Second, there’s a “solving the problem twice” issue. As mentioned above, in Case 5 we need both the outer and the inner algorithm to be able to do open-ended construction of an ever-better understanding of the world—i.e., we need to solve the core problem of AGI twice with two totally different algorithms! (The first is a human-programmed learning algorithm, perhaps SGD, while the second is an incomprehensible-to-humans learning algorithm. The first stores information in weights, while the second stores information in activations, assuming a GPT-like architecture.)

I think the likeliest thing is that programmers would succeed at getting an outer algorithm capable of ever-better understanding of the world, but because of the training issue above, have trouble getting the inner algorithm to do the same—or realize that they don’t need to. Instead they would quickly pivot to the strategy of keeping the outer algorithm involved and in the loop while using the system, and not just while training. This is Case 1 above (“Intelligence Via Online Learning”). So for example, I don’t think GPT-N will lead to an AGI, but if I’m wrong, then I expect to be wrong because it has a path to AGI following Case 1, not Case 5.

Anyway, none of these are definitive arguments that Case 5 won’t happen. And if it does, then the evolution analogy would plausibly be OK after all. So this is probably the weakest link of this section of the blog post, and where I expect the most objections, which by the way I’m very interested to hear.

3. Computational efficiency: the inner algorithm can run efficiently only to the extent that humans (and the compiler toolchain) generally understand what it’s doing

But first: A digression into algorithms and their low-level implementations

Let’s consider two identical computers running two different trained neural net models of the same architecture—for example, one runs a GPT model trained to predict English words, and the other runs a GPT model trained to predict image pixels. Or maybe one runs a Deep Q Network trained to play Pong and the other runs a Deep Q Network trained to play Space Invaders.

Now, look at the low-level operations that these two computers’ processors are executing. (As a concrete example: take the list of some particular chip’s low-level processor instructions; which of those instructions is the computer executing right now?) You’ll see that the two computers are doing more-or-less exactly the same thing all the time. Both computers are using exactly 1347 of their 2048 GPU cores. Oh hey, now both computers are copying a set of 32 bits from SRAM to DRAM. And now both computers are multiplying the bits in register 7 by the bits in register 49, and storing the result in register 6. The bits in those registers are different on the two computers, but the operation is the same. OK, not literally every operation is exactly the same: for example, maybe the neural net has ReLU activation functions, so there’s a “set bits to zero” processor instruction that only occurs about half the time, and often one computer will execute that set-to-zero instruction when the other doesn’t. But it’s awfully close to identical!

By contrast, if you look up close at one computer calculating a Fast Fourier Transform (FFT), and compare it to a second computer doing a Quicksort, their low-level processing will look totally different. One computer might be doing a 2’s complement while the other is fetching data from memory. One computer might be parallelizing operations across 4 CPU cores while the other is running in a single thread. Heck, one computer might be running an algorithm on its GPU while the other is using its CPU!

So the upshot of the above is: When running inference with two differently-trained neural net models of the same architecture, the low-level processing steps are essentially the same, whereas when running FFT vs quicksort, the low-level processing steps are totally different.
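To make the first half of that claim concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than a real framework: a toy two-layer ReLU network (`forward`), two random weight sets (`random_model`), and a hand-rolled log of "low-level operations" (`trace`). It only shows that the sequence of multiply-adds is fixed by the architecture, while the ReLU zeroing is the one step that depends on the weights.

```python
import random

def forward(weights, x, trace):
    """Run a tiny ReLU MLP, appending the name of each low-level op to `trace`."""
    for layer in weights:
        out = []
        for row in layer:
            acc = 0.0
            for w, xi in zip(row, x):
                acc += w * xi
                trace.append("mul_add")      # same op, whatever the weights are
            if acc < 0.0:                    # ReLU: the only data-dependent step
                acc = 0.0
                trace.append("set_zero")
            out.append(acc)
        x = out
    return x

def random_model(seed, sizes=(4, 8, 3)):
    """Hypothetical 'trained model': random weights for a fixed architecture."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def op_trace(seed):
    trace = []
    forward(random_model(seed), [0.5, -0.2, 0.1, 0.9], trace)
    return trace

def strip(trace):
    """Drop the data-dependent ReLU zeroing, keeping the fixed skeleton."""
    return [op for op in trace if op != "set_zero"]

trace_a = op_trace(seed=1)
trace_b = op_trace(seed=2)
same_skeleton = strip(trace_a) == strip(trace_b)
```

The two models compute different functions, but once the `set_zero` entries are ignored their operation logs are identical, because the loop structure comes from the shared source code and not from the weights.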

Why is that? And why does it matter?

The difference is not about the algorithms themselves. I don’t think there’s any sense in which two different GPT trained models are fundamentally “less different” from each other than the quicksort algorithm is from the FFT algorithm. It’s about how we humans built the algorithms. The FFT and quicksort started life as two different repositories of source code, which the compiler then parsed and transformed into two different execution strategies. Whereas the two different GPT trained models started life as one repository of source code for “a generic GPT trained model”, which the compiler then parsed into a generic execution strategy—a strategy that works equally well for every possible GPT trained model, no matter what the weights are.

To see more clearly that this is not about the algorithms themselves, let’s do a swap!

Part 1 of the swap: Is it possible to have one computer calculating an FFT while another does quicksort, yet the processors are doing essentially the same low-level processing steps in the same order? The answer is yes—but when we do this, the algorithms will run much slower than before, probably by many orders of magnitude! Here’s an easy strategy: we write both the FFT and the quicksort algorithms in the form of two different inputs to the same Universal Turing Machine, and have both our computers simulate the operation of that Turing machine, step by step along the simulated memory tape. Now each computer is running exactly the same assembly code, and executing essentially the same processor instructions, yet at the end of the day, one is doing an FFT and the other is doing a quicksort.
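Here is a minimal sketch of that idea, with loudly simplified stand-ins: a toy register-machine interpreter (`run`) plays the role of the Universal Turing Machine, and two tiny programs (summing a list and finding its maximum) play the roles of FFT and quicksort. The instruction names and the five-register layout are invented for this illustration.

```python
def run(program, memory):
    """Interpret a toy program. The host executes this same loop for any program."""
    regs = [0] * 5
    pc = 0
    steps = 0
    while True:
        op, a, b, c = program[pc]
        pc += 1
        steps += 1
        if op == "set":        # regs[a] = constant b
            regs[a] = b
        elif op == "mov":      # regs[a] = regs[b]
            regs[a] = regs[b]
        elif op == "load":     # regs[a] = memory[regs[b]]
            regs[a] = memory[regs[b]]
        elif op == "add":      # regs[a] += regs[b]
            regs[a] += regs[b]
        elif op == "jge":      # jump to c if regs[a] >= regs[b]
            if regs[a] >= regs[b]:
                pc = c
        elif op == "jmp":      # unconditional jump to c
            pc = c
        elif op == "halt":     # return regs[a] and the interpreted step count
            return regs[a], steps
        else:
            raise ValueError(f"unknown op: {op}")

def sum_program(n):
    # r0 = accumulator, r1 = index, r2 = n, r3 = scratch, r4 = constant 1
    return [("set", 0, 0, None), ("set", 1, 0, None), ("set", 2, n, None),
            ("set", 4, 1, None), ("jge", 1, 2, 9), ("load", 3, 1, None),
            ("add", 0, 3, None), ("add", 1, 4, None), ("jmp", None, None, 4),
            ("halt", 0, None, None)]

def max_program(n):
    # Same register layout; r0 holds the running maximum. Assumes n >= 1.
    return [("set", 1, 0, None), ("load", 0, 1, None), ("set", 2, n, None),
            ("set", 4, 1, None), ("add", 1, 4, None), ("jge", 1, 2, 11),
            ("load", 3, 1, None), ("jge", 0, 3, 9), ("mov", 0, 3, None),
            ("add", 1, 4, None), ("jmp", None, None, 5), ("halt", 0, None, None)]

data = [3, 9, 2, 7]
total, sum_steps = run(sum_program(len(data)), data)
largest, max_steps = run(max_program(len(data)), data)
```

Both calls execute the same host-level dispatch loop, yet one computes a sum and the other a maximum. Each interpreted step costs a tuple unpack and a chain of string comparisons, which is a small-scale version of the slowdown described above.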

Part 2 of the swap: Conversely, is it possible to have each of two computers run a different trained model of the same neural net architecture, yet the two computers are doing wildly different low-level processing? …And in the process they wind up running their algorithms many orders of magnitude faster than the default implementation? This is the exact reverse of the above. And again the answer is yes! Let’s imagine a “superintelligent compiler” that can examine any algorithm, no matter how weirdly obfuscated or approximated, and deeply understand what it’s doing, and then rewrite it in a sensible, efficient way, with appropriate system calls, parallelization, data structures, etc. A “superintelligent compiler” could look at the 3 trillion weights of a giant RNN, and recognize that this particular trained model is in fact approximating a random access memory algorithm in an incredibly convoluted way … and then the superintelligent compiler rewrites that algorithm to just run on a CPU and use that chip’s actual RAM directly, and then it runs a billion times faster, and more accurately too!

So in summary: the reason that differently-trained neural nets use essentially the same low-level processing steps is not necessarily because the same low-level processing steps are the best and most sensible way to implement those algorithms, but rather it’s because we don’t have a “superintelligent compiler” that can look at the trillion weights of a giant trained RNN and then radically refactor the algorithm to use more appropriate processor instructions and parallelization strategies, and to move parts of the algorithm from GPU to CPU where appropriate, etc. etc. And I don’t expect this to change in the future—at least not before we have AGI.

Back to the main argument

The moral of the previous subsection is that if you search over a Turing-complete space of algorithms—for example large RNNs—you can find any possible algorithm, but you will not find most of those algorithms implemented in a compute-efficient way.

For example, vanilla RNNs mostly involve multiplying matrices.

If your inner algorithm needs a RAM, and the programmers didn’t know that, well maybe the outer algorithm will jury-rig an implementation of RAM that mostly involves multiplying matrices. But that implementation will be a whole lot less computationally efficient than just using the actual RAM built into your chip.
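As a rough illustration of the gap, here is a sketch of one way a memory read can be expressed as matrix multiplication: a one-hot address vector times a memory matrix. The function names are hypothetical, and a real trained network would implement something far messier and only approximately correct. This is just the cleanest possible version of the trick.

```python
def one_hot(index, size):
    """Address encoded the way a matrix-multiplying network would see it."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matmul_read(memory, address):
    """Read one row via (one-hot vector) x (memory matrix).

    Touches every cell: N * D multiply-adds for N rows of width D.
    """
    n, d = len(memory), len(memory[0])
    select = one_hot(address, n)
    return [sum(select[i] * memory[i][j] for i in range(n)) for j in range(d)]

def direct_read(memory, address):
    """Read one row by indexing: D copies, independent of N."""
    return list(memory[address])

memory = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
via_matmul = matmul_read(memory, 1)
via_index = direct_read(memory, 1)
```

Both reads return the same row, but the matrix version does work proportional to the size of the whole memory on every access, while the indexed version does work proportional to the size of one row.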

And if your inner algorithm needs to sort a list, and the programmers didn’t know that, well maybe the outer algorithm will jury-rig an implementation of a list-sorting algorithm that mostly involves multiplying matrices. But that implementation will be a whole lot less computationally efficient than the usual approach, where a list-sorting algorithm is written in normal code, and then humans and compilers can work together to create a sensible low-level implementation strategy that takes advantage of the fact that your chip has blazing-fast low-level capabilities to compare binary numbers and copy bit-strings and so on.
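For a sense of what such an implementation might look like, here is a hypothetical sketch of sorting built from neural-net-style pieces: pairwise differences passed through a step nonlinearity to get each element's rank, then a permutation matrix multiplied by the input. It assumes all values are distinct, and it is an invented illustration, not a claim about what any trained network actually learns.

```python
def step(z):
    """Hard threshold, standing in for a saturating activation function."""
    return 1.0 if z > 0 else 0.0

def matmul_sort(xs):
    """Sort distinct numbers using O(n^2) comparisons and a matrix-vector product."""
    n = len(xs)
    # ranks[i] = how many elements are strictly smaller than xs[i]
    ranks = [sum(step(xs[i] - xs[j]) for j in range(n)) for i in range(n)]
    # Permutation matrix: row r has a 1 in the column of the rank-r element
    perm = [[1.0 if ranks[i] == r else 0.0 for i in range(n)] for r in range(n)]
    # Sorted output = perm x xs
    return [sum(perm[r][i] * xs[i] for i in range(n)) for r in range(n)]

values = [3.0, 1.0, 2.0]
sorted_by_matmul = matmul_sort(values)
sorted_natively = sorted(values)
```

The results agree, but the matrix version does on the order of n² operations and breaks on ties, while `sorted` runs in O(n log n) using the chip's native compare and copy instructions.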

Still other times, the inner algorithm really does just need to multiply matrices! Or it needs to do something that can be efficiently implemented in a way that mostly involves multiplying matrices. And then that’s great! That part of the algorithm will run very efficiently! For example, did you know that the update rule for a certain type of Hopfield network happens to be equivalent to the attentional mechanism of a Transformer layer? So if your outer algorithm is looking for an algorithm that involves updating a Hopfield network, and you’re using a Transformer architecture for the inner algorithm, then good news for you, the inner algorithm is going to wind up with a very computationally-efficient implementation!
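For readers who want to see the shape of that equivalence, here is a sketch in LaTeX. I am assuming the continuous ("modern") Hopfield formulation, with the stored patterns as the columns of a matrix X, a state vector ξ, and an inverse temperature β. The symbol names are my own choice for this illustration.

```latex
% Hopfield update: move the state \xi toward a softmax-weighted
% mixture of the stored patterns (columns of X).
\xi^{\text{new}} = X \,\operatorname{softmax}\!\left(\beta\, X^{\top} \xi\right)

% Transformer attention over queries Q, keys K, values V,
% with key dimension d_k.
\operatorname{Attention}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Setting β = 1/√d_k, treating the state as the query, and using the stored patterns as both keys and values (up to the learned linear projections), one step of the Hopfield update has the same form as one attention read-out.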

OK. So let’s say there are two projects trying to make AGI:

  • One project is motivated by the evolution analogy. They buy tons of compute to do a giant automated search for inner learning algorithms (which then run by themselves).

  • The other project is also searching for an inner learning algorithm, but using human design, i.e. trying to figure out what data structures and operations and learning rules are most suitable for AGI.

…Then my claim in this section is that the second team would have an advantage: if they succeed in finding that inner algorithm, their version will run faster than the first team’s, possibly by orders of magnitude. This is a run-time speed advantage, i.e. it comes on top of the additional advantage of not needing tons of compute to find the inner algorithm in the first place.

(You can still argue that the first team will win despite this handicap because humans are just not smart enough to design a learning algorithm that will learn itself all the way to AGI, so the second team is doomed. That’s not what I think, as discussed above, but that’s a different topic. Anyway, hopefully we can agree that this is at least one consideration in favor of the second team.)

I think this argument will carry more weight for you if you think that an AGI-capable learning algorithm needs several modular subsystems that do different types of calculations. That’s me; I’m firmly in that camp! For example, I mentioned AlphaStar above: it has LSTMs, self-attention, scatter connections, pointer networks, supervised learning, TD(λ), V-trace, UPGO, interface code connecting to the StarCraft executable, and so on. What are the odds that a single one-size-fits-all low-level processing strategy can do all those different types of calculations efficiently? I think that some of the necessary components would turn out to be a terrible fit, and would wind up bottlenecking the whole system.

I think of the brain like that too—oversimplifying a bit, there’s probabilistic program inference & self-supervised learning (involving neocortex & thalamus), reinforcement learning (basal ganglia), replay learning (hippocampus), supervised learning (amygdala), hardcoded input classifiers (tectum), memoization (cerebellum), and so on—and each is implemented by arranging different types of neurons into different types of low-level circuits. I think each of these modules is there for sensible and important design reasons, and therefore I expect that most or all of these modules will be part of a future AGI. Programmers have proven themselves quite capable of building learning algorithms with all those components; and if they do so, it would wind up with an efficient low-level execution strategy. Maybe an automated search could discover a monolithic black box containing all those different types of calculations, but if it did, again, it would be very unlikely to be able to run them efficiently, within the constraints of its predetermined, one-size-fits-all, low-level processing strategy.

Thanks to Richard Ngo & Daniel Kokotajlo for critical comments on a draft.