Examining Evolution as an Upper Bound for AGI Timelines

Cross-posted from https://mybrainsthoughts.com/?p=349

With the massive progress in AI over the last decade or so, it’s natural to wonder about its future, and particularly about the timeline to achieving human (and superhuman) levels of general intelligence. Ajeya Cotra, a senior researcher at Open Philanthropy, put together a comprehensive report in 2020 seeking to answer this question (strictly, it answers the slightly different question of when transformative AI will arrive, mainly because transformative impact is easier to define exactly than a level of intelligence), and over 169 pages she lays out a multi-step methodology to arrive at her answer. The report has generated a significant amount of discussion (for example, see this Astral Codex Ten review) and seems to have become an important anchor for many people’s views on AI timelines. On the whole, I found the report added useful structure to the AI timeline question, though I’m not sure its conclusions are particularly informative, given the wide range of timelines produced by the different methodologies. This post provides a general overview of her approach (readers who are already familiar with it can skip the next section), then focuses on one part of the overall methodology, specifically the upper bound she chooses, and seeks to show that this bound may be vastly understated.

Part 1: Overview of the Report

In her report, Ajeya takes the following steps to estimate transformative AI timelines:

  1. Determine the total amount of computation required to train a transformative AI model, based on the architectures and algorithms available in 2020.

  2. Determine the rate at which the amount of computation available will change (due to reduced cost of computation and greater availability of capital) and the rate at which the computational requirements will be reduced (due to architectural/algorithmic progress).

  3. Apply these rates, starting from 2020, to determine at what point sufficient computation will be available to train a transformative AI model.
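
Put schematically, the whole methodology reduces to finding the first year in which the compute that can be bought overtakes the compute that is required, with both sides drifting over time. The sketch below illustrates that loop in Python; every rate and starting value shown is an illustrative placeholder of my own, not one of the report’s actual parameter estimates, and it ignores the floors and caps Ajeya applies to each factor.

```python
# Minimal sketch of the three steps above: project required compute (falling as
# algorithms improve) and affordable compute (rising as hardware gets cheaper and
# budgets grow), then find the first year the curves cross. All values below are
# illustrative placeholders, not the report's estimates.

def first_affordable_year(required_2020_flop,
                          algo_halving_years=2.5,     # step 2: algorithmic progress
                          cost_halving_years=2.5,     # step 2: falling cost per FLOP
                          budget_2020_usd=1e6,        # assumed largest 2020 training budget
                          budget_doubling_years=2.0,  # step 2: growing capital
                          flop_per_usd_2020=1e17,     # assumed 2020 price-performance
                          horizon=100):
    """Step 3: apply the rates year by year, starting from 2020."""
    for years_out in range(horizon + 1):
        required = required_2020_flop * 0.5 ** (years_out / algo_halving_years)
        affordable = (budget_2020_usd * 2 ** (years_out / budget_doubling_years)
                      * flop_per_usd_2020 * 2 ** (years_out / cost_halving_years))
        if affordable >= required:
            return 2020 + years_out
    return None

# Example: a requirement in the report's Neural Network range (step 1)
print(first_affordable_year(required_2020_flop=1e34))
```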

1. Determine Total Amount of Computation Required for Transformative AI

To get an estimate, Ajeya leans heavily on the only working example of transformative intelligence we have—the human brain. She lays out four different approaches in an attempt to bound the computation required:

  • Human Lifetime: Total amount of computation done by a human brain over a lifetime (lower bound)

  • Neural Network: Total amount of computation required to train a model with the computational power (i.e., computations per second) of the human brain (Ajeya spends the most time on this approach and includes three variations)

  • Human Genome: Total amount of computation required to train a model with the computational power of the human brain and as many parameters as there are bytes in the human genome (i.e., ~7.5×10^8)

  • Evolution: Total amount of computation done by evolution (upper bound, and the focus of this post)

The Human Lifetime and Evolution approaches are the most direct, as each arrives at a single number representing the total estimated computation required for transformative AI (the process of arriving at the Evolution number will be the focus of this post). In the terms laid out in this post, the Human Lifetime approach assumes that most of the “work” of intelligence is done by the update algorithm, while the Evolution approach assumes most of the “work” lies in arriving at the initial structure/update algorithm. The Neural Network and Human Genome approaches, on the other hand, estimate indirect quantities such as the model’s computational power (i.e., computations per second) and number of parameters (i.e., the size of the model), and then leverage observed relationships between these quantities to reach a total computation figure. The chart below provides additional detail on the estimated and calculated values for each approach.

For each of these approaches, Ajeya also layers in a scaling factor based on the greater efficiency generally observed in biological systems vs. man-made ones. The exact scaling factors are not important for the purposes of this post (they vary between ~10 and 1,000), but the chart below (from Ajeya’s report, attributed to Danny Hernandez on the basis of Paul Christiano’s research) provides a summary of the findings (which are interesting in their own right):

In the end, the four approaches offered greatly differing views of transformative AI computational requirements (all references to numbers of computations refer to floating point operations):

  • Human Lifetime: Median of 10^27 computations

  • Neural Network: Between 10^32 and 10^37 computations

  • Human Genome: Median of 10^33 computations

  • Evolution: Median of 10^41 computations

Apart from the Human Lifetime estimate, all of these numbers are massively greater than anything possible right now. The most complex AI models of today require ~10^24 computations to train (e.g., Google’s PaLM), and the world’s fastest supercomputers can perform ~10^23 computations per day.
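
To get a feel for these gaps, the snippet below puts the anchor medians quoted above next to the present-day figures from the same paragraph (~10^24 computations for the largest training runs and ~10^23 computations per day for a top supercomputer); the arrangement is mine, but the numbers are the ones quoted in this section.

```python
# Compare each anchor's estimate to present-day compute, using the figures above.
anchors = {
    "Human Lifetime": 1e27,
    "Neural Network (low end)": 1e32,
    "Neural Network (high end)": 1e37,
    "Human Genome": 1e33,
    "Evolution": 1e41,
}
LARGEST_TRAINING_RUN = 1e24   # ~Google's PaLM, as cited above
SUPERCOMPUTER_PER_DAY = 1e23  # fastest supercomputers, as cited above

for name, flop in anchors.items():
    multiple = flop / LARGEST_TRAINING_RUN
    years = flop / SUPERCOMPUTER_PER_DAY / 365
    print(f"{name:<26} {flop:.0e} FLOP = {multiple:.0e} x largest runs, "
          f"~{years:.0e} supercomputer-years")
```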

2. Determine Rate of Change of Computational Availability

As mentioned above, the following factors are examined:

  • Architectural/Algorithmic Progress: Computational requirements will decrease as the architectures and algorithms used improve (the computational requirements specified above are all based on the architectures and algorithms available in 2020). Estimated to reduce computation required by 50% every 2-3 years based on observed recent progress; the minimum possible level varies by approach (e.g., 10% of 2020 requirements for the Human Lifetime approach, 0.001% of 2020 requirements for the Evolution approach).

  • Reduced Cost of Computation: Estimated to fall by 50% every 2.5 years (about in line with current trends), down to a minimum level of 1/10^6 of 2020 costs (i.e., 0.0001%) in 50 years.

  • Increased Availability of Capital for AI: Estimated to reach $1B in 2025, then double every 2 years after that, up to a cap of 1% of US GDP (which would currently be about $200B of available capital, growing ~3% per year).

The impact of each of these factors, as estimated in the report, is significant; taken together, they mean that even the massive amounts of computation required by the more computationally expensive benchmarks become available relatively quickly. For example, under the Evolution approach, 30 years of progress allows for a factor of ~10^13 more computation than the largest projects of 2020 (~10^3.5 from architectural/algorithmic progress, ~10^3.5 from the reduced cost of computation, and ~10^6 from the increased availability of capital, primarily due to the jump to $1B in 2025, vs. 2020’s $1M).
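
As a rough sanity check on that ~10^13 figure, the snippet below multiplies out the three factors over 30 years using the rates described above; the exact value of the capital cap (taken here as ~1% of projected US GDP around 2050) is my own approximation.

```python
import math

# 30 years of the three factors, using the rough rates described above.
years = 30
algo = 2 ** (years / 2.5)       # requirements halve every ~2-3 years
hardware = 2 ** (years / 2.5)   # cost of computation halves every ~2.5 years
capital = (1e9 / 1e6) * 2 ** ((years - 5) / 2)  # $1M (2020) -> $1B (2025), then 2x every 2 yrs
capital = min(capital, 500e9 / 1e6)             # capped near 1% of projected US GDP (assumed)

total = algo * hardware * capital
print(f"algo ~10^{math.log10(algo):.1f}, hardware ~10^{math.log10(hardware):.1f}, "
      f"capital ~10^{math.log10(capital):.1f}, total ~10^{math.log10(total):.1f}")
```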

3. Apply the Rates

Ajeya pulls together all the elements of the report by applying a weighting to each of the approaches (to form a single probability distribution), then looks at the expected likelihood of achieving transformative AI over time (note that the Neural Network approach was divided into three scenarios, each with a different assumption about the time required per training iteration).

This framing suggests a 50% probability of achieving transformative AI by ~2052 (or, more specifically, of the computation required to do so being affordable), with all approaches except Evolution suggesting a 50% probability by 2060 at the latest (and a 70% probability by 2100). The Evolution approach, on the other hand, doesn’t reach a 50% probability until ~2090 (and remains at ~50% in 2100 and beyond). The leveling off seen on the right side of the chart is largely an artifact of the minimum/maximum assumptions for algorithmic/architectural progress and reduced cost of computation (the former bottoming out in ~45 years and the latter in ~50), with gains after that point driven solely by GDP increases (since the available capital is pegged to 1% of GDP).
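
To see why the curves flatten, consider what growth remains once both floors are hit: only the capital pool, pegged to 1% of GDP and growing at the ~3% per year noted above. The arithmetic below (my own back-of-the-envelope, not the report’s) shows how slow that remaining growth is.

```python
import math

# Once algorithmic progress and hardware cost declines hit their assumed floors
# (~45 and ~50 years out), affordable compute grows only with GDP (~3% per year).
gdp_growth = 0.03
doubling_time_years = math.log(2) / math.log(1 + gdp_growth)
print(f"Post-floor doubling time of affordable compute: ~{doubling_time_years:.0f} years")
# ~23 years per doubling, far slower than while all three factors are active
```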

Part 2: Review of the Evolution Approach

While much can be said about each of the benchmarks used, this post will focus specifically on the Evolution approach, as it forms the upper bound of the methodology (which, in Ajeya’s framing, suggests a maximally pessimistic view of 2090 as the 50/50 mark). The Evolution approach received only brief treatment in the report (most of the effort was spent on the Neural Network approach) and was structured in a relatively simplistic way, with the amount of computation calculated as: “length of time since neurons evolved” * “amount of computation occurring per unit of time”.

The length of time since neurons evolved can be estimated with a fair degree of accuracy; Ajeya puts it at ~6 × 10^8 years (~10^16 seconds), starting from the point at which the Kingdom Animalia diverged from the Eukaryotes. The amount of computation occurring at each moment, on the other hand, is harder to estimate. Ajeya leans on this post, which estimates the number of organisms of various types existing at a given point in time; based on its findings, she assumes that, for most of evolutionary history, the “average” organism was very small (akin to a small worm like C. elegans), with 10^21 organisms each performing 10^4 calculations per second, for a total of 10^25 computations per second.
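
Multiplying these two inputs together reproduces the report’s ~10^41 median for the Evolution anchor; a quick check:

```python
import math

# The Evolution anchor from the two inputs described above.
seconds_of_evolution = 6e8 * 3.15e7     # ~6 x 10^8 years since neurons evolved (~10^16 s)
organisms = 1e21                        # "average" population of small, C. elegans-like organisms
flop_per_organism_per_second = 1e4      # nervous-system compute of a tiny worm

total = seconds_of_evolution * organisms * flop_per_organism_per_second
print(f"~10^{math.log10(total):.1f} computations")   # ~10^41, matching the report's median
```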

For the purposes of this discussion, let’s assume that the above methodology is a sensible way of calculating the amount of computation performed by neurons over evolutionary history. The question then becomes whether that is the right target metric if our goal is to estimate the amount of computation done by evolution. To answer that, it helps to first look at why this framing (focusing only on neural computation) fits so well in other cases, such as the Human Lifetime estimation approach.

The essential idea of the Human Lifetime approach is that we can look at the newborn human brain as analogous to a new (randomly initialized) AI model requiring training. This analogy requires assuming that very little intelligence is “baked in” from the start (a very significant assumption), but if we grant that, the analogy holds quite well: the brain can be viewed as “running” some type of algorithm which allows it to update based on the inputs received and gradually make sense of them (and the world they relate to). There’s a type of gradient descent going on over the course of our lives where, with more and more input data, we (more specifically, our brains) develop better and better representations of the world. Put differently, each computation done by the brain on these inputs moves the system, on average, toward a greater level of intelligence, much as each training step of an AI model moves it closer to the same. The “evolving unit” in this framework is the brain/model state, with each iteration of computation resulting in a “better” state, on average. Neural computations hold a “privileged position” over all other dynamics (e.g., those going on in the heart, the digestive system, and the outside world) because they are the ones moving the system in the direction of greater intelligence: a newborn’s brain is like an untrained model, and each synaptic event performed by the brain as it matures is like a training computation. Based on this framing, it is logical to take the estimated amount of computation done by the brain over the course of a life as our benchmark for an AI model (with adjustments for the relative efficiency of biological vs. human-constructed artifacts), as long as we assume that the requisite training data will be available (as Ajeya does).
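
As a concrete illustration of this framing, the sketch below works through the Human Lifetime arithmetic. The specific inputs (roughly 10^15 computations per second for the brain and about a billion seconds from birth to adulthood) are placeholder figures of my own in the ballpark of those the report considers, combined with the upper end of the ~10-1,000x biological-efficiency scaling mentioned earlier; together they land near the 10^27 median quoted above.

```python
import math

# Human Lifetime framing: treat every second of neural computation from birth to
# adulthood as a "training" computation. Both inputs are rough placeholders.
brain_flop_per_second = 1e15     # a commonly cited rough estimate of brain compute
seconds_to_adulthood = 1e9       # ~32 years of experience

lifetime_compute = brain_flop_per_second * seconds_to_adulthood  # ~10^24
with_scaling_factor = lifetime_compute * 1e3  # upper end of the ~10-1,000x adjustment above
print(f"~10^{math.log10(with_scaling_factor):.0f} computations")  # ~10^27, the quoted median
```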

Now let us return to the Evolution approach. Here, we no longer assume that very little intelligence is “baked in” to the newborn brain. In fact, the crux of this approach is the idea that the vast majority of the computation required to construct intelligence has already been performed in arriving at the newborn brain; from there, only a minute amount of “refinement” is required (a lifetime of neural computations). From this perspective, the “evolving unit” is the initial structure/update algorithm of the brain (or really, the DNA sequence which defines it), rather than the brain state itself. An individual organism acquiring intelligence over the course of its life does not directly drive improvement in the brain structures biology is capable of; rather, only through the organism’s participation in the broader optimization process of natural selection does a gradient for structure appear. The neural computations of an individual organism hold no “privileged position” in the evolutionary process, and are relevant only to the degree that they impact that organism’s fitness; in that respect, there is no difference between these computations and those of the heart, the digestive system, or even the outside world.

Returning to the original question: if our goal is to measure the amount of computation done by evolution, what should our target metric be? Looking solely at neural computation implies that this is the only relevant computation done by evolution. However, as we have just seen, the gradient of the evolutionary process appears only at a higher level, one which includes all organisms in their entirety and the environment they reside within. The amount of computation done by evolution includes all of these parts.

Defining computation at this level is thornier than when looking only at the brain, but the molecular level seems a fair starting place, as DNA is the “evolving unit”. This means that, to get a true upper bound, we need to determine the computation required to simulate the process of evolution at the molecular level. Note that it could be argued that simulation at either a higher (e.g., cellular) or lower (e.g., quantum) level is actually what is required to capture sufficient complexity for evolution to produce systems with human levels of intelligence; that question deserves deeper analysis, but for the purposes of this post we will assume the molecular level is both required and sufficient.

Let’s say for the sake of argument that the computation of each atom is equivalent to ~1 calculation per second (this seems a significant underestimate, as each atom’s behavior will depend on that of all the atoms around it, and the interactions are continuous rather than once per second, but exactness here is not of great importance). If we then take the total number of in-scope atoms to be ~10^40 (assuming 10^28 per person * 10^10 people * 10^6 to account for other life and the environment), and the same 10^16 seconds of evolutionary time, we reach a massive 10^70 evolutionary computations. For context, this is ~10^30 times larger than the estimate found in the report, and implies the essential impossibility of achieving transformative AI by evolutionary means within the current computational paradigm. This does not mean that transformative AI itself is impossible; all it means is that the potential upper bound on the computational work required is far larger than represented in the report. There’s little reason to believe we’ll actually run into this bound, particularly because we already have examples of systems demonstrating human-level general intelligence (i.e., human brains) to build off of.

My main takeaway from this analysis is that we should have more uncertainty about when transformative AI will be achieved. If the current AI paradigm of transformer models with more and more parameters can scale up to transformative AI, then we may indeed see it in the next 15-50 years. However, if this type of approach is not sufficient, if there is something different about the architecture of the human brain and about the architectures in general that can give rise to general intelligence (whether, for example, they are more heterogeneous, more parallel, or require different types of inputs), then we may not currently be able to say much about timelines, other than that transformative AI will take a long time.