My Thoughts on Takeoff Speeds

Epistemic Status: Spent a while thinking about one subset of the arguments in this debate. My thoughts here might be based on a misunderstanding of the details of the arguments; if so, I apologize.

There is a debate going on within the AI risk community about whether we will see “gradual”, “slow”, “fast”, or “discontinuous” progress in AGI development. I put these terms in quotes because they can mean entirely different things depending on who uses them.

Because progress in AI is very difficult to quantify, these terms are largely forced to be qualitative. For example, “discontinuous” appears to mean that we might observe a huge leap in the performance of an AI system, possibly due to a single major insight that allows the system to gain a strategic advantage within one domain and dominate it. The term is important because it implies that we might not be able to foresee the consequences of an increase in the capability of an AI system before the system is changed. If the rate of progress is “slow” enough, this might give us enough time to prepare and reduce the risk posed by the next increase in capability.

For that reason, we might wonder what type of progress we should expect, and if there is any evidence that points to “slow”, “fast”, or “discontinuous” progress at any given point in AI development.

The evidence usually cited in favor of the discontinuous narrative is that evolution produced humans from ape-like ancestors who were no more capable than our chimpanzee relatives, yet in a relatively short time (geologically speaking) humans came to develop language, art, culture, and science, and eventually to dominate the world.

An interesting development in this conversation occurred recently with a contribution by Paul Christiano, who argued in his post Takeoff Speeds that he did not find our observations about evolution to be very convincing arguments in favor of the discontinuous narrative.

To summarize his argument, we first note that evolution does not in general optimize for “intelligence.” Evolution optimizes for fitness, which may happen to result in increases in intelligence if that improvement happens to be beneficial in a given environment. Evolution does not pick out improvements that will be beneficial in the future, only ones that are beneficial in the current environment. These improvements may vary all over different domains, where improvements to one domain might seem very rapid at one point in time and slow or even backwards at other points in time.

In nature, fitness occasionally acts as a proxy measure for intelligence, and other times it does not. So during the times where it is not a proxy measure for intelligence, changes in intelligence could be zero, positive, or negative, but once it suddenly changes to being a proxy, we might see a rapid increase in intelligence.

For that reason, we may have observed discontinuous change in the rate of intelligence increase in humans because it was not being optimized for. By contrast, if it were being directly optimized for (as we expect will be the case within AGI development), we would be much more likely to see continuous rather than discontinuous progress.

I will argue that the above argument doesn’t really work, and that while evolution doesn’t tell us that much about whether or not to expect discontinuous or very rapid change in AI development, it is still at least weakly in favor of it.

I think the main issue is that we need to more strictly define how we measure “progress” in something. Within AI, we might measure “progress” as skill at a certain activity like playing a game, or accomplishing a task in a narrow domain which is much more easily measurable. Paul sometimes uses the rate of change in GDP as a proxy measure for how powerful our AI systems would be at some point in the future when our economy becomes dominated by AI capability. Sometimes we might switch from a definition like this to a definition like “how easily an AI can strategically dominate the entire world” which is sort of a more relevant definition when talking about AI risk.

Within Paul’s argument is a more subtle argument about optimizing different loss functions more generally. Applied to any pair of loss functions, not just fitness and intelligence, it seems to imply the following: if you optimize one loss function, you will see continuous change in that loss over time, while you might observe discontinuous change in a second loss function over the same time frame, with both measured on the system being optimized. If you were instead to optimize the second loss function directly, you would not see discontinuous change in it.

This is certainly possible, and it is fairly easy to come up with specific situations where it would be true. But is this situation actually likely? This is the primary question. It seems to depend entirely on how we expect loss landscapes to actually look in practice, and on the details of our optimization technique that determine how we travel through the loss landscape.
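One such specific situation can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than anything from Paul's post: `loss_a` is a smooth quadratic that we optimize directly, and `loss_b` is a hypothetical second metric with a steep sigmoid "cliff" that the same parameter happens to cross along the way. The sketch only shows that the situation can be constructed; it says nothing about how likely it is.

```python
import math


def loss_a(x):
    """Smooth loss that is optimized directly (hypothetical)."""
    return (x - 3.0) ** 2


def grad_a(x):
    """Analytic derivative of loss_a."""
    return 2.0 * (x - 3.0)


def loss_b(x):
    """Second metric, only observed: a steep sigmoid cliff near x = 1.5."""
    return 1.0 / (1.0 + math.exp(50.0 * (x - 1.5)))


def run(steps=200, lr=0.02, x0=0.0):
    """Gradient descent on loss_a, recording both losses at every step."""
    x = x0
    hist_a, hist_b = [loss_a(x)], [loss_b(x)]
    for _ in range(steps):
        x -= lr * grad_a(x)
        hist_a.append(loss_a(x))
        hist_b.append(loss_b(x))
    return hist_a, hist_b


def largest_step_fraction(history):
    """Largest single-step change as a fraction of the total change."""
    total = abs(history[0] - history[-1])
    biggest = max(abs(a - b) for a, b in zip(history, history[1:]))
    return biggest / total


if __name__ == "__main__":
    hist_a, hist_b = run()
    print("largest single-step share of total change in A:",
          largest_step_fraction(hist_a))
    print("largest single-step share of total change in B:",
          largest_step_fraction(hist_b))
```

In this toy run no single step accounts for more than a small share of the total improvement in the optimized loss, while one step accounts for a large share of the total change in the unoptimized one.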

Evolution and steepest descent are two of the most common types of optimization techniques that exist today. We tend to use these in the situation where a) our measure of progress and the representation of the parameters we optimize are both very easy to quantify, and b) we have virtually no information about the loss landscape, and therefore aren’t able to make any “quick jumps” from point A to point B in the parameter space.

Slight digression on loss landscapes: In the terminology of optimization, a loss “landscape” or cost function landscape is a surface over the parameter space of an optimization problem. Each point on it has one coordinate per parameter plus one more coordinate for the value of the cost function, all expressed as real numbers, and the surface consists of the points whose final coordinate equals the output of the cost function when the parameter coordinates are passed in as inputs.

Both steepest descent and evolution work by looking around a small local region in parameter space (a single model might be represented by one point, a collection of organisms being evolved would look like a cloud of points localized around a small region). They both move the parameters in the direction that allows the greatest change in the value of the cost function (with evolution it’s not exactly like this, but would be in an idealized case with an enormous population size and the ability to vary in every parameter). Neither of them has any access to information about the landscape as a whole, or any geometric information besides the current best direction to move.
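The local character of both methods can be made concrete with a minimal sketch. The two-parameter quadratic `loss`, the step size, the population size, and the mutation scale `sigma` are all hypothetical choices for illustration. The evolution step is an idealized "mutate and keep the fittest" rule, not a model of real biology.

```python
import random


def loss(p):
    """Hypothetical two-parameter cost function."""
    x, y = p
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2


def descent_step(p, step=0.1, eps=1e-5):
    """One steepest-descent step of fixed length.

    The gradient is estimated by probing the loss a tiny distance away
    in each coordinate, so only local information is used.
    """
    grad = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi) - loss(lo)) / (2 * eps))
    norm = sum(g * g for g in grad) ** 0.5
    if norm == 0.0:
        return list(p)
    return [pi - step * g / norm for pi, g in zip(p, grad)]


def evolution_step(p, rng, population=200, sigma=0.1):
    """One idealized evolution step.

    Samples a cloud of mutated offspring around the parent and keeps the
    fittest one, or the parent itself if no offspring improves on it.
    """
    best = list(p)
    for _ in range(population):
        child = [pi + rng.gauss(0.0, sigma) for pi in p]
        if loss(child) < loss(best):
            best = child
    return best


if __name__ == "__main__":
    start = [0.0, 0.0]
    print("start loss:     ", loss(start))
    print("after descent:  ", loss(descent_step(start)))
    print("after evolution:", loss(evolution_step(start, random.Random(0))))
```

Neither function ever evaluates the loss far from the current point, which is the sense in which both are blind to the shape of the landscape as a whole.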

If you observe a discontinuous or rapid change in the value of the loss function, it means you travelled over a very steep slope or went over a cliff.

Biological evolution falls into this situation pretty neatly. Fitness is a pretty easy cost function to measure (you either reproduce or you don’t), and nature has found a nice way of representing the relevant parameters through the use of DNA and such. Evolution also has a bounded step size, since the details of reproduction generally limit the size of changes that can occur between generations.

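This is important, because the fact that changes in evolution are incremental and small tells us something about what the loss landscape of intelligence looks like. If each step in parameter space is small and we still observed a rapid change in capability, the landscape must be steep somewhere along the path that was taken. At the very least this tells us that there are some “cliffs” in the landscape, and that it doesn’t look completely smooth.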
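The inference from small steps to cliffs can be written as a rough bound. This is a sketch under assumptions that are not part of the original argument: that the loss L is differentiable along the straight segment between consecutive parameter values θ_t and θ_{t+1}, and that δ is a hypothetical upper bound on the size of a single step.

```latex
% G_t is the largest gradient norm encountered along the step.
\[
G_t = \sup_{s \in [0,1]}
      \bigl\| \nabla L\bigl(\theta_t + s(\theta_{t+1} - \theta_t)\bigr) \bigr\|,
\qquad
\| \theta_{t+1} - \theta_t \| \le \delta
\]
% Mean value inequality along the segment:
\[
\bigl| L(\theta_{t+1}) - L(\theta_t) \bigr|
\;\le\; G_t \, \| \theta_{t+1} - \theta_t \|
\;\le\; G_t \, \delta
\quad\Longrightarrow\quad
G_t \;\ge\; \frac{\bigl| L(\theta_{t+1}) - L(\theta_t) \bigr|}{\delta}
\]
```

So for a fixed bound δ on step size, a large observed change in a single step forces a large slope somewhere within that step.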

If you’re doing steepest descent or evolution, it’s probably more likely that you’d be going over cliffs than if you were just travelling in random directions, since these methods will be looking for directions of greatest change. Therefore, you’d be more likely to see much faster change in your loss metric if you are optimizing for it directly than if you were optimizing for it indirectly or for something else entirely.
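A toy comparison makes this claim concrete, though it proves nothing general. The landscape below is a hypothetical construction: a gentle slope plus a steep sigmoid cliff along x = 1. One walker follows the local downhill direction and the other moves in uniformly random directions, both with the same bounded step size.

```python
import math
import random


def loss(x, y):
    """Hypothetical landscape: gentle slope plus a steep cliff at x = 1."""
    gentle = -0.05 * x + 0.05 * y * y
    cliff = 1.0 / (1.0 + math.exp(40.0 * (x - 1.0)))
    return gentle + cliff


def gradient(x, y, eps=1e-6):
    """Central finite-difference estimate of the gradient."""
    gx = (loss(x + eps, y) - loss(x - eps, y)) / (2 * eps)
    gy = (loss(x, y + eps) - loss(x, y - eps)) / (2 * eps)
    return gx, gy


def downhill(x, y):
    """Direction of steepest descent at the current point."""
    gx, gy = gradient(x, y)
    return -gx, -gy


def make_random_direction(rng):
    """Build a direction function that ignores the landscape entirely."""
    def direction(x, y):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        return math.cos(angle), math.sin(angle)
    return direction


def walk(direction_fn, steps=40, step_size=0.05):
    """Take fixed-length steps from (0, 0.3); return the loss at each point."""
    x, y = 0.0, 0.3
    losses = [loss(x, y)]
    for _ in range(steps):
        dx, dy = direction_fn(x, y)
        norm = math.hypot(dx, dy)
        x += step_size * dx / norm
        y += step_size * dy / norm
        losses.append(loss(x, y))
    return losses


def largest_drop(losses):
    """Largest decrease in loss between two consecutive steps."""
    return max(a - b for a, b in zip(losses, losses[1:]))


if __name__ == "__main__":
    print("steepest descent, largest single-step drop:",
          largest_drop(walk(downhill)))
    random_drops = [
        largest_drop(walk(make_random_direction(random.Random(seed))))
        for seed in range(100)
    ]
    print("random walks that saw a comparable drop:",
          sum(d > 0.3 for d in random_drops), "of", len(random_drops))
```

On this landscape the descent walker reaches the cliff and records one very large single-step drop, while random walkers with the same step budget rarely get near it. The outcome depends on how the landscape was built, so it illustrates the claim without being evidence for it.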

And of course, we’re talking about a specific method of optimization here, one of the most “stupid” forms, that doesn’t require much knowledge about the thing it is trying to optimize for. If you did have this knowledge, it would be possible to make much larger and more significant updates. So the discontinuous story seems to apply most strongly in the situation where you’d be capable of performing “large jumps” in parameter space. And if you knew how to optimize the thing you wanted directly, you’d probably be more likely to be in this situation.

It’s possible to construct loss functions where evolution / gradient descent would give you fairly gradual change, but where there are routes through the parameter space that would give you points of rapid or discontinuous change. In such a case, optimizing the function directly gives you continuous change, while optimizing some other function moves it through the discontinuous region. Saddle points and local minima might prevent an evolution-based method from gaining much progress in a certain metric, but optimizing a different cost or moving in a random direction might shove it out of a local minimum.
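The local-minimum case can be sketched with a one-dimensional double well. All of the following is a hypothetical construction: `f` stands in for the metric we care about, and `grad_g` is the gradient of an unrelated proxy cost (x + 1.5)^2 whose minimum happens to sit near the deeper well of `f`.

```python
def f(x):
    """Metric of interest: a double well whose left well is deeper."""
    return x ** 4 - 4 * x ** 2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 8 * x + 1


def grad_g(x):
    """Derivative of the hypothetical proxy cost g(x) = (x + 1.5)**2."""
    return 2 * (x + 1.5)


def descend(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent using the supplied gradient function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


if __name__ == "__main__":
    x_direct = descend(grad_f, 1.0)  # optimize f itself
    x_proxy = descend(grad_g, 1.0)   # optimize the proxy, then evaluate f
    print("direct: x =", x_direct, " f =", f(x_direct))
    print("proxy:  x =", x_proxy, " f =", f(x_proxy))
```

Starting from x = 1, descent on `f` settles in the shallow right-hand well, while descent on the proxy carries x over the barrier and ends at a point where `f` is considerably lower. The proxy was chosen to make this happen, so the sketch shows only that such cases exist.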

But the original question was whether or not the fact that we observed fairly rapid change in the capability of humans is evidence for or against this situation. To answer it, we would need to know far more about how optimizing for something other than fitness would move us through the loss landscape of intelligence differently from how optimizing for fitness does. I just don’t think we have enough information about this to know that it *wouldn’t* move us through the discontinuous or cliff-like parts.

It’s hard to say right now whether progress in AGI will be made mostly by methods like gradient descent or evolution, or because we gain some big insights that allow us to skip from point A to point B without moving through the steps in between. The latter seems like it *could* result in much more rapid capability improvement than the former. If we had some very good reason to skip from A to B, this would imply we knew enough about the intelligence landscape to feel safe in doing so (like having a full theory of intelligence and alignment), in which case this discussion would be irrelevant. In the case where we don’t, it would almost certainly mean discontinuous capability change in the bad way.

In any case, it’s the gradual incremental methods that are more likely to give rise to continuous rather than discontinuous progress, and if we observe discontinuous change there, this seems to be non-negligible evidence in favor of a discontinuous landscape of intelligence.

Finally, Paul’s argument also seems to rely on AI researchers optimizing for intelligence directly, rather than using a proxy. This assumption seems fairly load-bearing to me, but I have not seen much discussion of it. I think it is fairly unlikely that we will be optimizing for intelligence directly, given that we’ve mostly been using lots of different narrow proxy measures so far, and I’m not sure that we’ll ever get a “true” measure that can be optimized for, until we have much better theories of intelligence. Within AI development, it seems like we have neither a simple, easily measured cost function to optimize, nor a way to represent “mind design space” in a form that allows us to use simple, incremental optimization methods. If we had these things, it might be much easier to incrementally increase AI capability at a rate that we’d be able to react to in time. Since his argument that we will see continuous change seems to rely on optimizing for intelligence directly, it would be nice to hear Paul expand on this if possible.

In conclusion, I think this leaves us in a position where determining what rate of progress we will see in AI development depends almost entirely on our “inside view” perspective, that is, on our knowledge about intelligence itself rather than on our observations about how progress in intelligence has occurred in different situations. However, those observations, the “outside view” perspective, give us small but positive evidence in favor of discontinuous change. And this might be pretty much where we started out in this debate.