I’ve been having fun recently reading about “AI Risk”. There is lots of eloquent writing out there about this topic: I especially recommend Scott Alexander’s Superintelligence FAQ for those looking for a fun read. The subject has reached the public consciousness, with high profile people like Stephen Hawking and Elon Musk speaking publicly about it. There is also an increasing amount of funding and research effort being devoted to understanding AI risk. See for example the Future of Humanity Institute at Oxford, the Future of Life Institute at MIT, and the Machine Intelligence Research Institute in Berkeley, among others. These groups seem to be doing lots of interesting research, which I am mostly ignorant of. In this post I just want to talk about a simple exercise in asymptotics.

First, Some Background.

A “superintelligent” AI is loosely defined to be an entity that is much better than we are at essentially any cognitive/learning/planning task. Perhaps, by analogy, a superintelligent AI is to human beings as human beings are to Bengal tigers, in terms of general intelligence. It shouldn’t be hard to convince yourself that if we were in the company of a superintelligence, then we would be very right to be worried: after all, it is intelligence that allows human beings to totally dominate the world and drive Bengal tigers to near extinction, despite the fact that tigers physiologically dominate humans in most other respects. This is the case even if the superintelligence doesn’t have the destruction of humanity as a goal per se (after all, we don’t have it out for tigers), and even if the superintelligence is just an unconscious but super-powerful optimization algorithm. I won’t rehash the arguments here (Scott does it better) but it essentially boils down to the fact that it is quite hard to anticipate what the results of optimizing an objective function will be, if the optimization is done over a sufficiently rich space of strategies. And if we get it wrong, and the optimization has some severely unpleasant side-effects? It is tempting to suggest that at that point, we just unplug the computer and start over. The problem is that if we unplug the intelligence, it won’t do as well at optimizing its objective function compared to if it took steps to prevent us from unplugging it. So if its strategy space is rich enough so that it is able to take steps to defend itself, it will. Lots of the most interesting research in this field seems to be about how to align optimization objectives with our own desires, or simply how to write down objective functions that don’t induce the optimization algorithm to try to prevent us from unplugging it, while also not incentivizing the algorithm to unplug itself (the corrigibility problem).

Ok. It seems uncontroversial that a hypothetical superintelligence would be something we should take very seriously as a danger. But isn’t it premature to worry about this, given how far off it seems to be? We aren’t even that good at making product recommendations, let alone optimization algorithms so powerful that they might inadvertently destroy all of humanity. Even if superintelligence will ultimately be something to take very seriously, are we even in a position to productively think about it now, given how little we know about how such a thing might work at a technical level? This seems to be the position that Andrew Ng was taking, in his much quoted statement that (paraphrasing) worrying about the dangers of super-intelligence right now is like worrying about overpopulation on Mars. Not that it might not eventually be a serious concern, but that we will get a higher return investing our intellectual efforts right now on more immediate problems.

The standard counter to this is that super-intelligence might always seem like it is well beyond our current capabilities—maybe centuries in the future—until, all of a sudden, it appears as the result of an uncontrollable chain reaction known as an “intelligence explosion”, or “singularity”. (As far as I can tell, very few people actually think that intelligence growth would exhibit an actual mathematical singularity—this seems instead to be a metaphor for exponential growth.) If this is what we expect, then now might very well be the time to worry about super-intelligence. The first argument of this form was put forth by British mathematician I.J. Good (of Good-Turing Frequency Estimation!):

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”

Scott Alexander summarizes the same argument a bit more quantitatively. In this passage, he is imagining the starting point being a full-brain simulation of Einstein—except run on faster hardware, so that our simulated Einstein operates at a much faster clock-speed than his historical namesake:

It might, like the historical Einstein, contemplate physics. Or it might contemplate an area very relevant to its own interests: artificial intelligence. In that case, instead of making a revolutionary physics breakthrough every few hours, it will make a revolutionary AI breakthrough every few hours. Each AI breakthrough it makes, it will have the opportunity to reprogram itself to take advantage of its discovery, becoming more intelligent, thus speeding up its breakthroughs further. The cycle will stop only when it reaches some physical limit – some technical challenge to further improvements that even an entity far smarter than Einstein cannot discover a way around.

To human programmers, such a cycle would look like a “critical mass”. Before the critical level, any AI advance delivers only modest benefits. But any tiny improvement that pushes an AI above the critical level would result in a feedback loop of inexorable self-improvement all the way up to some stratospheric limit of possible computing power.

This feedback loop would be exponential; relatively slow in the beginning, but blindingly fast as it approaches an asymptote. Consider the AI which starts off making forty breakthroughs per year – one every nine days. Now suppose it gains on average a 10% speed improvement with each breakthrough. It starts on January 1. Its first breakthrough comes January 10 or so. Its second comes a little faster, January 18. Its third is a little faster still, January 25. By the beginning of February, it’s sped up to producing one breakthrough every seven days, more or less. By the beginning of March, it’s making about one breakthrough every three days or so. But by March 20, it’s up to one breakthrough a day. By late on the night of March 29, it’s making a breakthrough every second.

As far as I can tell, this possibility of an exponentially-paced intelligence explosion is the main argument for folks devoting time to worrying about super-intelligent AI now, even though current technology doesn’t give us anything even close. So in the rest of this post, I want to push a little bit on the claim that the feedback loop induced by a self-improving AI would lead to exponential growth, and see what assumptions underlie it.

A Toy Model for Rates of Self Improvement

Let’s write down an extremely simple toy model for how quickly the intelligence of a self-improving system would grow, as a function of time. And I want to emphasize that the model I will propose is clearly a toy: it abstracts away everything that is interesting about the problem of designing an AI. But it should be sufficient to focus on a simple question of asymptotics, and the degree to which growth rates depend on the extent to which AI research exhibits diminishing marginal returns on investment. In the model, AI research accumulates with time: at time t, R(t) units of AI research have been conducted. Perhaps think of this as a quantification of the number of AI “breakthroughs” that have been made in Scott Alexander’s telling of the intelligence explosion argument. The intelligence of the system at time t, denoted I(t), will be some function of the accumulated research R(t). The model will make two assumptions:

The rate at which research is conducted is directly proportional to the current intelligence of the system. We can think about this either as discrete dynamics, or as a differential equation. In the discrete case, we have: R(t+1)=R(t)+I(t), and in the continuous case: dR/dt=I(t).

The relationship between the current intelligence of the system and the currently accumulated quantity of research is governed by some function f: I(t)=f(R(t)).

The function f can be thought of as capturing the marginal rate of return of additional research on the actual intelligence of an AI. For example, if we think AI research is something like pumping water from a well—a task for which doubling the work doubles the return—then we would model f as linear: f(x)=x. In this case, AI research does not exhibit any diminishing marginal returns: a unit of research gives us just as much benefit in terms of increased intelligence, no matter how much we already understand about intelligence. On the other hand, if we think that AI research should exhibit diminishing marginal returns—as many creative endeavors seem to—then we would model f as an increasing concave function. For example, we might let f(x)=√x, or f(x)=x^(2/3), or f(x)=x^(1/3), etc. If we are really pessimistic about the difficulty of AI, we might even model f(x)=log(x). In these cases, intelligence is still increasing in research effort, but the rate of increase as a function of research effort is diminishing as we understand more and more about AI. Note however that the rate at which research is being conducted is increasing, which might still lead us to exponential growth in intelligence, if it increases fast enough.
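The discrete dynamics are simple enough to simulate directly. Here is a minimal Python sketch of the model (the starting value R(0)=1 and the use of floating point are my assumptions, not part of the original post):

```python
# Discrete toy model: R(t+1) = R(t) + I(t), with I(t) = f(R(t)).
# Assumed starting value: R(0) = 1.
import math

def simulate(f, steps, r0=1.0):
    """Return the trajectory I(0), I(1), ..., I(steps-1) of the discrete model."""
    r = r0
    trajectory = []
    for _ in range(steps):
        i = f(r)   # intelligence is a function of accumulated research
        trajectory.append(i)
        r += i     # research accrues at a rate equal to current intelligence
    return trajectory

# No diminishing returns: f(x) = x gives R(t+1) = 2R(t), i.e. I(t) = 2^t.
linear = simulate(lambda x: x, 1001)

# Diminishing returns: concave choices of f.
sqrt_case = simulate(math.sqrt, 1001)
cube_root_case = simulate(lambda x: x ** (1 / 3), 1001)
```

Plotting these three trajectories reproduces the qualitative pictures discussed below.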

So how does our choice of f affect intelligence growth rates? First, let’s consider the case in which f(x)=x – the case of no diminishing marginal returns on research investment. Here is a plot of the growth over 1000 time steps in the discrete model:

Here, we see exponential growth in intelligence. (It isn’t hard to directly work out that in this case, in the discrete model, we have I(t)=2^t, and in the continuous model, we have I(t)=e^t.) And the plot illustrates the argument for worrying about AI risk now. Viewed at this scale, progress in AI appears to plod along at unimpressive levels before suddenly shooting up to an unimaginable level: in this case, a quantity that, if written down as a decimal, would have more than 300 digits.

It isn’t surprising that if we were to model severely diminishing returns – say f(x)=log(x) – this would not occur. Below, we plot what happens when f(x)=log(x), with time taken out all the way to 1,000,000 rather than merely 1000 as in the above plot:

Intelligence growth is not very impressive here. At time 1,000,000 we haven’t even reached 17. If you wanted to reach (say) an intelligence level of 30 you’d have to wait an unimaginably long time. In this case, we definitely don’t need to worry about an “intelligence explosion”, and probably not even about ever reaching anything that could be called a super-intelligence.
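This is easy to check numerically. A quick sketch (the starting value R(0)=2 is my assumption, chosen only because log requires R>1 for any growth at all):

```python
# With f(x) = log(x), the discrete model crawls: after a million steps,
# intelligence is still below 17. Assumed starting value: R(0) = 2.
import math

r = 2.0
for _ in range(1_000_000):
    r += math.log(r)   # R(t+1) = R(t) + I(t), with I(t) = log R(t)

intelligence = math.log(r)
```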

But what about moderate (polynomial) levels of diminishing marginal returns? What if we take f(x)=x^(1/3)? Let’s see:

Ok – now we are making more progress, but even though intelligence now has a polynomial relationship to research (and research speed is increasing, in a chain reaction!), the rate of growth in intelligence is still decreasing. What about if f(x)=√x? Let’s see:

At least now the rate of growth doesn’t seem to be decreasing: but it is growing only linearly with time. Hardly an explosion. Maybe we just need to get more aggressive in our modeling. What if f(x)=x^(2/3)?

Ok, now we’ve got something! At least now the rate of intelligence gains is increasing with time. But it is increasing more slowly than a quadratic function – a far cry from the exponential growth that characterizes an intelligence explosion.

Let’s take a break from all of this plotting. The model we wrote down is simple enough that we can just go and solve the differential equation. Suppose we have f(x)=x^(1−ϵ) for some ϵ>0. Then the differential equation solves to give us: I(t)=((1+ϵt)^(1/ϵ))^(1−ϵ). What this means is that for any positive value of ϵ, in this model, intelligence grows at only a polynomial rate. The only way this model gives us exponential growth is if we take ϵ→0, and insist that f(x)=x – i.e. that the intelligence design problem does not exhibit any diminishing marginal returns at all.
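One way to sanity-check the closed form is to compare it against a forward-Euler integration of dR/dt = R^(1−ϵ); a small sketch (the initial condition R(0)=1 is assumed):

```python
def closed_form(t, eps):
    # Claimed solution with R(0) = 1: I(t) = ((1 + eps*t)^(1/eps))^(1 - eps)
    return ((1 + eps * t) ** (1 / eps)) ** (1 - eps)

def euler(t_end, eps, dt=1e-4):
    # Forward-Euler integration of dR/dt = I = f(R) = R^(1 - eps)
    r = 1.0
    for _ in range(int(round(t_end / dt))):
        r += dt * r ** (1 - eps)
    return r ** (1 - eps)   # I = f(R)

# For eps = 1/2 the closed form collapses to I(t) = 1 + t/2: exactly the
# linear growth seen in the f(x) = sqrt(x) plot above.
```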

Thoughts

So what do we learn from this exercise? Of course one can quibble with the details of the model, and one can believe different things about what form for the function f best approximates reality. But for me, this model helps crystallize the extent to which the “exponential intelligence explosion” story crucially relies on intelligence design being one of those rare tasks that doesn’t exhibit any decreasing marginal returns on effort at all. This seems unlikely to me, and counter to experience. Of course, there are technological processes out there that do appear to exhibit exponential growth, at least for a little while. Moore’s law is the most salient example. But it is important to remember that even exponential growth for a little while need not seem explosive at human time scales. Doubling every day corresponds to exponential growth, but so does increasing by 1% a year. To paraphrase Ed Felten: our retirement plans extend beyond depositing a few dollars into a savings account, and waiting for the inevitable “wealth explosion” that will make us unimaginably rich.

Postscript

I don’t claim that anything in this post is either novel or surprising to folks who spend their time thinking about this sort of thing. There is at least one paper that writes down a model including diminishing marginal returns, which yields a linear rate of intelligence growth.

It is also interesting to note that in the model we wrote down, exponential growth is really a knife-edge phenomenon. We already observed that we get exponential growth if f(x)=x, but not if f(x)=x^(1−ϵ) for any ϵ>0. But what if we have f(x)=x^(1+ϵ) for ϵ>0? In that case, we don’t get exponential growth either! As Hadi Elzayn pointed out to me, Osgood’s Test tells us that in this case, the function I(t) contains an actual mathematical singularity – it approaches an infinite value in finite time.
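The blow-up is easy to see numerically. A sketch taking ϵ=1 and R(0)=1 (both assumptions, chosen for convenience): the continuous model dR/dt = R^2 then solves to R(t) = 1/(1−t), which is singular at t=1.

```python
# Numerical illustration of the f(x) = x^(1+eps) case, with eps = 1:
# dR/dt = R^2, R(0) = 1, whose exact solution R(t) = 1/(1 - t) blows
# up at t = 1.

def integrate_to(t_end, dt=1e-6):
    r = 1.0
    for _ in range(int(round(t_end / dt))):
        r += dt * r * r   # dR/dt = R^(1+eps), with eps = 1
    return r

r_mid = integrate_to(0.5)     # exact solution: 1/(1 - 0.5) = 2
r_late = integrate_to(0.99)   # exact solution: 1/(1 - 0.99) = 100
```

The numerical solution tracks 1/(1−t) and grows without bound as t approaches 1.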

As far as I can tell, this possibility of an exponentially-paced intelligence explosion is the main argument for folks devoting time to worrying about super-intelligent AI now, even though current technology doesn’t give us anything even close. So in the rest of this post, I want to push a little bit on the claim that the feedback loop induced by a self-improving AI would lead to exponential growth, and see what assumptions underlie it.

I think few AI safety advocates believe this. It’s much more common to expect growth to be faster than exponential. As you point out, exponential growth is a knife-edge phenomenon.

As far as I can tell, very few people actually think that intelligence growth would exhibit an actual mathematical singularity

This is actually a pretty common view—not a literal singularity, but rapid technological acceleration until natural resource limitations (e.g. on total available solar energy and raw minerals) start binding. If you look at the history of technological progress, it looks a whole lot more like a hyperbola than like an exponential curve, so the hyperbolic growth forecast isn’t so insane. It’s the person arguing that growth rates are going to stop at 3% who is arguing against the bulk of historical precedent (and whose predecessors would have been wrong if they’d expected growth to stop at 0.3% or 0.03% or 0.003%...).

this seems instead to be a metaphor for exponential growth.

I think “singularity” usually either follows Vinge’s use (as the point beyond which you can’t predict what will happen, because the future is guided by actors smarter than you are) or as a reference to the dynamic that would produce a mathematical singularity if left unchecked.

Thanks for writing this Aaron! (And for engaging with some of the common arguments for/against AI safety work.)

I personally am very uncertain about whether to expect a singularity/fast take-off (I think it is plausible but far from certain). Some reasons that I am still very interested in AI safety are the following:

I think AI safety likely involves solving a number of difficult conceptual problems, such that it would take >5 years (I would guess something like 10-30 years, with very wide error bars) of research to have solutions that we are happy with. Moreover, many of the relevant problems have short-term analogues that can be worked on today. (Indeed, some of these align with your own research interests, e.g. imputing value functions of agents from actions/decisions; although I am particularly interested in the agnostic case where the value function might lie outside of the given model family, which I think makes things much harder.)

I suppose the summary point of the above is that even if you think AI is a ways off (my median estimate is ~50 years, again with high error bars) research is not something that can happen instantaneously, and conceptual research in particular can move slowly due to being harder to work on / parallelize.

While I have uncertainty about fast take-off, that still leaves some probability that fast take-off will happen, and in that world it is an important enough problem that it is worth thinking about. (It is also very worthwhile to think about the probability of fast take-off, as better estimates would help to better direct resources even within the AI safety space.)

Finally, I think there are a number of important safety problems even from sub-human AI systems. Tech-driven unemployment is I guess the standard one here, although I spend more time thinking about cyber-warfare/autonomous weapons, as well as changes in the balance of power between nation-states and corporations. These are not as clearly an existential risk as unfriendly AI, but I think in some forms would qualify as a global catastrophic risk; on the other hand I would guess that most people who care about AI safety (at least on this website) do not care about it for this reason, so this is more idiosyncratic to me.

Happy to expand on/discuss any of the above points if you are interested.

Good points all; these are good reasons to work on AI safety (and of course as a theorist I’m very happy to think about interesting problems even if they don’t have immediate impact :-) I’m definitely interested in the short-term issues, and have been spending a lot of my research time lately thinking about fairness/privacy in ML. Inverse-RL/revealed preferences learning is also quite interesting, and I’d love to see some more theory results in the agnostic case.

In a more typical endogenous growth model, output is the product of physical capital (e.g. how many computers you have) and a technology factor (e.g. how smart you are). You can either invest in producing more capital (building more computers) or doing research (becoming smarter). On these models, even returns of x^ϵ still lead to a mathematical singularity (while constant technology leads to exponential growth).

From this perspective, you are investigating whether there is an intelligence explosion with finite capital. If productivity grows sublinearly with inputs, you need to build more machines (and ultimately extract more resources from nature) in order to grow really fast. This might suggest that getting to a singularity would take years rather than weeks, but doesn’t much change the qualitative conclusion or substantially change the urgency (especially given that the early phase of takeoff would be driven by moving resources over from lower productivity areas into higher productivity areas).

I think it’s a mistake to think of “productivity is linear in effort” as the “no diminishing returns” model, and to consider it a degenerate extreme case. Linear returns is the model where doubling inputs leads to doubled outputs. A priori, it’s nearly as natural for constant additional effort to lead to a doubling of efficiency, so we need to actually look at the data to distinguish.

(It seems more theoretically natural—and more common in practice—for each clever trick to lead to a 10% increase in efficiency, than for each clever trick to lead to an absolute increase of 1 unit of efficiency.)

In semiconductors, as you point out, output has increased exponentially over time. Research investment has also increased exponentially, but with a significantly smaller exponent. So on your model the curve appears to be x^α for α>1.

The performance curves database contains many interesting time series, and you’ll note that the y-axis is typically exponential. They don’t track inputs, so it’s a bit hard to draw conclusions, but comparing to overall increases in R&D investment it looks like superlinear returns are probably quite common.

A few years ago Katja looked into the rate of algorithmic progress, and found that it was very approximately comparable to the rate of progress in hardware (though it’s hard to know how much of that comes from realizing increasing economies of scale w.r.t. compute), across a range of domains. Algorithms seem like a particularly relevant domain to the current discussion.

Thanks for the very thoughtful comments; lots to chew on. As I hope was clear, I’m just an interested outside observer, and have not spent very long thinking about these issues, and don’t know much of the literature. (My blog post ended up as a cross post here because I posted it to facebook, and asked if anyone could point me to more serious literature thinking about this problem, and a commenter suggested that I should crosspost here for feedback)

I agree that linear feedback is more plausible if we think of research breakthroughs as producing multiplicative gains, a simple point that I hadn’t thought about.

Eliezer did exactly this calculation in an old LW post. Unfortunately I have no idea how to find it. Fortunately the calculation comes out the same no matter who does it!

As far as I can tell, this possibility of an exponentially-paced intelligence explosion is the main argument for folks devoting time to worrying about super-intelligent AI now, even though current technology doesn’t give us anything even close.

Not at all. The reasons we should work on AI alignment now are:

AI alignment is a hard problem

We don’t know how long it will take us to solve it

We don’t know how long it will be until superintelligent AI becomes possible

There is no strong reason to believe we will know superintelligent AI is coming far in advance

“Current technology doesn’t give us anything even close” is not extremely informative since we don’t know the metric w.r.t. which “close” should be measured. Heavier-than-air flight was believed impossible by many, until the Wright brothers did it. The technology of 1929 didn’t give anything close to an atom bomb or a moon landing, and yet the atom bomb was made 16 years later, and the moon landing 40 years later.

Regarding the differential equations, I don’t think it’s a very meaningful analysis if you haven’t even defined the scale on which you measure intelligence. If I(x) is some measure of intelligence that grows exponentially, then log I(x) is another measure of intelligence which grows linearly, and if I(x) grows linearly then exp I(x) grows exponentially.

Also, you might be interested in this paper by Yudkowsky.

if you do want to analyze the plausibility of an intelligence explosion then it seems worthwhile to respond in detail to previous work

If you replace “analyze the plausibility” with “convincingly demonstrate to skeptics” then this seems right.

The OP seems to be written more in the spirit of exploration rather than conclusive argument though, which seems valuable and doesn’t necessarily require responding in detail to prior work (in this case ~100 pages). Seems like kind of a soul-crushing way to respond to curiosity :)

(I hope my own comments didn’t come across harshly.)

(1) As Paul noted, the question of the exponent alpha is just the question of diminishing returns vs returns-to-scale.

Especially if you believe that the rate f=f(R) is a product of multiple terms (e.g. Paul’s suggestion f = R^(α_t) · R^(α_a), with one exponent for computer tech advances and another for algorithmic advances), then you get returns-to-scale type dynamics (over certain regimes, i.e. until all fruit are picked) with finite-time blow-up.

(2) Also, an imho crucial aspect is the separation of time-scales between human-driven research and computation done by machines (transistors are faster than neurons and buying more hardware scales better than training a new person up to the bleeding edge of research, especially considering Scott’s amusing parable of the alchemists).

Let’s add a little flourish to your model: You had the rate of research I and the cumulative research R; let’s give a name C to the capability of the AI system. Then, we can model ∂R/∂t = I = f(R) = g(C) = g(h(R)). This is your model, just splitting terms into h, which tells us how hard AI progress is, and g, which tells us how good we are at producing research.

Now denote by q=q(C) the fraction of work that absolutely has to be done by humans, and by ε the speed-up factor for silicon over biology. Amdahl’s law gives you g(C) = 1/(q(C) + (1−q(C))/(εC)), or somewhat simplified g(C) ≥ 1/(q + 1/(εC)). This predicts a rate of progress that first looks like 1/q, as long as human researcher input is the limiting factor, then becomes εC when we have AIs designing AIs (recursive self-improvement, aka explosion), and then probably saturates at something (when the AI approaches optimality).
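A toy numeric sketch may make the two regimes concrete. (The functional form g(C) = 1/(q + (1−q)/(εC)) and the parameter values here are my own assumptions, chosen only to make the regimes visible, not the commenter’s exact model.)

```python
# Amdahl-style rate: a fraction q of the work runs at human speed 1,
# the rest at machine speed eps * C.
def g(c, q, eps=1000.0):
    return 1.0 / (q + (1.0 - q) / (eps * c))

human_limited = g(1e6, q=0.1)    # pinned near 1/q = 10, despite enormous C
post_crossover = g(1e6, q=0.0)   # eps * C = 1e9: now scales with capability
```

While q is the bottleneck, the rate barely responds to capability; at q=0 it jumps to scaling linearly with εC.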

The crucial argument for fast take-off (as far as I understood it) is that we can expect q(C) to hit q=0 at some cross-over C*, and we can expect this to happen with a nonzero derivative ∂q/∂C at C*. This is just the claim that human-level AI is possible, and that the intelligence of the human parts of the AI research project is not sitting at a magical point (aka: this is generic, you would need to fine-tune your model to get something else).

The change of the rate of research output from the 1/q(C) regime to the εC regime sure looks like a hard-take-off singularity to me! And I would like to note that the function h, i.e. the hardness of AI research and the diminishing-returns vs returns-to-scale debate, does not enter this discussion at any point.

In other words: If you model AI research as done by a team of humans and proto-AIs assisting the humans; and if you assert non-fungibility of humans vs proto-AI assistants (even if you buy a thousand times more hardware, you still need the generally intelligent human researchers for some parts); and if you assert that better proto-AI assistants can do a larger proportion of the work (at all); and if you assert that computers are faster than humans; then you get a possibly quite wild change at q=0.

I’d like to note that the cross-over is not “human-level AI”, but rather “q≈0”, i.e. an AI that needs (almost) no human assistance to progress the field of AI research.

On the opposing side (that’s what Robin Hanson would probably say) you have the empirical argument that q should decay like a power law long before we reach q=0 (“the last 10% take 90% of the work” is a folk formulation for “percentiles 90-99 take nine times as much work as percentiles 0-89”, aka a power law, and is borne out quite well, empirically).

This does not have any impact on whether we cross q=0 with non-vanishing derivative, but would support Paul’s view that the world will be unrecognizably crazy long before q=0 .

PS. I am currently agnostic about the hard vs soft take-off debate. Yeah, I know, cowardly cop-out.

edit: In the above, C kinda encodes how fast / good our AI is and q encodes how general it is compared to humans. All AI singularity stuff tacitly assumes that human intelligence (assisted by stupid proto-AI) is sufficiently general to design an AI that exceeds or matches the generality of human intelligence. I consider this likely. The counterfactual world would have our AI capabilities saturate at some subhuman level for a long time, using terribly bad randomized/evolutionary algorithms, until it either stumbles onto an AI design that has better generality or we suffer unrelated extinction/heat-death. I consider it likely that human intelligence (assisted by proto-AI) is sufficiently general for a take-off. Heat-death is not an exaggeration: Algorithms with exponentially bad run-time are effectively useless.

Conversely, I consider it very well possible that human intelligence is insufficiently general to understand how human intelligence works! (We are really, really bad at understanding anything optimized by evolution or gradient descent, and that’s what we are.)

Thanks for the corrections. I changed the text to “in Berkeley”. How should FLI be described? (I was just cribbing from Scott’s FAQ when claiming it was at MIT)

I agree with this, but I think you have to remember that many things with diminishing returns also have accelerating returns earlier on.

That is to say, logistic curves are all over the place. Business growth, practicing a new instrument, functionality of a software project over time, learning a language through immersion...

It’s absolutely plausible for intelligence self-improvement to work for a few IQ points and then peter out, for some architecture. Humans, for example, are horrible at improving their own brains—but also see EURISKO. But I’m skeptical that returns are always going to be so sharply diminishing, and if everyone else is improving slowly, whatever system “goes critical” first is going to be the one that matters.

## Takeoff Speed: Simple Asymptotics in a Toy Model.

Link post

I’ve been having fun recently reading about “AI Risk”. There is lots of eloquent writing out there about this topic: I especially recommend Scott Alexander’s Superintelligence FAQ for those looking for a fun read. The subject has reached the public consciousness, with high profile people like Stephen Hawking and Elon Musk speaking publicly about it. There is also an increasing amount of funding and research effort being devoted to understanding AI risk. See for example the Future of Humanity Institute at Oxford, the Future of Life Institute at MIT, and the Machine Intelligence Research Institute in Berkeley, among others. These groups seem to be doing lots of interesting research, which I am mostly ignorant of. In this post I just want to talk about a simple exercise in asymptotics.

## First, Some Background.

A “superintelligent” AI is loosely defined to be an entity that is much better than we are at essentially any cognitive/learning/planning task. Perhaps, by analogy, a superintelligent AI is to human beings as human beings are to Bengal tigers, in terms of general intelligence. It shouldn’t be hard to convince yourself that if we were in the company of a superintelligence, then we would be very right to be worried: after all, it is intelligence that allows human beings to totally dominate the world and drive Bengal tigers to near extinction, despite the fact that tigers physiologically dominate humans in most other respects. This is the case even if the superintelligence doesn’t have the destruction of humanity as a goal per-se (after all, we don’t have it out for tigers), and even if the superintelligence is just an unconscious but super-powerful optimization algorithm. I won’t rehash the arguments here (Scott does it better) but it essentially boils down to the fact that it is quite hard to anticipate what the results of optimizing an objective function will be, if the optimization is done over a sufficiently rich space of strategies. And if we get it wrong, and the optimization has some severely unpleasant side-effects? It is tempting to suggest that at that point, we just unplug the computer and start over. The problem is that if we unplug the intelligence, it won’t do as well at optimizing its objective function compared to if it took steps to prevent us from unplugging it. So if it’s strategy space is rich enough so that it is able to take steps to defend itself, it will. Lots of the most interesting research in this field seems to be about how to align optimization objectives with our own desires, or simply how to write down objective functions that don’t induce the optimization algorithm to try and prevent us from unplugging it, while also not incentivizing the algorithm to unplug itself (the corrigibility problem).

Ok. It seems uncontroversial that a hypothetical superintelligence would be something we should take very seriously as a danger. But isn’t it premature to worry about this, given how far off it seems to be? We aren’t even that good at making product recommendations, let alone optimization algorithms so powerful that they might inadvertently destroy all of humanity. Even if superintelligence will ultimately be something to take very seriously, are we even in a position to productively think about it now, given how little we know about how such a thing might work at a technical level? This seems to be the position that Andrew Ng was taking, in his much quoted statement that (paraphrasing) worrying about the dangers of super-intelligence right now is like worrying about overpopulation on Mars. Not that it might not eventually be a serious concern, but that we will get a higher return investing our intellectual efforts right now on more immediate problems.

The standard counter to this is that super-intelligence might always seem like it is well beyond our current capabilities—maybe centuries in the future—until, all of a sudden, it appears as the result of an uncontrollable chain reaction known as an “intelligence explosion”, or “singularity”. (As far as I can tell, very few people actually think that intelligence growth would exhibit an actual mathematical singularity—this seems instead to be a metaphor for exponential growth.) If this is what we expect, then now might very well be the time to worry about super-intelligence. The first argument of this form was put forth by British mathematician I.J. Good (of Good-Turing Frequency Estimation!):
> Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

Scott Alexander summarizes the same argument a bit more quantitatively. In this passage, he is imagining the starting point being a full-brain simulation of Einstein—except run on faster hardware, so that our simulated Einstein operates at a much faster clock-speed than his historical namesake:

As far as I can tell, this possibility of an exponentially-paced intelligence explosion is the main argument for folks devoting time to worrying about super-intelligent AI *now*, even though current technology doesn’t give us anything even close. So in the rest of this post, I want to push a little bit on the claim that the feedback loop induced by a self-improving AI would lead to exponential growth, and see what assumptions underlie it.

## A Toy Model for Rates of Self-Improvement

Let’s write down an extremely simple toy model for how quickly the intelligence of a self-improving system would grow, as a function of time. And I want to emphasize that the model I will propose is clearly a toy: it abstracts away everything that is interesting about the problem of designing an AI. But it should be sufficient to focus on a simple question of asymptotics, and the degree to which growth rates depend on the extent to which AI research exhibits diminishing marginal returns on investment. In the model, AI research accumulates with time: at time t, R(t) units of AI research have been conducted. Perhaps think of this as a quantification of the number of AI “breakthroughs” that have been made in Scott Alexander’s telling of the intelligence explosion argument. The intelligence of the system at time t, denoted I(t), will be some function of the accumulated research R(t). The model makes two assumptions:

1. The rate at which research is conducted is directly proportional to the current intelligence of the system. We can think about this either as a discrete dynamics or as a differential equation. In the discrete case, we have R(t+1) = R(t) + I(t), and in the continuous case, dR/dt = I(t).

2. The relationship between the current intelligence of the system and the currently accumulated quantity of research is governed by some function f: I(t) = f(R(t)).
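To make the dynamics concrete, here is a minimal sketch of the discrete version of the model. The code and its names are mine, not from the original post, and the starting value R(0) = 1 is an assumption:

```python
# Discrete toy model: R(t+1) = R(t) + I(t), with I(t) = f(R(t)).
# (My own sketch; the post does not specify R(0), so I assume R(0) = 1.)
def simulate(f, steps, r0=1.0):
    """Return the trajectory [I(0), I(1), ..., I(steps)]."""
    R = r0
    trajectory = [f(R)]
    for _ in range(steps):
        R += f(R)              # research accrues at the current intelligence level
        trajectory.append(f(R))
    return trajectory

# With f(x) = x, R doubles every step, so I(t) = 2^t.
traj = simulate(lambda x: x, 10)
```

Plugging in different choices of f (√x, x^{2/3}, log x, ...) reproduces the qualitative behavior of the plots discussed below.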

The function f can be thought of as capturing the *marginal rate of return* of additional research on the actual intelligence of an AI. For example, if we think AI research is something like pumping water from a well—a task for which doubling the work doubles the return—then we would model f as linear: f(x) = x. In this case, AI research does not exhibit any diminishing marginal returns: a unit of research gives us just as much benefit in terms of increased intelligence, no matter how much we already understand about intelligence. On the other hand, if we think that AI research should exhibit diminishing marginal returns—as many creative endeavors seem to—then we would model f as an increasing concave function. For example, we might let f(x) = √x, or f(x) = x^{2/3}, or f(x) = x^{1/3}, etc. If we are really pessimistic about the difficulty of AI, we might even model f(x) = log(x). In these cases, intelligence is still increasing in research effort, but the rate of increase as a function of research effort is diminishing as we understand more and more about AI. Note however that the rate at which research is being conducted is increasing, which might still lead us to exponential growth in intelligence, if it increases fast enough.

So how does our choice of f affect intelligence growth rates? First, let’s consider the case in which f(x) = x – the case of no diminishing marginal returns on research investment. Here is a plot of the growth over 1000 time steps in the discrete model:

Here, we see exponential growth in intelligence. (It isn’t hard to work out directly that in this case, in the discrete model, we have I(t) = 2^t, and in the continuous model, I(t) = e^t.) And the plot illustrates the argument for worrying about AI risk *now*. Viewed at this scale, progress in AI appears to plod along at unimpressive levels before suddenly shooting up to an unimaginable level: in this case, a quantity that, if written down as a decimal, would have more than 300 zeros.

It isn’t surprising that if we were to model severely diminishing returns – say f(x) = log(x) – this would not occur. Below, we plot what happens when f(x) = log(x), with time taken out all the way to 1,000,000 rather than merely 1000 as in the above plot:

Intelligence growth is not very impressive here. At time 1,000,000 we haven’t even reached 17. If you wanted to reach (say) an intelligence level of 30, you’d have to wait an unimaginably long time. In this case, we definitely don’t need to worry about an “intelligence explosion”, and probably not even about *ever* reaching anything that could be called a super-intelligence.

But what about moderate (polynomial) levels of diminishing marginal returns? What if we take f(x) = x^{1/3}? Let’s see:
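One can check the log-returns claim directly by iterating the discrete dynamics. This is my own sketch; the starting value R(0) = 2 is an assumption (the post doesn’t specify one, and R(0) = 1 would leave the system stuck at log 1 = 0):

```python
# Discrete dynamics R(t+1) = R(t) + I(t) with I(t) = log R(t),
# run out to t = 1,000,000. (Assuming natural log and R(0) = 2.)
import math

R = 2.0
for _ in range(1_000_000):
    R += math.log(R)           # research accrues at the current intelligence level
intelligence = math.log(R)     # still well short of 17 after a million steps
```

The choice of R(0) barely matters here, since it only shifts R by an additive constant that the logarithm flattens away.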

Ok – now we are making more progress, but even though intelligence now has a polynomial relationship to research (and research speed is increasing, in a chain reaction!), the rate of growth in intelligence is still *decreasing*. What about f(x) = √x? Let’s see:

At least now the rate of growth doesn’t seem to be decreasing: but it is growing only linearly with time. Hardly an explosion. Maybe we just need to get more aggressive in our modeling. What if f(x) = x^{2/3}?

Ok, now we’ve got something! At least now the rate of intelligence gains is *increasing* with time. But it is increasing more slowly than a quadratic function – a far cry from the exponential growth that characterizes an intelligence explosion.

Let’s take a break from all of this plotting. The model we wrote down is simple enough that we can just go and solve the differential equation. Suppose we have f(x) = x^{1−ϵ} for some ϵ > 0. Then the differential equation solves to give us:

I(t) = ((1 + ϵt)^{1/ϵ})^{1−ϵ}

What this means is that for *any* positive value of ϵ, in this model, intelligence grows at only a polynomial rate. The only way this model gives us exponential growth is if we take ϵ → 0, and insist that f(x) = x – i.e. that the intelligence design problem does not exhibit any diminishing marginal returns at all.

## Thoughts
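As a sanity check on the closed form, one can integrate the continuous model numerically. This is my own sketch (forward Euler, assuming R(0) = 1; the names are mine):

```python
# Forward-Euler integration of dR/dt = R^(1 - eps) with R(0) = 1,
# compared against the closed form I(t) = ((1 + eps*t)^(1/eps))^(1 - eps).
def intelligence_at(t_end, eps, dt=1e-4):
    R, t = 1.0, 0.0
    while t < t_end:
        R += dt * R ** (1.0 - eps)   # dR/dt = I(t) = R(t)^(1 - eps)
        t += dt
    return R ** (1.0 - eps)

def closed_form(t, eps):
    return ((1.0 + eps * t) ** (1.0 / eps)) ** (1.0 - eps)
```

For example, with ϵ = 1/2 the closed form gives I(10) = ((1 + 5)^2)^{1/2} = 6, and the numerical integration agrees: polynomial, not exponential, growth.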

So what do we learn from this exercise? Of course one can quibble with the details of the model, and one can believe different things about what form for the function f best approximates reality. But for me, this model helps crystallize the extent to which the “exponential intelligence explosion” story crucially relies on intelligence design being one of those rare tasks that doesn’t exhibit any decreasing marginal returns on effort at all. This seems unlikely to me, and counter to experience.

Of course, there *are* technological processes out there that do appear to exhibit exponential growth, at least for a little while. Moore’s law is the most salient example. But it is important to remember that even exponential growth for a little while need not *seem* explosive at human time scales. Doubling every day corresponds to exponential growth, but so does increasing by 1% a year. To paraphrase Ed Felten: our retirement plans extend beyond depositing a few dollars into a savings account and waiting for the inevitable “wealth explosion” that will make us unimaginably rich.

## Postscript

I don’t claim that anything in this post is either novel or surprising to folks who spend their time thinking about this sort of thing. There is at least one paper that writes down a model including diminishing marginal returns, which yields a linear rate of intelligence growth.

It is also interesting to note that in the model we wrote down, exponential growth is really a knife-edge phenomenon. We already observed that we get exponential growth if f(x) = x, but not if f(x) = x^{1−ϵ} for any ϵ > 0. But what if we have f(x) = x^{1+ϵ} for ϵ > 0? In that case, we don’t get exponential growth either! As Hadi Elzayn pointed out to me, Osgood’s Test tells us that in this case, the function I(t) contains an actual mathematical singularity – it approaches an infinite value in finite time.
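The finite-time blow-up is easy to see numerically. Here is a small sketch of my own (the threshold and horizon are arbitrary choices) that integrates dR/dt = R^p until R exceeds a huge value:

```python
# Forward-Euler integration of dR/dt = R^p, R(0) = 1, stopping when R
# exceeds `threshold` or time runs out. (My own sketch; constants are arbitrary.)
def blow_up_time(power, horizon=5.0, dt=1e-4, threshold=1e9):
    """Return the approximate time R(t) first exceeds `threshold`, or None."""
    R, t = 1.0, 0.0
    while t < horizon:
        R += dt * R ** power
        t += dt
        if R > threshold:
            return t
    return None

# For p = 2 (i.e. f(x) = x^(1+eps) with eps = 1), the exact solution is
# R(t) = 1/(1 - t): a true singularity at t = 1. For p = 1 (exponential
# growth), R never blows up this fast.
```

With p = 2 the numerical solution shoots past 10^9 shortly after t = 1, while with p = 1 it is still only around e^5 ≈ 148 at the end of the horizon.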

I think few AI safety advocates believe this. It’s much more common to expect growth to be faster than exponential. As you point out, exponential growth is a knife-edge phenomenon.

This is actually a pretty common view—not a literal singularity, but rapid technological acceleration until natural resource limitations (e.g. on total available solar energy and raw minerals) start binding. If you look at the history of technological progress, it looks a whole lot more like a hyperbola than like an exponential curve, so the hyperbolic growth forecast isn’t so insane. It’s the person arguing that growth rates are going to stop at 3% who is arguing against the bulk of historical precedent (and whose predecessors would have been wrong if they’d expected growth to stop at 0.3% or 0.03% or 0.003%...).

I think “singularity” usually either follows Vinge’s use (as the point beyond which you can’t predict what will happen, because the future is guided by actors smarter than you are) or as a reference to the dynamic that would produce a mathematical singularity if left unchecked.

Thanks for writing this Aaron! (And for engaging with some of the common arguments for/against AI safety work.)

I personally am very uncertain about whether to expect a singularity/fast take-off (I think it is plausible but far from certain). Some reasons that I am still very interested in AI safety are the following:

I think AI safety likely involves solving a number of difficult conceptual problems, such that it would take >5 years (I would guess something like 10-30 years, with very wide error bars) of research to have solutions that we are happy with. Moreover, many of the relevant problems have short-term analogues that can be worked on today. (Indeed, some of these align with your own research interests, e.g. imputing value functions of agents from actions/decisions; although I am particularly interested in the agnostic case where the value function might lie outside of the given model family, which I think makes things much harder.)

I suppose the summary point of the above is that even if you think AI is a ways off (my median estimate is ~50 years, again with high error bars) research is not something that can happen instantaneously, and conceptual research in particular can move slowly due to being harder to work on / parallelize.

While I have uncertainty about fast take-off, that still leaves some probability that fast take-off will happen, and in that world it is an important enough problem that it is worth thinking about. (It is also very worthwhile to think about the probability of fast take-off, as better estimates would help to better direct resources even within the AI safety space.)

Finally, I think there are a number of important safety problems even from sub-human AI systems. Tech-driven unemployment is I guess the standard one here, although I spend more time thinking about cyber-warfare/autonomous weapons, as well as changes in the balance of power between nation-states and corporations. These are not as clearly an existential risk as unfriendly AI, but I think in some forms would qualify as a global catastrophic risk; on the other hand I would guess that most people who care about AI safety (at least on this website) do not care about it for this reason, so this is more idiosyncratic to me.

Happy to expand on/discuss any of the above points if you are interested.

Best,

Jacob

Good points all; these are good reasons to work on AI safety (and of course as a theorist I’m very happy to think about interesting problems even if they don’t have immediate impact :-) I’m definitely interested in the short-term issues, and have been spending a lot of my research time lately thinking about fairness/privacy in ML. Inverse-RL/revealed preferences learning is also quite interesting, and I’d love to see some more theory results in the agnostic case.

In a more typical endogenous growth model, output is the product of physical capital (e.g. how many computers you have) and a technology factor (e.g. how smart you are). You can either invest in producing more capital (building more computers) or doing research (becoming smarter). On these models, even returns of x^ϵ still lead to a mathematical singularity (while constant technology leads to exponential growth).

From this perspective, you are investigating whether there is an intelligence explosion with finite capital. If productivity grows sublinearly with inputs, you need to build more machines (and ultimately extract more resources from nature) in order to grow really fast. This might suggest that getting to a singularity would take years rather than weeks, but doesn’t much change the qualitative conclusion or substantially change the urgency (especially given that the early phase of takeoff would be driven by moving resources over from lower productivity areas into higher productivity areas).

I think it’s a mistake to think of “productivity is linear in effort” as the “no diminishing returns” model, and to consider it a degenerate extreme case. Linear returns is the model where doubling inputs leads to doubled outputs. A priori, it’s nearly as natural for constant additional effort leads to doubling of efficiency, so we need to actually look at the data to distinguish.

(It seems more theoretically natural—and more common in practice—for each clever trick to lead to a 10% increase in efficiency, then for each clever trick to lead to an absolute increase of 1 unit of efficiency.)

In semiconductors, as you point out, output has increased exponentially over time. Research investment has also increased exponentially, but with a significantly smaller exponent. So on your model the curve appears to be x^α for α > 1.

The performance curves database contains many interesting time series, and you’ll note that the y-axis is typically exponential. They don’t track inputs, so it’s a bit hard to draw conclusions, but comparing to overall increases in R&D investment it looks like superlinear returns are probably quite common.

A few years ago Katja looked into the rate of algorithmic progress, and found that it was very approximately comparable to the rate of progress in hardware (though it’s hard to know how much of that comes from realizing increasing economies of scale w.r.t. compute), across a range of domains. Algorithms seem like a particularly relevant domain to the current discussion.

Hi all,

Thanks for the very thoughtful comments; lots to chew on. As I hope was clear, I’m just an interested outside observer, and have not spent very long thinking about these issues, and don’t know much of the literature. (My blog post ended up as a cross post here because I posted it to facebook, and asked if anyone could point me to more serious literature thinking about this problem, and a commenter suggested that I should crosspost here for feedback)

I agree that linear feedback is more plausible if we think of research breakthroughs as producing multiplicative gains, a simple point that I hadn’t thought about.

Eliezer did exactly this calculation in an old LW post. Unfortunately I have no idea how to find it. Fortunately the calculation comes out the same no matter who does it!

Not at all. The reasons we should work on AI alignment now are:

- AI alignment is a hard problem
- We don’t know how long it will take us to solve it
- We don’t know how long it will be until superintelligent AI becomes possible
- There is no strong reason to believe we will know superintelligent AI is coming far in advance

“Current technology doesn’t give us anything even close” is not extremely informative, since we don’t know the metric w.r.t. which “close” should be measured. Heavier-than-air flight was believed *impossible* by many, until the Wright brothers did it. The technology of 1929 didn’t give anything close to an atom bomb or a moon landing, and yet the atom bomb was made 16 years later, and the moon landing 40 years later.

Regarding the differential equations, I don’t think it’s a very meaningful analysis if you haven’t even defined the *scale* on which you measure intelligence. If I(x) is some measure of intelligence that grows exponentially, then log I(x) is another measure of intelligence which grows linearly, and if I(x) grows linearly then exp I(x) grows exponentially.

Also, you might be interested in this paper by Yudkowsky.

If you replace “analyze the plausibility” with “convincingly demonstrate to skeptics” then this seems right.

The OP seems to be written more in the spirit of exploration rather than conclusive argument though, which seems valuable and doesn’t necessarily require responding in detail to prior work (in this case ~100 pages). Seems like kind of a soul-crushing way to respond to curiosity :)

(I hope my own comments didn’t come across harshly.)

You’re right, sorry. Edited.

(1) As Paul noted, the question of the exponent α is just the question of diminishing returns vs returns-to-scale.

Especially if you believe that the rate f = f(R) is a product of multiple terms (like e.g. Paul’s suggestion f = R^{α_t} · R^{α_a}, with one exponent for computer tech advances and another for algorithmic advances), then you get returns-to-scale type dynamics (over certain regimes, i.e. until all fruit are picked) with finite-time blow-up.

(2) Also, an imho crucial aspect is the separation of time-scales between human-driven research and computation done by machines (transistors are faster than neurons and buying more hardware scales better than training a new person up to the bleeding edge of research, especially considering Scott’s amusing parable of the alchemists).

Let’s add a little flourish to your model: you had the rate of research I and the cumulative research R; let’s give a name C to the capability of the AI system. Then we can model ∂_t R = I = f(R) = g(C) = g(h(R)). This is your model, just splitting terms into h, which tells us how hard AI progress is, and g, which tells us how good we are at producing research.

Now denote by q = q(C) the fraction of work that absolutely has to be done by humans, and by ε the speed-up factor for silicon over biology. Amdahl’s law gives you g(C) = 1/(q(C) + (1−q(C))/(εC)), or somewhat simplified g(C) ≥ 1/(q + 1/(εC)). This predicts a rate of progress that first looks like 1/q, as long as human researcher input is the limiting factor, then becomes εC when we have AIs designing AIs (recursive self-improvement, aka explosion), and then probably saturates at something (when the AI approaches optimality).
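The two regimes are easy to see numerically. This is my own illustration, assuming the Amdahl-style rate g(C) = 1/(q(C) + (1−q(C))/(εC)) with a made-up automatable-fraction curve q(C) and an arbitrary ε:

```python
# Amdahl-style research rate: humans do fraction q(C) of the work at speed 1,
# machines do the rest at speed eps*C. (My own sketch; q and eps are made up.)
def g(C, q, eps=1e6):
    return 1.0 / (q(C) + (1.0 - q(C)) / (eps * C))

# Hypothetical automatable-fraction curve: q falls from 1 toward 0 as C grows.
def q(C):
    return max(0.0, 1.0 - C / 100.0)

# While q is near 1, progress crawls at roughly 1/q; once q hits 0,
# the rate jumps to eps*C.
```

With this q, the rate hovers near 1 for small C and jumps to ε·C once q reaches 0: the regime change driven by q(C) → 0, independent of the shape of h.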

The crucial argument for fast take-off (as far as I understood it) is that we can expect q(C) to hit q = 0 at some cross-over C*, and we can expect this to happen with a nonzero derivative ∂_C q(C*) ≠ 0. This is just the claim that human-level AI is possible, and that the intelligence of the human parts of the AI research project is not sitting at a magical point (aka: this is generic; you would need to fine-tune your model to get something else).

The change of the rate of research output from the 1/q(C) regime to the εC regime sure looks like a hard-take-off singularity to me! And I would like to note that the function h, i.e. the hardness of AI research and the diminishing-returns vs returns-to-scale debate, does not enter this discussion at any point.

In other words: if you model AI research as done by a team of humans and proto-AIs assisting the humans; and if you assert non-fungibility of humans vs proto-AI assistants (even if you buy a thousand times more hardware, you still need the generally intelligent human researchers for some parts); and if you assert that better proto-AI assistants can do a larger proportion of the work (at all); and if you assert that computers are faster than humans; then you get a possibly quite wild change at q = 0.

I’d like to note that the cross-over is not “human-level AI”, but rather “q ≈ 0”, i.e. an AI that needs (almost) no human assistance to progress the field of AI research.

On the opposing side (that’s what Robin Hanson would probably say) you have the empirical argument that q should decay like a power law long before we reach q = 0 (“the last 10% take 90% of the work” is a folk formulation for “percentiles 90–99 take nine times as much work as percentiles 0–89”, aka a power law, and is borne out quite well, empirically).

This does not have any impact on whether we cross q=0 with non-vanishing derivative, but would support Paul’s view that the world will be unrecognizably crazy long before q=0 .

PS. I am currently agnostic about the hard vs soft take-off debate. Yeah, I know, cowardly cop-out.

edit: In the above, C kinda encodes how fast/good our AI is and q encodes how general it is compared to humans. All AI singularity stuff tacitly assumes that human intelligence (assisted by stupid proto-AI) is sufficiently general to design an AI that matches or exceeds the generality of human intelligence. I consider this likely. The counterfactual world would have our AI capabilities saturate at some subhuman level for a long time, using terribly bad randomized/evolutionary algorithms, until we either stumble onto an AI design that has better generality or we suffer unrelated extinction/heat-death. I consider it likely that human intelligence (assisted by proto-AI) is sufficiently general for a take-off. Heat-death is not an exaggeration: algorithms with exponentially bad run-time are effectively useless.

Conversely, I consider it very well possible that human intelligence is insufficiently general to understand how human intelligence works! (We are really, really bad at understanding anything optimized by evolution/gradient descent, and that’s what we are.)

Just wanted to clarify that MIRI is *in* Berkeley (the city), but is not affiliated with UC Berkeley (the university).

Very minor nitpick, but just to add, FLI is as far as I know not formally affiliated with MIT. (FHI is in fact a formal institute at Oxford.)

Thanks for the corrections. I changed the text to “in Berkeley”. How should FLI be described? (I was just cribbing from Scott’s FAQ when claiming it was at MIT)

You could say that it’s in Cambridge, MA...

See more here: https://en.wikipedia.org/wiki/Future_of_Life_Institute

Are you open to me copying over the complete content of the post? This makes it easier for people to reference and read over here.

Sure

Done! (with proper LaTeX rendering!)

Thanks!

I agree with this, but I think you have to remember that many things with diminishing returns also have accelerating returns earlier on.

That is to say, logistic curves are all over the place. Business growth, practicing a new instrument, functionality of a software project over time, learning a language through immersion...

It’s absolutely plausible for intelligence self-improvement to work for a few IQ points and then peter out, for some architecture. Humans, for example, are horrible at improving their own brains—but also see EURISKO. But I’m skeptical that returns are always going to be so sharply diminishing, and if everyone else is improving slowly, whatever system “goes critical” first is going to be the one that matters.