Why all the fuss about recursive self-improvement?

This article was outlined by Nate Soares, inflated by Rob Bensinger, and then edited by Nate. Content warning: the tone of this post feels defensive to me. I don’t generally enjoy writing in “defensive” mode, but I’ve had this argument thrice recently in surprising places, and so it seemed worth writing my thoughts up anyway.


In last year’s Ngo/Yudkowsky conversation, one of Richard’s big criticisms of Eliezer was, roughly, ‘Why the heck have you spent so much time focusing on recursive self-improvement? Is that not indicative of poor reasoning about AGI?’

I’ve heard similar criticisms of MIRI and FHI’s past focus on orthogonality and instrumental convergence: these notions seem obvious, so either MIRI and FHI must be totally confused about what the big central debates in AI alignment are, or they must have some very weird set of beliefs on which these notions are somehow super-relevant.

This seems to be a pretty common criticism of past-MIRI (and, similarly, of past-FHI); in the past month or so, I’ve heard it twice more while talking to other OpenAI and Open Phil people.

This argument looks misguided to me, and I hypothesize that a bunch of the misguidedness is coming from a simple failure to understand the relevant history.

I joined this field in 2013–2014, which is far from “early”, but is early enough that I can attest that the arguments about recursive self-improvement, orthogonality, etc. were geared towards a different argumentative environment, one dominated by claims like “AGI is impossible”, “AGI won’t be able to exceed humans by much”, and “AGI will naturally be good”.

A possible response: “Okay, but ‘sufficiently smart AGI will recursively self-improve’ and ‘AI isn’t automatically nice’ are still obvious. You should have just ignored the people who couldn’t immediately see this, and focused on the arguments that would be relevant to hypothetical savvy people in the future, once the latter joined in the discussion.”

I have some sympathy for this argument. Some considerations weighing against, though, are:

  • I think it makes more sense to filter on argument validity, rather than “obviousness”. What’s obvious varies a lot from individual to individual. If just about everyone talking about AGI is saying “obviously false” things (as was indeed the case in 2010), then it makes sense to at least try publicly writing up the obvious counter-arguments.

  • This seems to assume that the old arguments (e.g., in Superintelligence) didn’t work. In contrast, I think it’s quite plausible that “everyone with a drop of sense in them agrees with those arguments today” is true in large part because these propositions were explicitly laid out and argued for in the past. The claims we take as background now are the claims that were fought for by the old guard.

  • I think this argument overstates how many people in ML today grok the “obvious” points. E.g., based on a recent DeepMind Podcast episode, these sound like likely points of disagreement with David Silver.

But even if you think this was a strategic error, I still think it’s important to recognize that MIRI and FHI were arguing correctly against the mistaken views of the time, rather than arguing poorly against future views.

Recursive self-improvement

Why did past-MIRI talk so much about recursive self-improvement? Was it because Eliezer was super confident that humanity was going to get to AGI via the route of a seed AI that understands its own source code?

I doubt it. My read is that Eliezer did have “seed AI” as a top guess, back before the deep learning revolution. But I don’t think that’s the main source of all the discussion of recursive self-improvement in the period around 2008.

Rather, my read of the history is that MIRI was operating in an argumentative environment where:

  • Ray Kurzweil was claiming things along the lines of ‘Moore’s Law will continue into the indefinite future, even past the point where AGI can contribute to AGI research.’ (The Five Theses, in 2013, is a list of the key things Kurzweilians were getting wrong.)

  • Robin Hanson was claiming things along the lines of ‘The power is in the culture; superintelligences wouldn’t be able to outstrip the rest of humanity.’

The memetic environment was one where most people were either ignoring the topic altogether, or asserting ‘AGI cannot fly all that high’, or asserting ‘AGI flying high would be business-as-usual (e.g., with respect to growth rates)’.

The weighty conclusion of the “recursive self-improvement” meme is not “expect seed AI”. The weighty conclusion is “sufficiently smart AI will rapidly improve to heights that leave humans in the dust”.

Note that this conclusion is still, to the best of my knowledge, completely true, and recursive self-improvement is a correct argument for it.

Which is not to say that recursive self-improvement happens before the end of the world; if the first AGI’s mind is sufficiently complex and kludgy, it’s entirely possible that the cognitions it implements are able to (e.g.) crack nanotech well enough to kill all humans, before they’re able to crack themselves.

The big update over the last decade has been that humans might be able to fumble their way to AGI that can do crazy stuff before it does much self-improvement.

(Though, to be clear, from my perspective it’s still entirely plausible that you will be able to turn the first general reasoners to their own architecture and get a big boost, and so there’s still a decent chance that self-improvement plays an important early role. (Probably destroying the world in the process, of course. Doubly so given that I expect it’s even harder to understand and align a system if it’s self-improving.))

In other words, it doesn’t seem to me like developments like deep learning have undermined the recursive self-improvement argument in any real way. The argument seems solid to me, and reality seems quite consistent with it.

Taking into account its past context, recursive self-improvement was a super conservative argument that has been vindicated in its conservatism.

It was an argument for the proposition “AGI will be able to exceed the heck out of humans”. And AlphaZero came along and was like, “Yep, that’s true.”

Recursive self-improvement was a super conservative argument for “AI blows past human culture eventually”. When reality then comes along and says “yes, this happens in 2016, when the systems are far from truly general”, the update to make is that this way of thinking about AGI sharply outperformed, not that it was silly for talking about sci-fi stuff like recursive self-improvement when it turns out you can do crazy stuff without even going that far. As Eliezer put it, “reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum”.

If arguments like recursive self-improvement and orthogonality seem irrelevant and obvious now, then great! Intellectual progress has been made. If we’re lucky and get to the next stop on the train, then I’ll hopefully be able to link back to this post when people look back and ask why we were arguing about all these other silly obvious things back in 2022.

Deep learning

I think “MIRI staff spent a bunch of time talking about instrumental convergence, orthogonality, recursive self-improvement, etc.” is a silly criticism.

On the other hand, I think “MIRI staff were slow to update about how far deep learning might go” is a fair criticism, and we lose Bayes points here, especially relative to people who were vocally bullish about deep learning before late 2015 / early 2016.

In 2003, deep learning didn’t work, and nothing else worked all that well either. A reasonable guess was that we’d need to understand intelligence in order to get unstuck; and if you understand intelligence, then an obvious way to achieve superintelligence is to build a simple, small, clean AI that can take over the hard work of improving itself. This is the idea of “seed AI”, as I understand it. I don’t think 2003-Eliezer thought this direction was certain, but I think he had a bunch of probability mass on it.[1]

I think that Eliezer’s model was somewhat surprised by humanity’s subsequent failure to gain much understanding of intelligence, and also by the fact that humanity was able to find relatively brute-force-ish methods that were computationally tractable enough to produce a lot of intelligence anyway.

But I also think this was a reasonable take in 2003. Other people had even better takes — Shane Legg comes to mind. He stuck his neck out early with narrow predictions that panned out. Props to Shane.

I personally had run-of-the-mill bad ideas about AI as late as 2010, and didn’t turn my attention to this field until about 2013, which means that I lost a bunch of Bayes points relative to the people who managed to figure out in 1990 or 2000 that AGI will be our final invention. (Yes, even if the people who called it in 2000 were expecting seed AI rather than deep learning, back when nothing was really working. I reject the Copenhagen Theory Of Forecasting, according to which you gain special epistemic advantage from not having noticed the problem early enough to guess wrongly.)

My sense is that MIRI started taking the deep learning revolution much more seriously in 2013, while retaining reservations about whether broadly deep-learning-like techniques would be how humanity first reaches AGI. Even now, it’s not completely obvious to me that this will be the broad paradigm in which AGI is first developed, though something like that seems fairly likely at this point. But, if memory serves, during the Jan. 2015 Puerto Rico conference I was treating the chance of deep learning going all the way as being in the 10–40% range; so I don’t think it would be fair to characterize me as having been totally blindsided.

My impression is that Eliezer and I, at least, updated harder in 2015–16, in the wake of AlphaGo, than a bunch of other locals (and I, at least, think I’ve been less surprised than various other vocal locals by GPT, PaLM, etc. in recent years).

Could we have done better? Yes. Did we lose Bayes points? Yes, especially relative to folks like Shane Legg.

But since 2016, it has mostly looked to me like others update towards my current position with each new AGI advancement. So I’m feeling pretty good about the predictive power of my current models.

Maybe this all sounds like revisionism to you, and your impression of FOOM-debate-era Eliezer was that he loved GOFAI and thought recursive self-improvement was the only advantage digital intelligence could have over human intelligence.

And, I wasn’t here in that era. But I note that Eliezer said the opposite at the time; and the track record for such claims seems to hold more examples of “mistakenly rounding the other side’s views off to a simpler, more-cognitively-available caricature”, and fewer examples of “peering past the veil of the author’s text to see his hidden soul”.

Also: It’s important to ask proponents of a theory what they predict will happen, before crowing about how their theory made a misprediction. You’re always welcome to ask for my predictions in advance.

(I’ve been making this offer since 2015 to people who disagree with me about whether I have egg on my face, and have rarely been taken up on it. E.g.: yes, we too predict that it’s easy to get GPT-3 to tell you the answers that humans label “aligned” to simple word problems about what we think of as “ethical”, or whatever. That’s never where we thought the difficulty of the alignment problem was in the first place. Before saying that this shows that alignment is actually easy, contra everything MIRI folk said, consider asking some MIRI folk for their predictions about what you’ll see.)

  1. ^

    In particular, I think Eliezer’s best guess was AI systems that would look small, clean, and well-understood relative to the large opaque artifacts produced by deep learning. That doesn’t mean that he was picturing GOFAI; there exist a wide range of possibilities of the form “you understand intelligence well enough to not have to hand off the entire task to a gradient-descent-ish process to do it for you” that do not reduce to “coding everything by hand”, and certainly don’t reduce to “reasoning deductively rather than probabilistically”.