Will AI undergo discontinuous progress?

This post grew out of conversations with several people, including Daniel Kokotajlo, grue_slinky and Linda Linsefors, and is based in large part on a collection of scattered comments and blog posts across LessWrong, along with some podcast interviews—e.g. here. The in-text links near quotes will take you to my sources.

I am attempting to distinguish two possibilities which are often run together—that progress in AI towards AGI (‘takeoff’) will be discontinuous, and that it will be fast but continuous. Resolving this distinction also addresses the claim that there has been a significant shift in arguments for AI presenting an existential risk: from older arguments discussing an ultra-fast intelligence explosion occurring in a single ‘seed AI’ to more moderate scenarios.

I argue that the ‘shift in arguments on AI safety’ is not a total change in basic assumptions (as some observers have claimed) but just a reduction in confidence about a specifically discontinuous takeoff. Finally, I try to explicitly operationalize the practical differences between discontinuous takeoff and fast, continuous takeoff.

Further Reading

Summary: Why AI risk might be solved without additional intervention from longtermists

Paul Christiano’s original post

MIRI’s Thoughts on Discontinuous Takeoff

Misconceptions about continuous takeoff

AI Impacts’ original post

Soft Takeoff can still lead to Decisive Strategic Advantage

Defining Discontinuous Progress

What do I mean by ‘discontinuous’? If we graph world GDP over the last 10,000 years, it fits a hyperbolic growth pattern. We could call this ‘continuous’, since it follows a single trend, or we could call it ‘discontinuous’ because, on the scale of millennia, the industrial revolution exploded out of nowhere. I will call these sorts of hyperbolic trends ‘continuous, but fast’, in line with Paul Christiano, who argued for continuous takeoff and defined it this way:

AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate).

I’ll be using Paul’s understanding of ‘discontinuous’ and ‘fast’ here. For progress in AI to be discontinuous, we need a switch to a new growth mode, which will show up as a step change either in the capability of AI or in the rate of change of that capability over time. For takeoff to be fast, it is enough that there is one single growth mode that is hyperbolic, or some other function that is very fast-growing.
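To make the distinction concrete, here is a minimal formalization; the notation (capability C(t), a criticality threshold C_crit) is mine, not Paul’s:

```latex
% Continuous but fast: a single smooth growth law. Hyperbolic growth even hits a
% singularity in finite time, yet neither C nor dC/dt ever jumps.
\frac{dC}{dt} = k\,C^{\,1+\epsilon}
\quad\Longrightarrow\quad
C(t) \to \infty \ \text{ as } t \to t^{*} < \infty .

% Discontinuous: a switch to a new growth mode, i.e. a step in C or in dC/dt
% at some threshold.
\frac{dC}{dt} =
\begin{cases}
g_{\text{slow}}(C), & C < C_{\text{crit}} \\[2pt]
g_{\text{fast}}(C), & C \ge C_{\text{crit}}
\end{cases}
\qquad \text{with } g_{\text{fast}}(C_{\text{crit}}) \gg g_{\text{slow}}(C_{\text{crit}}).
```

On this reading, the disagreement is over whether any such threshold exists, not over how fast growth eventually gets.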

The view that progress in AI will be discontinuous, not merely very fast by normal human standards, was popular and is still held by many. Here is a canonical explanation of the view, from Eliezer Yudkowsky in 2008. Compare this to the more recent ‘What failure looks like’ to understand the intuitive force of the claim that views on AI risk have totally changed since 2008.

Recursive self-improvement—an AI rewriting its own cognitive algorithms—identifies the object level of the AI with a force acting on the metacognitive level; it “closes the loop” or “folds the graph in on itself”...
...When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM. An exactly right law of diminishing returns that lets the system fly through the soft takeoff keyhole is unlikely—far more unlikely than seeing such behavior in a system with a roughly-constant underlying optimizer, like evolution improving brains, or human brains improving technology. Our present life is no good indicator of things to come.
Or to try and compress it down to a slogan that fits on a T-Shirt—not that I’m saying this is a good idea—“Moore’s Law is exponential now; it would be really odd if it stayed exponential with the improving computers doing the research.” I’m not saying you literally get dy/dt = e^y that goes to infinity after finite time—and hardware improvement is in some ways the least interesting factor here—but should we really see the same curve we do now?
RSI is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a “hard takeoff” aka “AI go FOOM”, but it’s nowhere near being the only such factor. The advent of human intelligence was a discontinuity with the past even without RSI...
...which is to say that observed evolutionary history—the discontinuity between humans, and chimps who share 95% of our DNA—lightly suggests a critical threshold built into the capabilities that we think of as “general intelligence”, a machine that becomes far more powerful once the last gear is added.
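As an aside for readers who want the missing calculus step (this is mine, not part of the quote): the reason dy/dt = e^y ‘goes to infinity after finite time’ is a one-line integration,

```latex
\frac{dy}{dt} = e^{y}
\;\Longrightarrow\;
e^{-y}\,dy = dt
\;\Longrightarrow\;
e^{-y(t)} = e^{-y_{0}} - t
\;\Longrightarrow\;
y(t) = -\ln\!\bigl(e^{-y_{0}} - t\bigr),
```

which diverges as t approaches e^{-y_0}. Unlike an exponential, which grows quickly but forever, this trajectory reaches infinity in finite time; Yudkowsky is explicit that he does not expect the literal equation, only something qualitatively more explosive than the current curve.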

Also see these quotes, one summarizing the view by Paul Christiano and one recent remark from Rob Bensinger, which restate the two key reasons given above for expecting discontinuous takeoff: recursive self-improvement and a discontinuity in capability.

some systems “fizzle out” when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements.
MIRI folks tend to have different views from Paul about AGI, some of which imply that AGI is more likely to be novel and dependent on new insights.

I will argue that the more recent reduction of confidence in discontinuous takeoff is correct, but at the same time many of the ‘original’ arguments for fast, discontinuous takeoff (e.g. those given in 2008 by Yudkowsky) are not mistaken and can be seen as also supporting fast, continuous takeoff.

We should seriously investigate the continuous/discontinuous distinction specifically, by narrowing our focus onto arguments that actually distinguish between the two: conceptual investigation of the nature of future AGI, and the practical consequences of continuous vs discontinuous takeoff for alignment work.

I have tried to present the arguments in order from least to most controversial, starting with the outside view on technological progress.

There have been other posts discussing arguments for discontinuous progress with approximately this framing. I am not going to repeat their good work here by running over every argument and counterargument (see Further Reading). What I’m trying to do here is get at the underlying assumptions of either side.

The Outside View

There has recently been a switch from talking about AI progress as fast vs slow to talking about it as continuous vs discontinuous. Paul Christiano explains what continuous progress means:

I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful. This won’t be enough to create 100% GDP growth, but it will be enough to lead to (say) 50% GDP growth. I think the likely gap between these events is years rather than months or decades.

It is not an essential part of the definition that the gap be years; even if the gap is rather short, we still call it continuous takeoff if AI progress increases without sudden jumps, with capability increases following a series of logistic curves that merge together as different component technologies are invented.

This way of carving up progress into ‘continuous’ and ‘discontinuous’ isn’t uncontroversial, but it was developed because calling the takeoff ‘slow’ instead of ‘fast’ could be seen as a misnomer. Continuous takeoff is a statement about what happens before we reach the point where a fast takeoff is supposed to happen, and it is perfectly consistent with the claim that, given the stated preconditions for fast takeoff, fast takeoff will happen. It is a statement that serious problems, possibly serious enough to pose an existential threat, will show up before the window in which we expect fast takeoff scenarios to occur.

Outside view: Technology

The starting point for the argument that progress in AI should be continuous is just the observation that this is usually how things work with a technology, especially in a situation where progress is being driven by many actors attacking a problem from different angles. If you can do something well in 1 year, it is usually possible to do it slightly less well in 0.9 years. Why is it ‘usually possible’? Because nearly all technologies involve numerous smaller innovations, and it is usually possible to get somewhat good results without some of them. That is why, even if each individual component innovation follows a logistic success curve that jumps from ‘doesn’t work at all’ to ‘works’ when you plot results/usefulness against effort, overall progress looks continuous. Note that this is what is explicitly rejected by Yudkowsky-2008.
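A toy sketch of this point (my own illustration, not from any of the linked posts; all numbers are arbitrary): if aggregate capability is the sum of many component technologies, each of which individually follows a steep logistic curve with its own arrival time, the aggregate looks smooth, whereas a single all-or-nothing breakthrough produces a visible jump.

```python
import numpy as np

def logistic(t, midpoint, steepness=8.0):
    """One component technology: near 0 before its breakthrough, near 1 after."""
    return 1.0 / (1.0 + np.exp(-steepness * (t - midpoint)))

t = np.linspace(0, 10, 1000)

# Aggregate capability as the sum of 40 components arriving at staggered times.
midpoints = np.linspace(1, 9, 40)
aggregate = sum(logistic(t, m) for m in midpoints)

# A technology that only works once a single breakthrough lands: one sharp jump.
single = 40 * logistic(t, 5.0, steepness=50.0)

# How much each curve moves inside a short window around t = 5.
window = (t > 4.9) & (t < 5.1)
print("aggregate change near t=5:          ",
      round(float(aggregate[window].max() - aggregate[window].min()), 2))
print("single-breakthrough change near t=5:",
      round(float(single[window].max() - single[window].min()), 2))
```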

When this doesn’t happen and we get discontinuous progress, it is for one of two reasons: either there is some fundamental reason why the technology cannot work at all without all the pieces lining up in place, or there is just not much effort being put in, so a few actors can leap ahead of the rest of the world and make several of the component breakthroughs in rapid succession.

I’ll go through an illustrative example of each situation: the normal case, the low-effort case and the fundamental-reasons case.

Guns

Guns followed the continuous progress model. They started out worse than competing ranged weapons like crossbows. By the 15th century the arquebus had arrived, which had some advantages and disadvantages compared to the crossbow (easier to use, more damaging, but slower to fire and much less accurate). Then came the musket, and later rifles. There were many individual inventions that went from ‘not existing at all’ to ‘existing’ in relatively short intervals, but the overall progress looked roughly continuous. However, the speed of progress still increased dramatically during the industrial revolution and continued to increase, without ever being ‘discontinuous’.

Aircraft

Looking specifically at heavier-than-air flight, it seems clear enough that we went from ‘not being able to do this at all’ to being able to do it in a relatively short time—discontinuous progress. The Wright brothers’ research drew on a few other pioneers like Otto Lilienthal, but they still made a rapid series of breakthroughs in a fairly short period: using a wind tunnel to rapidly prototype wing shapes, building a sufficiently lightweight engine, working out a method of control. This was possible because at the time, unlike with guns, only a very small fraction of human research effort was going into developing flight. It was also possible for another reason—the nature of the problem implied that success was always going to be discontinuous. While there are a few intermediate steps, like gliders, there aren’t many between ‘not being able to fly’ and ‘being able to fly’, so progress was unexpected and discontinuous. I think we can attribute most of the flight case to the low degree of effort on a global scale.

Nuclear Weapons

Nuclear weapons are a purer case of fundamental physical facts causing a discontinuity. A fission chain reaction simply will not occur without many breakthroughs all being brought together, so even if a significant fraction of all the world’s design effort is devoted to research, we still get discontinuous progress. If the Manhattan Project had uranium centrifuges but didn’t have the Monte Carlo algorithms needed to properly simulate the dynamics of the sphere’s implosion, they would just have had some mostly-useless metal. It’s worth noting here that a lot of early writing about an intelligence explosion explicitly compares it to a nuclear chain reaction.

AI Impacts has done a far more thorough and in-depth investigation of progress along various metrics, confirming the intuition that discontinuous progress usually occurs where fundamental physical facts imply it—e.g. as we switch between different methods of travel or communication.

We should expect, prima facie, a random example of technological progress to be continuous, and we need specific, good reasons to think that progress will be discontinuous. On the other hand, there are more than a few examples of discontinuous progress caused either by the nature of the problem or by differential effort, so there is not a colossal burden of proof on discontinuous progress. I think that both the MIRI people (who argue for discontinuous progress) and Christiano (continuous) pretty much agree on this initial point.

A shift in basic assumptions?

The argument has recently been made (by Will MacAskill on the 80,000 Hours podcast) that there has been a switch from the ‘old’ arguments, focusing on a seed AI leading to an intelligence explosion, to new arguments:

Paul’s published on this, and said he doesn’t think doom looks like a sudden explosion in a single AI system that takes over. Instead he thinks gradually just AIs get more and more and more power and they’re just somewhat misaligned with human interests. And so in the end you kind of get what you can measure.

And that these arguments don’t have very much in common with the older Bostrom/Yudkowsky scenario (of a single AGI undergoing an intelligence explosion), except the conclusion that AI presents a uniquely dangerous existential risk. If true, this would be a cause for concern, as it would suggest we haven’t become much less confused about basic questions over the last decade. MacAskill again:

I have no conception of how common adherence to different arguments are — but certainly many of the most prominent people are no longer pushing the Bostrom arguments.

If you look back to the section on Defining Discontinuous Progress, this will seem plausible—the very rapid annihilation caused by a single misaligned AGI going FOOM and the accumulation of compounding ‘you get what you measure’ errors in complex systems seem like totally different concerns.

Despite this, I will show later that the old arguments for discontinuous progress and the new arguments for continuous progress have a lot in common.

I claim that the Bostrom/Yudkowsky argument for an intelligence explosion establishes a sufficient condition for very rapid growth, and that the current disagreement is about what happens between now and that point. This should raise our confidence that some basic issues related to AI timelines are resolved. However, the fact that this claim, if true, has not been recognized, and that discussion of these issues is still as fragmented as it is, should be a cause for concern more generally.

I will now turn to inside-view arguments for discontinuous progress, beginning with the intelligence explosion, to justify what I have just claimed.

Failed Arguments for Discontinuity

If you think AI progress will be discontinuous, it is generally because you think AI is a special case like nuclear weapons, where several breakthroughs need to combine to produce a sudden gain in capability, or where one big breakthrough produces nearly all the increase on its own. If you think we will create AGI at all, it is generally because you think it will be hugely economically valuable, so the low-effort case, like flight, does not apply—if there are ways to produce and deploy slightly worse transformative AIs sooner, they will probably be found.

The arguments for AI being a special case either invoke specific evidence, from how current progress in machine learning looks or from human evolutionary history, or are conceptual. I believe (along with Paul Christiano) that the evidential arguments aren’t that useful, and my conclusion will be that we are left trying to assess the conceptual arguments about the nature of intelligence, which are hard to judge. I’ll offer some ways to attempt that, but not any definitive answer.

But first, what relevance does the old Intelligence Explosion Hypothesis have to this question—is it an argument for discontinuous progress? No, not on its own.

Recursive self-improvement

People don’t talk about recursive self-improvement as much as they used to, because since the machine learning revolution a full recursive self-improvement process has seemed less necessary for creating an AGI that is dramatically superior to humans (by analogy with how gradient descent is able to produce capability gains in current AI). Instead, the focus is more on the general idea of ‘rapid capability gain’. From Chollet vs Yudkowsky:

The basic premise is that, in the near future, a first “seed AI” will be created, with general problem-solving abilities slightly surpassing that of humans. This seed AI would start designing better AIs, initiating a recursive self-improvement loop that would immediately leave human intelligence in the dust, overtaking it by orders of magnitude in a short time.
I agree this is more or less what I meant by “seed AI” when I coined the term back in 1998. Today, nineteen years later, I would talk about a general question of “capability gain” or how the power of a cognitive system scales with increased resources and further optimization. The idea of recursive self-improvement is only one input into the general questions of capability gain; for example, we recently saw some impressively fast scaling of Go-playing ability without anything I’d remotely consider as seed AI being involved.

However, the argument that, given a certain level of AI capability, the rate of capability gain will be very high doesn’t by itself argue for discontinuity. It does mean that the rate of progress in AI has to increase between now and then, but it doesn’t say how it will increase.

There needs to be an initial asymmetry in the situation, such that an AI beyond a certain level of sophistication can experience rapid capability gain while one below that level experiences the current (fairly unimpressive) capability gains from increased optimization power and resources.

The original ‘intelligence explosion’ argument puts an upper bound, or sufficient condition, on when we enter a regime of very rapid growth—if we have something ‘above human level’ in all relevant capabilities, it will be capable of improving its own capabilities. And we know that at the current level, capability gains from increased optimization are (usually) not too impressive.

So the general question has to be asked: will we see a sudden increase in the rate of capability gain somewhere between now and the ‘human level’, where the rate must be very high?

The additional claim needed to establish a discontinuity is that recursive self-improvement suddenly goes from ‘not being possible at all’ (our current situation) to possible, so that there is a discontinuity as we enter a new growth mode and the graph abruptly ‘folds back in on itself’.

In Christiano’s graph, the gradient at the right end, where highly capable AI is already around, is pretty much the same in both scenarios, reflecting the basic recursive self-improvement argument.

Here I have taken a diagram from Superintelligence and added a red curve to represent a fast but continuous takeoff scenario.

In Bostrom’s scenario, there are two key moments that represent discontinuities in rate, though not necessarily in absolute capability. The first is around the ‘human baseline’: when an AI can complete all the cognitive tasks a human can, we enter a new growth mode, much faster than the one before, because the AI can recursively self-improve. The gradient is fairly constant up until that point. The next discontinuity is at the ‘crossover’, where the AI is performing the majority of the capability improvement itself.

As in Christiano’s diagram, rates of progress in the red, continuous scenario are very similar to the black scenario after we have superhuman AI, but the difference is that there is a steady acceleration of progress before the ‘human level’. This is because, before we have AI that can accelerate progress to a huge degree, we have AI that can accelerate progress to a lesser extent, and so on back to current AI, which is only slightly able to accelerate growth. Recursive self-improvement, like most other capabilities, does not suddenly go from ‘not possible at all’ to ‘possible’.

The other thing to note is that, since we get substantial acceleration of progress before the ‘human baseline’, the overall timeline is shorter in the red scenario, holding other assumptions about the objective difficulty of AI research constant.
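Here is a toy simulation of the two curves (my own sketch; the functional form and the numbers are illustrative assumptions, not anything from Bostrom or Christiano). In both runs, AI feeds back into AI research in proportion to its capability; in the discontinuous run, that feedback is gated behind a criticality threshold.

```python
import numpy as np

def simulate(threshold=None, steps=4000, dt=0.01):
    """Capability grows from constant human effort plus AI self-improvement.

    If `threshold` is set, AI contributes nothing until capability crosses it
    (discontinuous scenario); otherwise its contribution scales smoothly with
    capability (continuous scenario).
    """
    human_effort = 1.0        # constant baseline research input
    ai_multiplier = 3.0       # how strongly AI capability feeds back into progress
    capability = 0.1
    history = []
    for _ in range(steps):
        ai_contribution = ai_multiplier * capability
        if threshold is not None and capability < threshold:
            ai_contribution = 0.0   # below 'criticality', no self-improvement at all
        capability += dt * (human_effort + ai_contribution)
        history.append(capability)
    return np.array(history)

continuous = simulate(threshold=None)
discontinuous = simulate(threshold=1.0)   # 'human baseline' at capability 1.0

# The continuous run accelerates early; the discontinuous run is flat and then
# its growth rate jumps at the threshold.
print("steps to reach the baseline (continuous):   ", int(np.argmax(continuous >= 1.0)))
print("steps to reach the baseline (discontinuous):", int(np.argmax(discontinuous >= 1.0)))
```

The continuous run reaches the ‘human baseline’ sooner precisely because it is already accelerating beforehand, which is the point about shorter timelines above.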

The reason this continuous scenario might not occur is if there is an initial discontinuity, meaning that we cannot get a particular kind of recursive self-improvement with AIs that are slightly below some level of capability, but can get it with AIs that are slightly above that level.

That would be the case if AI is in a nuclear-chain-reaction-like situation, where we need to reach a criticality threshold for the rate to suddenly experience a discontinuous jump. So we return to the original claim, which we now see needs an independent justification:

some systems “fizzle out” when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements

Connecting old and new arguments

With less confidence than in the previous section, I think this is a point that most people agree on—that for there to be two qualitatively different growth regimes, i.e. for there to be discontinuous progress, there needs to be an initial discontinuity in the rate of return on cognitive investment.

This also answers MacAskill’s objection that too much has changed in basic assumptions. The old recursive self-improvement argument, by giving a plausible sufficient condition for fast growth (human-baseline AI), leads naturally to an investigation of what will happen in the course of reaching that fast growth regime. Christiano’s and other current notions of continuous takeoff are perfectly consistent with the counterfactual claim that, if an already superhuman ‘seed AI’ were dropped into a world empty of other AI, it would undergo recursive self-improvement.

This in itself, in conjunction with other basic philosophical claims like the orthogonality thesis, is sufficient to promote AI alignment to attention. Then, following on from that, we developed different models of how progress will look between now and AGI.

So it is not quite right to say that ‘many of the most prominent people are no longer pushing the Bostrom arguments’. From Paul Christiano’s 80,000 Hours interview:

Another thing that’s important to clarify is that I think there’s rough agreement amongst the alignment and safety crowd about what would happen if we did [have] human level AI. That is, everyone agrees that at that point, progress has probably exploded and is occurring very quickly, and the main disagreement is about what happens in advance of that. I think I have the view that in advance of that, the world has already changed very substantially.

The above diagram represents our uncertainty—we know that the rate of progress now is relatively slow and that when AI capability is at the human baseline, it must be very fast. What remains to be seen is what happens in between.

The remainder of this post is about the areas where there is not much agreement in AI safety—other reasons why there may be a threshold for generality, a threshold for recursive self-improvement, or a significance to the ‘human level’.

Evidence from Evolution

We do have an example of optimization pressure being applied to produce gains in intelligence: human evolution. The historical record certainly looks discontinuous, in that relatively small changes in human brains over relatively short timescales produced dramatic visible changes in our capabilities. However, this is misleading. The very first thing we should understand is that evolution is a continuous process—it cannot redesign brains from scratch.

Evolution was optimizing for fitness, and was driving increases in intelligence only indirectly and intermittently, by optimizing for winning at social competition. What happened in human evolution is that it briefly switched to optimizing for increased intelligence, and as soon as that happened our intelligence grew very rapidly, but continuously.

[If] evolution were selecting primarily or in large part for technological aptitude, then the difference between chimps and humans would suggest that tripling compute and doing a tiny bit of additional fine-tuning can radically expand power, undermining the continuous change story.
But chimp evolution is not primarily selecting for making and using technology, for doing science, or for facilitating cultural accumulation. The task faced by a chimp is largely independent of the abilities that give humans such a huge fitness advantage. It’s not completely independent—the overlap is the only reason that evolution eventually produces humans—but it’s different enough that we should not be surprised if there are simple changes to chimps that would make them much better at designing technology or doing science or accumulating culture.

I have a theory about why this didn’t get discussed earlier—it unfortunately sounds similar to the famous bad argument against AGI being an existential risk: the ‘intelligence isn’t a superpower’ argument. From Chollet vs Yudkowsky:

Intelligence is not a superpower; exceptional intelligence does not, on its own, confer you with proportionally exceptional power over your circumstances.
…said the Homo sapiens, surrounded by countless powerful artifacts whose abilities, let alone mechanisms, would be utterly incomprehensible to the organisms of any less intelligent Earthly species.

I worry that, in arguing against the claim that general intelligence isn’t a meaningful concept or can’t be used to compare different animals, some people have been implicitly assuming that evolution has been putting a decent amount of effort into optimizing for general intelligence all along. Alternatively, arguing for one claim sounds like arguing for the other, or a lot of people have been arguing for both together and haven’t distinguished between them.

Claiming that you can meaningfully compare evolved minds on the generality of their intelligence needs to be distinguished from claiming that evolution has been optimizing for general intelligence reasonably hard for a long time, and that this consistent pressure ‘pushing up the scale’ hits a point near humans where capabilities suddenly explode despite a constant optimization effort. There is no evidence that evolution was putting roughly constant effort into increasing human intelligence. We could analogize the development of human intelligence to the aircraft case discussed earlier, where relatively little effort was put into development until a sudden burst of activity led to massive gains.

So what can we infer from evolutionary history? We know that human minds can be produced by an incremental and relatively simple optimization process operating over time. Moreover, the difference between ‘produced by an incremental process’ and ‘developed continuously’ is small. If intelligence required the development of a good deal of complex capabilities that were useless or detrimental on their own and only became useful when finally brought together, evolution would not have produced it.

Christiano argues that this is a reason to think progress in AI will be continuous, but it seems to me to be a weak reason. Clearly there exists a continuous path to general intelligence, but that does not mean it is the easiest path, and Christiano’s other argument suggests that the way we approach AGI will not look much like the route evolution took.

Evolution also suggests that, in some absolute sense, the amount of effort required to produce increases in intelligence is not that large. Especially if you started out with the model that intelligence was being optimized for all along, you should update towards believing AGI is much easier to create than you previously expected.

The Conceptual Arguments

We are left with the difficult-to-judge conceptual question of whether we should expect a discontinuous jump in capabilities when a set list of AI developments are brought together. Christiano, in his original essay, simply states that there don’t seem to be any independent reasons to expect this to be the case. Coming up with any reasons for or against discontinuous progress essentially requires us to predict how AGI will work before building it. Rob Bensinger told me something similar:

There still remains the question of whether the technological path to “optimizing messy physical environments” (or “science AI”, or whatever we want to call it) looks like a small number of “we didn’t know how to do this at all, and now we do know how to do this and can suddenly take much better advantage of available compute” events, vs. looking like a large number of individually low-impact events spread out over time.

Rob Bensinger also said in one post that MIRI’s reasons for predicting discontinuous takeoff boil down to different ideas about what AGI will be like—suggesting that this constitutes the fundamental disagreement.

MIRI folks tend to have different views from Paul about AGI, some of which imply that AGI is more likely to be novel and dependent on new insights. (Unfair caricature: Imagine two people in the early 20th century who don’t have a technical understanding of nuclear physics yet, trying to argue about how powerful a nuclear-chain-reaction-based bomb might be. If one side were to model that kind of bomb as “sort of like TNT 3.0” while the other is modelling it as “sort of like a small Sun”, they’re likely to disagree about whether nuclear weapons are going to be a small v. large improvement over TNT...)

I suggest we actually try to enumerate the new developments we will need to produce AGI, developments which could arrive discretely in the form of paradigm shifts. We might try to imagine or predict which skills must be combined to reach the ability to do original AI research. Stuart Russell provides a list of these capacities in Human Compatible.

Stuart Russell’s List

  • human-like language comprehension

  • cumulative learning

  • discovering new action sets

  • managing its own mental activity

For reference, I’ve included two capabilities we already have that I imagine being on a similar list in 1960.

AI Impacts List

  • Causal models: Building causal models of the world that are rich, flexible, and explanatory — Lake et al. (2016), Marcus (2018), Pearl (2018)

  • Compositionality: Exploiting systematic, compositional relations between entities of meaning, both linguistic and conceptual — Fodor and Pylyshyn (1988), Marcus (2001), Lake and Baroni (2017)

  • Symbolic rules: Learning abstract rules rather than extracting statistical patterns — Marcus (2018)

  • Hierarchical structure: Dealing with hierarchical structure, e.g. that of language — Marcus (2018)

  • Transfer learning: Learning lessons from one task that transfer to other tasks that are similar, or that differ in systematic ways — Marcus (2018), Lake et al. (2016)

  • Common sense understanding: Using common sense to understand language and reason about new situations — Brooks (2019), Marcus and Davis (2015)

Note that discontinuities may consist either in sudden increases in capability (e.g. if a sudden breakthrough lets us build AI with fully human-like cumulative learning), sudden increases in the rate of improvement (e.g. something that takes us over the purported ‘recursive self-improvement threshold’), or sudden increases in the ability to make use of a hardware or knowledge overhang (suddenly producing an AI with human-like language comprehension, which would be able to read all existing books). Perhaps the disagreement looks like this:

An AI with (e.g.) good perception and object recognition, language comprehension, cumulative learning capability and the ability to discover new action sets, but a merely adequate or bad ability to manage its own mental activity, would be (Paul thinks) reasonably capable compared to an AI that is good at all of these things, but (MIRI thinks) it would be much less capable. MIRI has conceptual arguments (to do with the nature of general intelligence) and empirical arguments (comparing human/chimp brains and pragmatic capabilities) in favour of this hypothesis, and Paul thinks the conceptual arguments are too murky and unclear to be persuasive and that the empirical arguments don’t show what MIRI thinks they show.

Adjudicating this disagreement is a matter for another post. For now, I will simply note that an AI significantly lacking in one of the capabilities on Stuart Russell’s list but proficient in the others does seem, intuitively, like it would be much more capable than current AI but still less capable than a very advanced AI. How seriously to take this intuition, I don’t know.

Summary

  • The case for continuous progress rests on three claims:

    • A priori, we expect continuous progress because it is usually possible to do something slightly worse slightly earlier. The historical record confirms this.

    • Evolution’s optimization is too different from the optimization of AI research to be meaningful evidence—if you optimize specifically for usefulness, general intelligence might appear much earlier and more gradually.

    • There are no clear conceptual reasons to expect a ‘generality threshold’, or the sudden (rather than gradual) emergence of the ability to do recursive self-improvement.

Relevance to AI Safety

If we have a high degree of confidence in discontinuous progress, we more or less know that we’ll get a sudden takeoff where superintelligent AI appears out of nowhere and forms a singleton. On the other hand, if we expect continuous progress, then the rate of capability gain is much more difficult to judge. That is already a strong reason to care about whether progress will be continuous or not.

Decisive Strategic Advantage (DSA) leading to a singleton is still possible with continuous progress—we just need a specific reason for a gap to emerge between the leader and everyone else (like a big research program). An example scenario of a decisive strategic advantage leading to a singleton under continuous, relatively fast progress:

At some point early in the transition to much faster innovation rates, the leading AI companies “go quiet.” Several of them either get huge investments or are nationalized and given effectively unlimited funding. The world as a whole continues to innovate, and the leading companies benefit from this public research, but they hoard their own innovations to themselves. Meanwhile the benefits of these AI innovations are starting to be felt; all projects have significantly increased (and constantly increasing) rates of innovation. But the fastest increases go to the leading project, which is one year ahead of the second-best project. (This sort of gap is normal for tech projects today, especially the rare massively-funded ones, I think.) Perhaps via a combination of spying, selling, and leaks, that lead narrows to six months midway through the process. But by that time things are moving so quickly that a six months’ lead is like a 15-150 year lead during the era of the Industrial Revolution. It’s not guaranteed and perhaps still not probable, but at least it’s reasonably likely that the leading project will be able to take over the world if it chooses to.
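To unpack the ‘15-150 year lead’ conversion (my own back-of-the-envelope restatement of the quoted reasoning, with illustrative numbers): a lead of L years is worth roughly L/T economic doublings, where T is the prevailing doubling time. Assuming an Industrial Revolution era doubling time on the order of 30 years:

```latex
\frac{15\text{--}150\ \text{yr}}{\sim 30\ \text{yr}}
\;\approx\; 0.5\text{--}5 \ \text{doublings}
\;\approx\; \frac{0.5\ \text{yr}}{0.1\text{--}1\ \text{yr}},
```

i.e. a six-month lead carries the same number of doublings once the doubling time has fallen to somewhere between roughly a month and a year.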

Let’s factor out the question of fast or slow takeoff, and try to compare two AI timelines that are similarly fast in objective time, but where one contains a discontinuous leap in capability and the other doesn’t. What are the relevant differences with respect to AI safety? In the discontinuous scenario, we do not require the classic ‘seed AI’ that recursively self-improves—that scenario is too specific and more useful as a thought experiment. Instead, in the discontinuous scenario it is merely a fact of the matter that at a certain point returns on optimization explode and capability gain becomes very rapid where before it was very slow. In the other case, progress is continuous but fast, though presumably not quite as fast.

Any approach to alignment that relies on a less advanced agent supervising a more advanced agent will probably not work in the discontinuous case, since the difference between agents on either side of the discontinuity would be too great. An iterated approach that relies on groups of less intelligent agents supervising a more intelligent agent could work even in a very fast but continuous takeoff, because even if the process took only a small amount of objective time, continuous increments of capability would still occur, and less intelligent agents could supervise slightly more intelligent agents at each step.

Discontinuous takeoff suggests ambitious value learning approaches, while continuous takeoff suggests iterated approaches like IDA.

See my earlier post for a discussion of value learning approaches.

In cases where the takeoff is both slow and continuous, we might expect AI to be an outgrowth of modern ML, particularly deep reinforcement learning, in which case the best approach might be a very narrow approach to AI alignment.

The objective time taken for progress in AI is more significant than whether that progress is continuous or discontinuous, but the presence of discontinuities is significant for two key reasons. First, discontinuities imply a much faster takeoff in objective time. Second, big discontinuities will probably thwart iterative approaches to value learning, requiring one-shot, ambitious approaches.