Drexler on AI Risk


Eric Drexler has published a book-length paper on AI risk, describing an approach that he calls Comprehensive AI Services (CAIS).

His primary goal seems to be reframing AI risk discussions to use a rather different paradigm than the one that Nick Bostrom and Eliezer Yudkowsky have been promoting. (There isn’t yet any paradigm that’s widely accepted, so this isn’t a Kuhnian paradigm shift; it’s better characterized as an amorphous field that is struggling to establish its first paradigm.) Dueling paradigms seem to be the best that the AI safety field can manage to achieve for now.

I’ll start by mentioning some important claims that Drexler doesn’t dispute:

  • an intelligence explosion might happen somewhat suddenly, in the fairly near future;

  • it’s hard to reliably align an AI’s values with human values;

  • recursive self-improvement, as imagined by Bostrom / Yudkowsky, would pose significant dangers.

Drexler likely disagrees with some of the claims made by Bostrom / Yudkowsky on those points, but he shares enough of their concerns that those disagreements don’t explain why Drexler approaches AI safety differently. (Drexler is more cautious than most writers about making any predictions concerning these three claims.)

CAIS isn’t a full solution to AI risks. Instead, it’s better thought of as an attempt to reduce the risk of world conquest by the first AGI that reaches some threshold, preserve existing corrigibility somewhat past human-level AI, and postpone the need for a permanent solution until we have more intelligence.

Stop Anthropomorphising Intelligence!

What I see as the most important distinction between the CAIS paradigm and the Bostrom / Yudkowsky paradigm is Drexler’s objection to having advanced AI be a unified, general-purpose agent.

Intelligence doesn’t require a broad mind-like utility function. Mindspace is a small subset of the space of intelligence.

Instead, Drexler suggests composing broad AI systems out of many, diverse, narrower-purpose components. Normal software engineering produces components with goals that are limited to a specific output. Drexler claims there’s no need to add world-oriented goals that would cause a system to care about large parts of spacetime.

Systems built out of components with narrow goals don’t need to develop much broader goals. Existing trends in AI research suggest that better-than-human intelligence can be achieved via tools that have narrow goals.

The AI-services model invites a functional analysis of service development and delivery, and that analysis suggests that practical tasks in the CAIS model are readily or naturally bounded in scope and duration. For example, the task of providing a service is distinct from the task of developing a system to provide that service, and tasks of both kinds must be completed without undue cost or delay.

Drexler’s main example of narrow goals is Google’s machine translation, which has no goals beyond translating the next unit of text. That doesn’t imply any obvious constraint on how sophisticated its world-model can be. It would be quite natural for AI progress to continue with components whose “utility function” remains bounded like this.

It looks like this difference between narrow and broad goals can be turned into a fairly rigorous distinction, but I’m dissatisfied with available descriptions of the distinction. (I’d also like better names for them.)

There are lots of clear-cut cases: narrow-task software that just waits for commands, and on getting a command, it produces a result, then returns to its prior state; versus a general-purpose agent which is designed to maximize the price of a company’s stock.

But we need some narrow-task software to remember some information, and once we allow memory, it gets complicated to analyze whether the software’s goal is “narrow”.
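To make the clear-cut cases above concrete, here’s a minimal sketch (in Python, with hypothetical names of my own invention, not anything from Drexler’s paper) contrasting a narrow, episodic service with an agent that optimizes a broad, world-oriented objective:

```python
# Illustrative sketch only; the classes and interfaces here are hypothetical.
from dataclasses import dataclass


@dataclass
class TranslationRequest:
    text: str
    target_language: str


class NarrowTranslationService:
    """Waits for a command, produces a result, then returns to its prior state.

    Its "goal" is exhausted by the current request; it holds no objective
    defined over future world states, however sophisticated its model is.
    """

    def __init__(self, model):
        self.model = model  # may be an arbitrarily capable, opaque ML model

    def handle(self, request: TranslationRequest) -> str:
        # No persistent objective, no side effects beyond returning the output.
        return self.model.translate(request.text, request.target_language)


class BroadStockAgent:
    """Open-endedly chooses actions to maximize a world-oriented objective
    (a company's stock price), i.e. the kind of goal CAIS argues we can
    avoid building in.
    """

    def __init__(self, world_model):
        self.world_model = world_model

    def run(self):
        while True:  # never returns to a prior state; keeps optimizing
            action = max(self.world_model.available_actions(),
                         key=self.world_model.predicted_stock_price_after)
            self.world_model.execute(action)
```

The memory question above is exactly what makes the boundary fuzzy: once something like NarrowTranslationService keeps state across requests, deciding whether its goal is still “narrow” takes more careful analysis.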

Drexler seems less optimistic than I am about clarifying this distinction:

There is no bright line between safe CAI services and unsafe AGI agents, and AGI is perhaps best regarded as a potential branch from an R&D-automation/CAIS path.

Because there is no bright line between agents and non-agents, or between rational utility maximization and reactive behaviors shaped by blind evolution, avoiding risky behaviors calls for at least two complementary perspectives: both (1) design-oriented studies that can guide implementation of systems that will provide requisite degrees of e.g., stability, reliability, and transparency, and (2) agent-oriented studies that support design by exploring the characteristics of systems that could display emergent, unintended, and potentially risky agent-like behaviors.

It may be true that a bright line can’t be explained clearly to laymen, but I have a strong intuition that machine learning (ML) developers will be able to explain it to each other well enough to agree on how to classify the cases that matter.

6.7 Systems composed of rational agents need not maximize a utility function

There is no canonical way to aggregate utilities over agents, and game theory shows that interacting sets of rational agents need not achieve even Pareto optimality. Agents can compete to perform a task, or can perform adversarial tasks such as proposing and criticizing actions; from an external client’s perspective, these uncooperative interactions are features, not bugs (consider the growing utility of generative adversarial networks). Further, adaptive collusion can be cleanly avoided: Fixed functions, for example, cannot negotiate or adapt their behavior to align with another agent’s purpose. …

There is, of course, an even more fundamental objection to drawing a boundary around a set of agents and treating them as a single entity: In interacting with a set of agents, one can choose to communicate with one or another (e.g. with an agent or its competitor); if we assume that the agents are in effect a single entity, we are assuming a constraint on communication that does not exist in the multi-agent model. The models are fundamentally, structurally inequivalent.

A Nanotech Analogy

Drexler originally described nanotechnology in terms of self-replicating machines.

Later, concerns about grey goo caused him to shift his recommendations toward a safer strategy, where no single machine would be able to replicate itself, but where the benefits of nanotechnology could be used recursively to improve nanofactories.

Similarly, some of the more science-fiction style analyses suggest that an AI with recursive self-improvement could quickly conquer the world.

Drexler’s CAIS proposal removes the “self-” from recursive self-improvement, in much the same way that nanofactories removed the “self-” from nanobot self-replication, replacing it with a more decentralized process that involves preserving more features of existing factories / AI implementations. The AI equivalent of nanofactories consists of a set of AI services, each with a narrow goal, which coordinate in ways that don’t qualify as a unified agent.

It sort of looks like Drexler’s nanotech background has had an important influence on his views. Eliezer’s somewhat conflicting view seems to follow a more science-fiction-like pattern of expecting one man to save (or destroy?) the world. And I could generate similar stories for mainstream AI researchers.

That doesn’t suggest much about who’s right, but it does suggest that people are being influenced by considerations that are only marginally relevant.

How Powerful is CAIS?

Will CAIS be slower to develop than recursive self-improvement? Maybe. It depends somewhat on how fast recursive self-improvement is.

I’m uncertain whether to believe that human oversight is compatible with rapid development. Some of that uncertainty comes from confusion about what to compare it to (an agent AGI that needs no human feedback? or one that often asks humans for approval?).

Some people expect unified agents to be more powerful than CAIS. How plausible are their concerns?

Some of it is disagreement over the extent to which human-level AI will be built with currently understood techniques. (See Victoria Krakovna’s chart of what various people believe about this.)

Could some of it be due to analogies to people? We have experience with some very agenty businessmen (e.g. Elon Musk or Bill Gates), and some bureaucracies made up of not-so-agenty employees (the post office, or Comcast). I’m tempted to use the intuitions I get from those examples to conclude that a unified agent AI will be more visionary and eager to improve. But I worry that doing so anthropomorphises intelligence in a way that misleads, since I can’t say anything more rigorous than “these patterns look relevant”.

But if that analogy doesn’t help, then the novelty of the situation hints that we should distrust Drexler’s extrapolation from standard software practices (without placing much confidence in any alternative).

Cure Cancer Example

Drexler wants some limits on what gets automated. E.g. he wants to avoid a situation where an AI is told to cure cancer, and does so without further human interaction. That would risk generating a solution for which the system misjudges human approval (e.g. mind uploading or cryonic suspension).

Instead, he wants humans to decompose that into narrower goals (with substantial AI assistance), such that humans could verify that the goals are compatible with human welfare (or reject those that are too hard to evaluate).
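Here’s a minimal sketch of what that human-in-the-loop decomposition might look like. The planner and executor interfaces are my own illustrative assumptions, not anything specified in the paper:

```python
# Hypothetical sketch of human-gated task decomposition; not from Drexler's paper.

def human_approves(subtask: str) -> bool:
    """A human reviewer checks that the subtask is compatible with human
    welfare, rejecting anything that is too hard to evaluate."""
    answer = input(f"Approve subtask {subtask!r}? [y/N] ")
    return answer.strip().lower() == "y"


def pursue_goal_with_oversight(goal: str, planner_service, executor_service) -> None:
    """Decompose a high-level goal into narrow subtasks, executing only those
    that a human has verified."""
    for subtask in planner_service.propose_subtasks(goal):  # AI-assisted decomposition
        if human_approves(subtask):
            executor_service.run(subtask)  # a narrow service with a bounded goal
        # rejected subtasks simply never reach an execution service
```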

This seems likely to delay cancer cures compared to what an agent AGI would do, maybe by hours, maybe by months, as the humans check the subtasks. I expect most people would accept such a delay as a reasonable price for reducing AI risks. I haven’t thought of a realistic example where I expect the delay would generate a strong incentive for using an agent AGI, but the cancer example is close enough to be unsettling.

This analysis is reassuring compared to Superintelligence, but not as reassuring as I’d like.

As I was writing the last few paragraphs, and thinking about Wei Dai’s objections, I found it hard to clearly model how CAIS would handle the cancer example.

Some of Wei Dai’s objections result from a disagreement about whether agent AGI has benefits. But his objections suggest other questions, for which I needed to think carefully in order to guess how Drexler would answer them: How much does CAIS depend on human judgment about what tasks to give to a service? Probably quite heavily, in some cases. How much does CAIS depend on the system having good estimates of human approval? Probably not too much, as long as experts are aware of how good those estimates are, and are willing and able to restrict access to some relatively risky high-level services.

I expect ML researchers can identify a safe way to use CAIS, but it doesn’t look very close to an idiot-proof framework, at least not without significant trial and error. I presume there will in the long run be a need for an idiot-proof interface to most such services, but I expect those to be developed later.

What Incentives will influence AI Developers?

With grey goo, it was pretty clear that most nanotech developers would prefer the nanofactory approach, since it was safer and had few downsides.

With CAIS, the incentives are less clear, because it’s harder to tell whether there will be benefits to agent AGIs.

Much depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of Drexler’s analysis.

If I knew that AI required expensive hardware, I might be confident that the first human-level AIs would be developed at large, relatively risk-averse institutions.

But Drexler has a novel(?) approach (section 40) which suggests that existing supercomputers have about human-level raw computing power. That provides a reason for worrying that a wider variety of entities could develop powerful AI.

Drexler seems to extrapolate current trends, implying that the first entity to generate human-level AI will look like Google or OpenAI. Developers there seem likely to be sufficiently satisfied with the kind of intelligence explosion that CAIS can produce that only moderate concern about risks will be needed to deter them from pursuing something more dangerous.

Whereas a poorly funded startup, or the stereotypical lone hacker in a basement, might be more tempted to gamble on an agent AGI. I have some hope that human-level AI will require a wide variety of service-like components, maybe too much for a small organization to handle. But I don’t like relying on that.

Presumably the publicly available AI services won’t be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer + a theorem prover doesn’t sound dangerous. Section 27.7 predicts that “senior human decision makers” would have access to a service with some strategic planning ability (which would have enough power to generate plans with dangerously broad goals), and they would likely restrict access to those high-level services. See also section 39.10 for why any one service doesn’t need to have a very broad purpose.

I’m unsure where Siri and Alexa fit in this framework. Their designers have some incentive to incorporate goals that extend well into the future, in order to better adapt to individual customers, by improving their models of each customer’s desires. I can imagine that being fully compatible with a CAIS approach, but I can also imagine them being given utility functions that would cause them to act quite agenty.

How Valuable is Modularity?

CAIS may be easier to develop, since modularity normally makes software development easier. On the other hand, modularity seems less important for ML. On the gripping hand, AI developers will likely be combining ML with other techniques, and modularity seems likely to be valuable for those systems, even if the ML parts are not modular. Section 37 lists examples of systems composed of both ML and traditional software.

And as noted in a recent paper from Google, “Only a small fraction of real-world ML systems is composed of the ML code [...] The required surrounding infrastructure is vast and complex.” [Sculley et al. 2015]

Neural networks and symbolic/algorithmic AI technologies are complements, not alternatives; they are being integrated in multiple ways at levels that range from components and algorithms to system architectures.

How much less important is modularity for ML? A typical ML system seems to do plenty of re-learning from scratch, when we could imagine it delegating tasks to other components. On the other hand, ML developers seem to be fairly strongly sticking to the pattern of assigning only narrow goals to any instance of an ML service, typically using high-level human judgment to integrate that with other parts.

I expect robocars to provide a good test of how much ML is pushing software development away from modularity. If CAIS is generally correct, I’d expect a robocar to have more than 10 independently trained ML modules integrated into the main software that does the driving, whereas I’d expect fewer than 10 if Drexler were wrong about modularity. My cursory search did not find any clear answer; can anyone resolve this?

I suspect that most ML literature tends to emphasize monolithic software because that’s easier to understand, and because those papers focus on specific new ML features, to which modularity is not very relevant.

Maybe there’s a useful analogy to markets: maybe people underestimate CAIS because very decentralized systems are harder for people to model. People often imagine that decentralized markets are less efficient than centralized command and control, and only seem to tolerate markets after seeing lots of evidence (e.g. the collapse of communism). On the other hand, Eliezer and Bostrom don’t seem especially prone to underestimate markets, so I have low confidence that this guess explains much.

Alas, skepticism of decentralized systems might mean that we’re doomed to learn the hard way that the same principles apply to AI development (or fail to learn, because we don’t survive the first mistake).

Transparency?

MIRI has been worrying about the opaqueness of neural nets and similar approaches to AI, because it’s hard to evaluate the safety of a large, opaque system. I suspect that complex world-models are inherently hard to analyze. So I’d be rather pessimistic if I thought we needed the kind of transparency that MIRI hopes for.

Drexler points out that opaqueness causes fewer problems under the CAIS paradigm. Individual components may often be pretty opaque, but interactions between components seem more likely to follow a transparent protocol (assuming designers value that). And as long as the opaque components have sufficiently limited goals, the risks that might hide under that opaqueness are constrained.
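As a rough illustration of that point (my own sketch, with invented interfaces, not anything from the paper), opaque components can still interact through structured, logged messages that humans can inspect:

```python
# Illustrative only: opaque components, transparent protocol.
import json
from dataclasses import dataclass, asdict


@dataclass
class ServiceRequest:
    service: str   # which narrow service is being invoked
    task: str      # bounded task description
    inputs: dict   # structured, inspectable inputs


def call_service(request: ServiceRequest, registry: dict, audit_log: list) -> dict:
    """Route a request to an opaque component while recording the interaction
    in a human-readable log."""
    audit_log.append(json.dumps(asdict(request)))
    opaque_component = registry[request.service]  # e.g. a trained neural net
    outputs = opaque_component.run(request.task, request.inputs)
    audit_log.append(json.dumps({"service": request.service, "outputs": outputs}))
    return outputs
```

The interactions stay auditable even when each component’s internals are not.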

Transparent protocols enable faster development by humans, but I’m concerned that it will be even faster to have AIs generating systems with less transparent protocols.

Implications

The differences between CAIS and agent AGI ought to define a threshold, which could function as a fire alarm for AI experts. If AI developers need to switch to broad utility functions in order to compete, that will provide a clear sign that AI risks are high, and that something’s wrong with the CAIS paradigm.

CAIS indicates that it’s important to have a consortium of AI companies to promote safety guidelines, and to propagate a consensus view on how to stay on the safe side of the narrow versus broad task threshold.

CAIS helps reduce the pressure to classify typical AI research as dangerous, and therefore reduces AI researchers’ motivation to resist AI safety research.

Some implications for AI safety researchers in general: don’t imply that anyone knows whether recursive self-improvement will beat other forms of recursive improvement. We don’t want to tempt AI researchers to try recursive self-improvement (by telling people it’s much more powerful). And we don’t want to err much in the other direction, because we don’t want people to be complacent about the risks of recursive self-improvement.

Conclusion

CAIS seems somewhat more grounded in existing software practices than, say, the paradigm used in Superintelligence, and provides more reasons for hope. Yet it provides little reason for complacency:

The R&D-automation/AI-services model suggests that conventional AI risks (e.g., failures, abuse, and economic disruption) are apt to arrive more swiftly than expected, and perhaps in more acute forms. While this model suggests that extreme AI risks may be relatively avoidable, it also emphasizes that such risks could arise more quickly than expected.

I see important uncertainty in whether CAIS will be as fast and efficient as agent AGI, and I don’t expect any easy resolution to that uncertainty.

This paper is a good starting point, but we need someone to transform it into something more rigorous.

CAIS is sufficiently similar to standard practices that it doesn’t require much work to attempt, and creates few risks.

I’m around 50% confident that CAIS plus a normal degree of vigilance by AI developers will be sufficient to avoid global catastrophe from AI.