Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Since the CAIS technical report is a gargantuan 210-page document, I figured I’d write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy.

The Model

The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently: through research and development (R&D) processes. AI researchers consider a problem, define a search space, formulate an objective, and use an optimization technique in order to obtain an AI system, called a service, that performs the task.
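To make that recipe concrete, here is a toy sketch of the four steps (problem, search space, objective, optimization) producing a “service”. Everything in it, including the `Service` class and the grid-search “optimizer”, is an invented stand-in for illustration, not anything from the report:

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Service:
    """An AI system that performs one bounded task with fixed parameters."""
    name: str
    params: Dict[str, float]
    run: Callable[[float], float]

def develop_service(name, search_space, objective):
    """The R&D recipe: search a defined space for the parameters that score
    best on a stated objective, then package the result as a service."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*search_space.values()):
        params = dict(zip(search_space.keys(), values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return Service(name, best_params, run=lambda x: best_params["scale"] * x)

# Toy "task": find a scale factor so that scale * 2 is as close to 10 as possible.
scaler = develop_service(
    name="toy-scaler",
    search_space={"scale": [1.0, 2.0, 5.0, 10.0]},
    objective=lambda p: -abs(p["scale"] * 2 - 10),
)
print(scaler.params, scaler.run(3.0))  # {'scale': 5.0} 15.0
```

The output of the process is a bounded artifact that performs one task, not a general agent.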

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time. Superintelligent language translation would count as a service, even though it requires a very detailed understanding of the world, including engineering, history, science, etc. Episodic RL agents also count as services.

While each of the AI R&D subtasks is currently performed by a human, as AI progresses we should expect that we will automate these tasks as well. At that point, we will have automated R&D, leading to recursive technological improvement. This is not recursive self-improvement, because the improvement comes from R&D services creating improvements in basic AI building blocks, and those improvements feed back into the R&D services. All of this should happen before we get any powerful AGI agents that can do arbitrary general reasoning.
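A minimal sketch of that distinction, under the (cartoonish) assumption that we can summarize building blocks and R&D services as simple quality scores: the improvement loop runs through the population of components, not through any single system modifying itself.

```python
# Sketch of "recursive technological improvement" (not self-improvement):
# R&D services produce better building blocks, and better building blocks
# in turn yield better R&D services. No single system modifies itself.

def improve_building_blocks(rd_quality, blocks):
    """R&D services (at some quality level) upgrade the basic components."""
    return {name: level + 0.1 * rd_quality for name, level in blocks.items()}

def assemble_rd_services(blocks):
    """New R&D services are built out of the current building blocks."""
    return sum(blocks.values()) / len(blocks)

blocks = {"optimizers": 1.0, "architectures": 1.0, "datasets": 1.0}
rd_quality = assemble_rd_services(blocks)

for generation in range(3):
    blocks = improve_building_blocks(rd_quality, blocks)  # R&D improves blocks
    rd_quality = assemble_rd_services(blocks)             # blocks improve R&D
    print(f"generation {generation}: R&D quality = {rd_quality:.2f}")
```

Even this caricature shows why “recursive” improvement doesn’t require a self-modifying agent.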

Why Comprehensive?

Since services are focused on particular tasks, you might think that they aren’t general intelligence, since there would be some tasks for which there is no service. However, pretty much everything we do can be thought of as a task, including the task of creating a new service. When we have a new task that we would like automated, our service-creating service can create a new service for that task, perhaps by training a new AI system, or by taking a bunch of existing services and putting them together, etc. In this way, the collection of services can perform any task, and so as an aggregate it is generally intelligent. As a result, we can call this Comprehensive AI Services, or CAIS. The “Comprehensive” in CAIS is the analog of the “General” in AGI. So, we’ll have the capabilities of an AGI agent before we can actually make a monolithic AGI agent.

Isn’t this just as dangerous as AGI?

You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process, and so it will not have any of the standard convergent instrumental subgoals (unless the subgoals are helpful for the task before reaching the bound).

In addition, all of the optimization pressure on the service is pushing it towards a particular narrow task. This sort of strong optimization tends to focus behavior. Any long-term planning processes that consider weird plans for achieving goals (similar to “break out of the box”) will typically not find any such plan, and so will be eliminated in favor of cognition that actually helps achieve the task. Think of how a racecar is optimized for speed and a bus is optimized for carrying passengers, rather than either one being a “generally capable vehicle”.

It’s also worth noting what we mean by superintelligent here. In this case, we mean that the service is extremely competent at its assigned task. It need not be learning at all. We see this distinction with RL agents: when they are trained using something like PPO, they are learning, but at test time you can simply execute them without any PPO and they will perform the behavior they previously learned without changing it at all.
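Here is a rough sketch of that training-time vs test-time distinction. The `train_policy` function below is a crude stand-in for a PPO-style training loop (it is not PPO); the point is just that the object it returns is executed at test time and never updated:

```python
import random

def train_policy(env_step, episodes=100):
    """Pretend training loop: returns a fixed lookup table (the 'policy')."""
    table = {}
    for _ in range(episodes):
        state = random.randint(0, 4)
        action = random.choice([0, 1])
        reward = env_step(state, action)
        # Keep whichever action did best for this state (crude 'learning').
        if reward > table.get(state, (None, float("-inf")))[1]:
            table[state] = (action, reward)
    return lambda state: table.get(state, (0, 0.0))[0]

def env_step(state, action):
    return 1.0 if action == state % 2 else 0.0

policy = train_policy(env_step)       # learning happens here
print([policy(s) for s in range(5)])  # here the policy only executes;
                                      # nothing in this call updates it
```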

(My opinion: I think this isn’t engaging with the worry about RL agents. Typically, we’re worried about the setting where the RL agent is learning or planning at test time, which can happen in learning-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)

On a different note, you might argue that if we analyze the system of services as a whole, then it certainly looks generally intelligent, and so should be regarded as an AGI agent. However, “AGI agent” usually carries the anthropomorphic connotation of VNM rationality / expected utility maximization / goal-directedness. While it seems possible and even likely that each individual service can be well-modeled as VNM rational (albeit with a bounded utility function), it is not the case that a system of VNM rational agents will itself look VNM rational; in fact, game theory is largely about how systems of rational agents produce behavior that no individual rational agent would choose.

In addition, there are several aspects of CAIS that make it safer than a classic monolithic AGI agent. Under CAIS, each service interacts with other services via clearly defined channels of communication, so that the system is interpretable and transparent, even though each individual service may be opaque. We can reason about what information is present in a service’s inputs to infer what it could possibly know. We could also provide access to some capability through an external resource during training, so that the service doesn’t develop that capability itself.

This interpretability allows us to monitor the service; for example, we could look at which subservices it accesses in order to make sure it isn’t doing anything crazy. But what if having a human in the loop leads to unacceptable delays? Well, this would only happen for deployed applications, where having a human in the loop seems expected, and should also be economically incentivized because it leads to better behavior. Basic AI R&D can continue to be improved autonomously without a human in the loop, so you could still see an intelligence explosion. Note that tactical tasks requiring quick reaction times probably would be delegated to AI services, but the important strategic decisions could still be left in human hands (assisted by AI services, of course).
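As a sketch of what this kind of monitoring might look like, imagine that every subservice call has to go through an explicit channel object that logs and restricts access. The class and the subservice names here are hypothetical:

```python
class MonitoredChannel:
    """All communication between a service and its subservices goes here,
    so access can be logged and restricted without opening the service."""
    def __init__(self, subservices, allowed):
        self.subservices = subservices
        self.allowed = allowed
        self.log = []

    def call(self, name, request):
        self.log.append((name, request))
        if name not in self.allowed:
            raise PermissionError(f"service tried to access '{name}'")
        return self.subservices[name](request)

def planning_service(channel, goal):
    """A service that can only act on the world via its channel."""
    route = channel.call("maps", goal)
    return channel.call("scheduler", route)

channel = MonitoredChannel(
    subservices={"maps": lambda g: f"route-to-{g}",
                 "scheduler": lambda r: f"booked:{r}",
                 "bank": lambda r: "$$$"},
    allowed={"maps", "scheduler"},  # a call to "bank" would be flagged
)
print(planning_service(channel, "warehouse"))
print(channel.log)  # human-auditable record of every subservice call
```

The service itself can stay opaque; the audit trail lives in the channel.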

What happens when we create AGI?

Well, it might not be valuable to create an AGI. We want to perform many different tasks, and it makes sense for these to be done by diverse services. It would not be competitive to include all capabilities in a single monolithic agent. This is analogous to how specialization of labor is a good idea for us humans.

(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision and natural language processing, and seems to be in the process of happening with robotics. So I don’t buy this: while it seems true that we will get CAIS before AGI, since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)

That said, if we ever do build AGI, we can leverage the services from our CAIS world in order to make it safe. We could use superintelligent security services to constrain any AGI agent that we build. For example, we could have services trained to identify long-term planning processes and to perform adversarial testing and red teaming.

Safety in the CAIS world

While CAIS suggests that we will not have AGI agents, this does not mean that we automatically get safety. We will still have AI systems that take high-impact actions, and even one wrong action of this sort could be catastrophic. One way this could happen is if the system of services starts to show agentic behavior; our standard AI safety work could apply to this scenario.

In order to ensure safety, we should have AI safety researchers figure out and codify the best development practices that need to be followed. For example, we could try to always use predictive models of human (dis)approval as a sanity check on any plan that is being enacted. We could also train AI services that can adversarially check new services to make sure they are safe.
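A toy sketch of that (dis)approval check, with a hard-coded keyword heuristic standing in for what would really be a learned predictive model of human judgments:

```python
def approval_model(plan):
    """Stand-in for a learned model: estimated probability of human approval."""
    red_flags = {"irreversible", "deceptive", "self-replicating"}
    penalty = sum(word in plan for word in red_flags)
    return max(0.0, 1.0 - 0.5 * penalty)

def enact(plan, threshold=0.9):
    """Only enact plans whose predicted approval clears the threshold."""
    score = approval_model(plan)
    if score < threshold:
        return f"BLOCKED (predicted approval {score:.2f}): escalate to humans"
    return f"executing: {plan}"

print(enact("reschedule deliveries to avoid traffic"))
print(enact("take irreversible control of the power grid"))
```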

Summary

The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing.