Comments on CAIS

Over the last few months I’ve talked with Eric Drexler a number of times about his Comprehensive AI Services (CAIS) model of AI development, and read most of his technical report on the topic. I think these are important ideas which are well worth engaging with, despite personally being skeptical about many of the conclusions. Below I’ve summarised what I see as the core components of Eric’s view, followed by some of my own arguments. Note that these are only my personal opinions. I did make some changes to the summary based on Eric’s comments on early drafts, to better reflect his position—however, there are likely still ways I’ve misrepresented him. Also note that this was written before reading Rohin’s summary of the same report, although I do broadly agree with most of Rohin’s points.

One useful piece of context for this model is Eric’s background in nanotechnology, and his advocacy for the development of nanotech as “atomically precise manufacturing” rather than self-replicating nanomachines. The relationship between these two frameworks has clear parallels with the relationship between CAIS and a recursively self-improving superintelligence.

The CAIS model:

  1. The standard arguments in AI safety are concerned with the development of a single AGI agent doing open-ended optimisation. Before we build such an entity (if we do so at all), we will build AI services which each perform a bounded task with bounded resources, and which can be combined to achieve superhuman performance on a wide range of tasks.

  2. AI services may or may not be “agents”. However, under CAIS there will be no entity optimising extremely hard towards its goals in the way that most AI safety researchers have been worrying about, because:

    1. Each service will be relatively specialised and myopic (focused on current episodic performance, not maximisation over the whole future). This is true of basically all current AI applications, e.g. image classifiers or Google Translate.

    2. Although rational agents can be proved equivalent to utility-maximisers, the same is not necessarily true of systems of rational agents. Most such systems are fundamentally different in structure from rational agents—for example, individual agents within the system can compete with or criticise each other (see the worked example after this list). And since AI services aren’t “rational agents” in the first place, a system composed of them is even less likely to implement a utility-maximiser.

    3. There won’t be very much demand for unified AIs which autonomously carry out large-scale tasks requiring general capabilities, because systems of AI services will be able to perform those tasks just as well or better.

  3. Early AI services could do things like massively disrupt financial markets, increase the rate of scientific discovery, help run companies, etc. Eventually they should be able to do any task that humans can, at our level or higher.

    1. They could also be used to recursively improve AI technologies and to develop AI applications, but usually with humans in the loop—in roughly the same way that science allows us to build better tools with which to do better science.

  4. Our priorities in doing AI safety research can and should be informed by this model:

    1. A main role for technical AI safety researchers should be to look at the emergent properties of systems of AI services, e.g. which combinations of architectures, tasks and selection pressures could lead to risky behaviour, as well as the standard problems of specifying bounded tasks.

    2. AI safety experts can also give ongoing advice and steer the development of AI services. AI safety researchers shouldn’t think of safety as a one-shot problem, but rather as a series of ongoing adjustments.

    3. AI services will make it much easier to prevent the development of unbounded agent-like AGI through methods like increasing coordination and enabling surveillance, if the political will can be mustered.
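
A small worked illustration of point 2.2 (the example and the sketch below are mine, not Eric’s): three agents can each hold perfectly transitive preferences over options A, B and C, yet the system that aggregates them by majority vote prefers A to B, B to C, and C to A. No utility function is consistent with that cycle, so the aggregate system is not a utility-maximiser even though each component agent is individually coherent.

```python
# Minimal sketch of the point above (my illustrative example, not from Eric's report):
# three agents with transitive individual preferences whose majority vote
# produces a preference cycle, i.e. behaviour no single utility function can describe.

# Each agent ranks the options from most to least preferred.
agents = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def prefers(agent, x, y):
    """True if this agent ranks option x above option y."""
    return agent.index(x) < agent.index(y)

def majority_prefers(x, y):
    """True if a strict majority of agents rank x above y."""
    return sum(prefers(a, x, y) for a in agents) > len(agents) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
# Prints True for all three pairs: the aggregate prefers A > B > C > A,
# a cycle that violates the transitivity any utility-maximiser must satisfy.
```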

I’m broadly sympathetic to the empirical claim that we’ll develop AI services which can replace humans at most cognitively difficult jobs significantly before we develop any single superhuman AGI (one unified system that can do nearly all cognitive tasks as well as or better than any human). One plausible mechanism is that deep learning continues to succeed on tasks where there’s lots of training data, but doesn’t learn how to reason in general ways—e.g. it could learn from court documents how to imitate lawyers well enough to replace them in most cases, without being able to understand law in the way humans do. Self-driving cars are another pertinent example. If that pattern repeats across most human professions, we might see massive societal shifts well before AI becomes dangerous in the adversarial way that’s usually discussed in the context of AI safety.

If I had to sum up my objections to Eric’s framework in one sentence, it would be: “the more powerful each service is, the harder it is to ensure it’s individually safe; the less powerful each service is, the harder it is to combine them in a way that’s competitive with unified agents.” I’ve laid out my arguments in more detail below.

Richard’s view:

  1. Open-ended agentlike AI seems like the most likely candidate for the first strongly superhuman AGI system.

    1. As a basic prior, our only example of general intelligence so far is ourselves—a species composed of agentlike individuals who pursue open-ended goals. So it makes sense to expect AGIs to be similar—especially if you believe that our progress in artificial intelligence is largely driven by semi-random search with lots of compute (like evolution was) rather than principled intelligent design.

      1. In particular, the way we trained on the world—both as a species and as individuals—was by interacting with it in a fairly unconstrained way. Many machine learning researchers believe that we’ll get superhuman AGI via a similar approach, by training RL agents in simulated worlds. Even if we then used such agents as “services”, they wouldn’t be bounded in the way predicted by CAIS.

    2. Many complex tasks don’t easily decompose into separable subtasks. For instance, while writing this post I had to keep my holistic impression of Eric’s ideas in mind most of the time. This impression was formed through having conversations and reading essays, but was updated frequently as I wrote this post, and also draws on a wide range of my background knowledge. I don’t see how CAIS would split the task of understanding a high-level idea between multiple services, or (if it were done by a single service) how that service would interact with an essay-writing service, or an AI-safety-research service.

      1. Note that this isn’t an argument against AGI being modular, but rather an argument that requiring the roles of each module and the ways they interface with each other to be human-specified or even just human-comprehensible will be very uncompetitive compared with learning them in an unconstrained way. Even on today’s relatively simple tasks, we already see end-to-end training outcompeting other approaches, and learned representations outperforming human-made representations. The basic reason is that we aren’t smart enough to understand how the best cognitive structures or representations work. Yet it’s key to CAIS that each service performs a specific known task, rather than just doing useful computation in general—otherwise we could consider each lobe of the human brain to be a “service”, and the combination of them to be unsafe in all the standard ways.

      2. It’s not clear to me whether this is also an argument against IDA. I think that it probably is, but to a lesser extent, because IDA allows multiple layers of task decomposition which are incomprehensible to humans before bottoming out in subtasks which we can perform.

    3. Even if task decomposition can be solved, humans reuse most of the same cognitive faculties for most of the tasks that we can carry out. If many AI services end up requiring similar faculties to each other, it would likely be more efficient to unify them into a single entity. It would also be more efficient if that entity could pick up new tasks in the same rapid way that humans do, because then you wouldn’t need to keep retraining. At that point, it seems like you no longer have an AI service but rather the same sort of AGI that we’re usually worried about. (In other words, meta-learning is very important but doesn’t fit naturally into CAIS).

    4. Humans think in terms of individuals with goals, and so even if there’s an equally good approach to AGI which doesn’t conceive of it as a single goal-directed agent, researchers will be biased against it.

  2. Even assuming that the first superintelligent AGI is in fact a system of services as described by the CAIS framework, it will be much more like an agent optimising for an open-ended goal than Eric claims.

    1. There’ll be significant pressure to reduce the extent to which humans are in the loop of AI services, for efficiency reasons. E.g. when a CEO can’t improve on the strategic advice given to them by an AI, or the implementation by another AI, there’s no reason to have that CEO any more. Then we’ll see consolidation of narrow AIs into one overall system which makes decisions and takes actions, and may well be given an unbounded goal like “maximise shareholder value”. (Eric agrees that this is dangerous, and considers it more relevant than other threat models).

    2. Even if we have lots of individually bounded-yet-efficacious modules, the task of combining them to perform well on new tasks seems like a difficult one which will require a broad understanding of the world. An overseer service which is trained to combine those modules to perform arbitrary tasks may be dangerous because if it is goal-oriented, it can use those modules to fulfil its goals (on the assumption that for most complex tasks, some combination of modules performs well—if not, then we’ll be using a different approach anyway).

      1. While I accept that many services can be trained in a way which makes them naturally bounded and myopic, this is much less clear to me in the case of an overseer which is responsible for large-scale allocation of other services. In addition to superhuman planning capabilities and world-knowledge, it would probably require arbitrarily long episodes so that it can implement and monitor complex plans. My guess is that Eric would argue that this overseer would itself be composed of bounded services, in which case the real disagreement is over how competitive that decomposition would be (which relates to point 1.2 above).

  3. Even assuming that the first superintelligent AGI is in fact a system of services as described by the CAIS framework, focusing on superintelligent agents which pursue unbounded goals is still more useful for technical researchers. (Note that I’m less confident in this claim than in the others).

    1. Eventually we’ll have the technology to build unified agents doing unbounded maximisation. Once built, such agents will eventually overtake CAIS superintelligences because they’ll have more efficient internal structure and will be optimising harder for self-improvement. We shouldn’t rely on global coordination to prevent people from building unbounded optimisers, because it’s hard and humans are generally bad at it.

    2. Conditional on both sorts of superintelligences existing, I think (and I would guess that Eric agrees) that CAIS superintelligences are significantly less likely to cause existential catastrophe. And in general, it’s easier to reduce the absolute likelihood of an event the more likely it is (even a 10% reduction of a 50% risk is more impactful than a 90% reduction of a 5% risk). So unless we think that technical research to reduce the probability of CAIS catastrophes is significantly more tractable than other technical AI safety research, it shouldn’t be our main focus.
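
To spell out the arithmetic behind that parenthetical (using just the illustrative figures from the sentence above, not actual estimates): the absolute reduction in risk is the relative reduction multiplied by the baseline probability, so a 10% reduction of a 50% risk removes 0.10 × 0.50 = 5 percentage points of risk, whereas a 90% reduction of a 5% risk removes only 0.90 × 0.05 = 4.5 percentage points.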

As a more general note, I think that one of the main strengths of CAIS is in forcing us to be more specific about what tasks we envisage AGI being used for, rather than picturing it divorced from development and deployment scenarios. However, I worry that the fuzziness of the usual concept of AGI has now been replaced by a fuzzy notion of “service” which makes sense in our current context, but may not in the context of much more powerful AI technology. So while CAIS may be a good model of early steps towards AGI, I think it is a worse model of the period I’m most worried about. I find CAIS most valuable in its role as a research agenda (as opposed to a predictive framework): it seems worth further investigating the properties of AIs composed of modular and bounded subsystems, and the ways in which they might be safer (or more dangerous) than alternatives.

Many thanks to Eric for the time he spent explaining his ideas and commenting on drafts. I also particularly appreciated feedback from Owain Evans, Rohin Shah and Jan Leike.