AGIs as populations

I think there’s a reasonably high probability that we will end up training AGI in a multi-agent setting. But in that case, we shouldn’t just be interested in how intelligent each agent produced by this training process is, but also in the combined intellectual capabilities of a population of agents. If those agents cooperate, they will exceed the capabilities of any one of them—and then it might be useful to think of the whole population as one AGI. Arguably, on a large-scale view, this is how we should think of humans. Each individual human is generally intelligent in their own right. Yet from the perspective of chimpanzees, the problem was not that any single human was intelligent enough to take over the world, but rather that millions of humans underwent cultural evolution to make the human population as a whole much more intelligent.

This idea isn’t just relevant to multi-agent training though: even if we train a single AGI, we will have strong incentives to copy it many times to get it to do more useful work. If that work involves generating new knowledge, then putting copies in contact with each other to share that knowledge would also increase efficiency. And so, one way or another, I expect that we’ll eventually end up dealing with a “population” of AIs. Let’s call the resulting system, composed of many AIs working together, a population AGI.

We should be clear about the differences between three possibilities which each involve multiple entities working together:

  1. A single AGI composed of multiple modules, trained in an end-to-end way.

  2. The Comprehensive AI Services (CAIS) model of a system of interlinked AIs which work together to complete tasks.

  3. A population AGI as described above, consisting of many individual AIs working together in ways comparable to how a population of humans might collaborate.

This essay will only discuss the third possibility, which differs from the other two in several ways:

  • Unlike the modules of a single AGI, the members of a population AGI are not trained in a centralised way, on a single objective function. Rather, optimisation takes place with respect to the policies of individual members, with cooperation between them emerging (either during training or deployment) because it fits the incentives of individuals.

  • Unlike CAIS services and single AGI modules, the members of a population AGI are fairly homogeneous; they weren’t all trained on totally different tasks (and in fact may start off identical to each other).

  • Unlike CAIS services and single AGI modules, the members of a population AGI are each generally intelligent by themselves—and therefore capable of playing multiple roles in the population AGI, and interacting in flexible ways.

  • Unlike CAIS services and single AGI modules, the members of a population AGI might be individually motivated by arbitrarily large-scale goals.

What are the relevant differences from a safety perspective between this population-based view and the standard view? Specifically, let’s compare a “population AGI” to a single AGI which can do just as much intellectual work as the whole population combined. Here I’m thinking particularly of the most high-level work (such as doing scientific research, or making good strategic decisions), since that seems like a fairer comparison.

Interpretability

We might hope that a population AGI will be more interpretable than a single AGI, since its members will need to pass information to each other in a standardised “language”. By contrast, the different modules in a single AGI may have developed specialised ways of communicating with each other. In humans, language is much lower-bandwidth than thought. This isn’t a necessary feature of communication, though—members of a population AGI could be allowed to send data to each other at an arbitrarily high rate. Decreasing this communication bandwidth might be a useful way to increase the interpretability of a population AGI.
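
As a rough, purely illustrative sketch of that last idea, the Python snippet below (hypothetical code, not from any existing system; the bottleneck function and all its parameters are invented for illustration) forces every inter-agent message through a sparse, coarsely quantised channel, so that the traffic between agents stays small enough to log and inspect.

```python
import numpy as np

# Hypothetical sketch of a bandwidth-limited channel between members of a
# population AGI: each outgoing message is forced through a sparse, coarsely
# quantised bottleneck, keeping inter-agent traffic small enough to inspect.

rng = np.random.default_rng(0)

def bottleneck(message: np.ndarray, k: int = 4, levels: int = 8) -> np.ndarray:
    """Keep only the k largest-magnitude components and snap them to a coarse grid."""
    kept = np.argsort(np.abs(message))[-k:]   # indices of the k strongest components
    out = np.zeros_like(message)
    # Quantise each kept component onto a grid of step 2/levels within [-1, 1].
    out[kept] = np.round(np.clip(message[kept], -1.0, 1.0) * (levels / 2)) / (levels / 2)
    return out

sender_state = rng.normal(size=64)    # the sender's high-dimensional internal representation
received = bottleneck(sender_state)   # the low-bandwidth summary the other agent actually sees
print(f"components transmitted: {np.count_nonzero(received)} of {received.size}")
```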

Flexibility

Regardless of the specific details of how they collaborate and share information, members of a population AGI will need structures and norms for doing so. There’s a sense in which some of the “work” of solving problems is done by those norms—for example, the structure of a debate can be more or less helpful in adjudicating the claims made. The analogous aspect of a single AGI is the structure of its cognitive modules and how they interact with each other. However, the structure of a population AGI would be much more flexible—and in particular, it could be redesigned by the population AGI itself in order to improve the flow of information. By contrast, the modules of a single AGI will have been designed by an optimiser, and so fit together much more rigidly. This likely makes them work together more efficiently; the efficiency of end-to-end optimisation is why a human with a brain twice as large would be much more intelligent than two normal humans collaborating. But the concomitant lack of flexibility is why it’s much easier to improve our coordination protocols than our brain functionality.

Fine-tunability

Suppose we want to retrain an AGI to have a new set of goals. How easy is this in each case? Well, for a single AGI we can just train it on a new objective function, in the same way we trained it on the old one. For a population AGI where each of the members was trained individually, however, we may not have good methods for assigning credit when the whole population is trying to work together towards a single task. For example, a difficulty discussed in Sunehag et al. (2017) is that one agent starting to learn a new skill might interfere with the performance of other agents—and the resulting decrease in reward teaches the first agent to stop attempting the new skill. This would be particularly relevant if the original population AGI was produced by copying a single agent trained by itself—if so, it’s plausible that multi-agent reinforcement learning techniques will have lagged behind.
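
To make that interference problem concrete, here is a toy Python sketch (my own illustration, loosely in the spirit of the shared-reward setting Sunehag et al. discuss; all numbers and names are invented): two independent learners update on a single team reward, so agent A’s exploration of a new, initially worse skill both pollutes agent B’s value estimates and produces low rewards that discourage A from the skill.

```python
import numpy as np

# Toy illustration with invented numbers: two independent learners share one
# team reward. Agent A's new skill (action 1) starts out worse than its old
# one, so exploring it lowers the shared reward, which distorts agent B's
# value estimates and teaches A to avoid the new skill. (In the setting
# Sunehag et al. describe, this early penalty can discourage a skill that
# would eventually have paid off; here the skill simply stays worse.)

rng = np.random.default_rng(1)

def team_reward(a_action: int, b_action: int) -> float:
    a_part = 0.5 if a_action == 0 else rng.normal(-0.5, 0.1)  # new skill is initially costly
    b_part = 1.0 if b_action == 0 else 0.2
    return a_part + b_part

q_a = np.zeros(2)   # agent A's value estimates for its two actions
q_b = np.zeros(2)   # agent B's value estimates for its two actions
alpha = 0.1         # learning rate

for step in range(500):
    a = int(rng.random() < 0.3)   # A tries its new skill 30% of the time
    b = int(rng.random() < 0.3) if step < 50 else int(np.argmax(q_b))
    r = team_reward(a, b)
    # Both agents update on the same shared reward, so A's exploration noise
    # leaks into B's estimates, and the low reward counts against A's new skill.
    q_a[a] += alpha * (r - q_a[a])
    q_b[b] += alpha * (r - q_b[b])

print("A's estimates:", q_a.round(2))   # action 1 looks worse, so A learns to avoid it
print("B's estimates:", q_b.round(2))   # noisier than they would be if A never explored
```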

Agency

This is a tricky one. I think that a population AGI is likely to be less agentic and goal-directed than a single AGI of equivalent intelligence, because different members of the population may have different goals which push in different directions. However, it’s also possible that population-level phenomena amplify goal-directed behaviour. For example, competition between different members in a population AGI could push the group as a whole towards dangerous behaviour (in a similar way to how competition between companies makes humans less safe from the perspective of chimpanzees). And our lessened ability to fine-tune them, as discussed in the previous section, might make it difficult to know how to intervene to prevent that.

Overall evaluation of population AGIs

I think that the extent to which a population AGI is more dangerous than an equivalently intelligent single AGI will mainly depend on how the individual members are trained (in ways which I’ve discussed previously). If we condition on a given training regime being used for both approaches, though, it’s much less clear which type of AGI we should prefer. It’d be useful to see more arguments either way—in particular because a better understanding of the pros and cons of each approach might influence our training decisions. For example, during multi-agent training there may be a tradeoff between training individual AIs to be more intelligent, versus running more copies of them to teach them to cooperate at larger scales. In such environments we could also try to encourage or discourage in-depth communication between them.

In my next post, I’ll discuss one argument for why population AGIs might be safer: because they can be deployed in more constrained ways.