Collaboration-by-Design versus Emergent Collaboration

Introduction

This seems to be a non-trivial problem even for current narrow AI, and a much more serious one for strong NAI, yet I haven't seen it called out or named explicitly. I provide a quick literature review to explain why I think it is ignored in classic multi-agent system design. (But I might be corrected.)

It is unclear to me whether we can expect even introspective GAI to “just solve it” by noticing that it is a problem and working to fix it, given that people often don’t seem to manage it.

The problem

One challenge for safe AI is the intrinsic difficulty of coordination problems. This includes coordination with humans, coordination with other AI systems, and potentially self-coordination when an AI uses multiple agents. Unfortunately, the typical system design intends to maximize some fitness function, not to coordinate in order to allow mutually beneficial interaction.

There is extensive literature on multi-agent coordination for task-based delegation and cooperation, dating back to at least the 1980 Contract Net Interaction Protocol, which allows autonomous agents to specify markets for interaction. This is useful, but doesn’t avoid any of the problems with market failures and inadequate equilibria. (In fact, it probably induces such failures, since individual contracts are the atomic unit of interaction.) Extensive follow-up work on distributed consensus problems assumes that all agents are built to achieve consensus. This may be important for AI coordination, but requires clearly defined communication channels and well-understood domains. Work on Collaborative Intelligence is also intended to allow collaboration, but it is unclear that there is substantive ongoing work in that area. Multiscale decision theory attempts to build multi-scale models for decision making, but is not tied explicitly to multiple agents.
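To make the market framing concrete, here is a minimal, purely illustrative sketch of a Contract Net style announce/bid/award cycle. The class names and the cost model are my own invention, not part of the protocol specification; the point is only that each contract is awarded in isolation, which is why treating individual contracts as the atomic unit of interaction can reproduce market failures.

```python
# Illustrative sketch of a Contract Net style interaction: a manager announces
# a task, contractors bid, and the task is awarded to the lowest bidder.
# Names (Task, Contractor, run_auction) and the cost model are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    workload: float  # abstract units of effort


@dataclass
class Contractor:
    name: str
    cost_per_unit: float

    def bid(self, task: Task) -> float:
        # Each agent bids purely on its own cost; nothing in the protocol
        # rewards bids that would improve the outcome for the group.
        return task.workload * self.cost_per_unit


def run_auction(task: Task, contractors: list[Contractor]) -> Contractor:
    """Award the task to the cheapest bidder (one contract is one atomic unit)."""
    return min(contractors, key=lambda c: c.bid(task))


if __name__ == "__main__":
    task = Task("haul cargo", workload=10.0)
    agents = [Contractor("A", 1.2), Contractor("B", 0.9), Contractor("C", 1.5)]
    winner = run_auction(task, agents)
    print(f"Task awarded to {winner.name} at cost {winner.bid(task):.1f}")
```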

What most of the literature shares is an assumption that agents will be designed for cooperation and collaboration. Inducing collaboration in agents not explicitly designed for that task is a very different problem, as is finding coordinated goals that can be achieved.

“Solutions”?

The obvious solution is to expect multi-agent systems to have agents with models of other agents that are sophisticated enough to build strategies that allow collaboration. In situations where multiple equilibria exist, moving from Pareto-dominated equilibria to better ones often requires coordination, which requires understanding that initially costly moves towards the better equilibrium will be matched by other players. As I argued earlier, there are fundamental limitations on the models of embedded agents that we don’t have good solutions to. (If we find good ways to build embedded agents, we may also find good ways to design embedded agents for cooperation. This isn’t obvious.)
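As a toy illustration of why this kind of coordination is needed, consider a stag-hunt style payoff table. The numbers below are arbitrary, chosen only to show the structure: both players hunting stag Pareto-dominates both hunting hare, yet a unilateral switch to stag is costly unless the other player matches it, so the worse equilibrium persists without coordination.

```python
# Stag-hunt payoffs as (row player's payoff, column player's payoff).
# Arbitrary illustrative numbers: (stag, stag) Pareto-dominates (hare, hare),
# but switching to "stag" unilaterally is costly.
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

ACTIONS = ("stag", "hare")


def best_response(opponent_action: str) -> str:
    """Row player's best response to a fixed opponent action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, opponent_action)][0])


if __name__ == "__main__":
    # Both (stag, stag) and (hare, hare) are equilibria: each action is a
    # best response to itself.
    for action in ACTIONS:
        print(f"Best response to {action!r}: {best_response(action)!r}")
    # A unilateral move from (hare, hare) to stag drops the mover's payoff
    # from 3 to 0, so the Pareto-dominated equilibrium is stable on its own.
    print("Unilateral deviation payoff:", PAYOFFS[("stag", "hare")][0])
```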

Collaboration-by-design, on the other hand, is much easier. Unfortunately, AI-race dynamics make it seem unlikely. The other alternative is to explicitly design safety parameters, as Mobileye has done for self-driving cars with “RSS” (Responsibility-Sensitive Safety), limiting the space in which they can make decisions to enforce limits on how cars interact. This seems intractable in domains where safety is ill-defined, and seems to require a much better understanding of corrigibility, at the very least.
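To give a flavor of what such a designed-in limit looks like, the sketch below implements a single RSS-style constraint: a following car must keep at least a worst-case safe longitudinal distance, and any planner decision that would close below it is vetoed. The formula follows the published RSS longitudinal safe-distance rule as I understand it, but the parameter values are illustrative, not Mobileye’s.

```python
# Simplified sketch of an RSS-style longitudinal safety check: the planner may
# pick any action, but actions violating the worst-case safe gap are vetoed.
# Parameter values are illustrative defaults, not Mobileye's.
def rss_safe_distance(
    v_rear: float,                # following (ego) car speed, m/s
    v_front: float,               # lead car speed, m/s
    response_time: float = 1.0,   # s, ego reaction time
    a_accel_max: float = 3.0,     # m/s^2, worst-case ego acceleration while reacting
    a_brake_min: float = 4.0,     # m/s^2, braking the ego is guaranteed to apply
    a_brake_max: float = 8.0,     # m/s^2, hardest braking the lead car might apply
) -> float:
    """Worst-case gap the ego car needs to avoid hitting the lead car."""
    v_after_response = v_rear + response_time * a_accel_max
    d = (
        v_rear * response_time
        + 0.5 * a_accel_max * response_time ** 2
        + v_after_response ** 2 / (2 * a_brake_min)
        - v_front ** 2 / (2 * a_brake_max)
    )
    return max(0.0, d)


def action_allowed(current_gap: float, v_rear: float, v_front: float) -> bool:
    """Veto any plan that leaves less than the safe following distance."""
    return current_gap >= rss_safe_distance(v_rear, v_front)


if __name__ == "__main__":
    print(rss_safe_distance(v_rear=20.0, v_front=20.0))             # required gap
    print(action_allowed(current_gap=30.0, v_rear=20.0, v_front=20.0))  # vetoed
```

The point of the example is that the safety envelope is specified entirely outside the agent’s objective: the agent never needs to model other cars’ goals, which is exactly why the approach only works where “safe” can be written down this crisply.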

Next Steps

Perhaps there are approaches I haven’t considered, or reasons to think this isn’t a problem. Alternatively, perhaps there is a clearer way to frame the problem of which I am unaware. As a first step, progress on either front seems useful.