Characterizing Real-World Agents as a Research Meta-Strategy

Background

Intuitively, the real world seems to contain agenty systems (e.g. humans), non-agenty systems (e.g. rocks), and ambiguous cases which display some agent-like behavior sometimes (bacteria, neural nets, financial markets, thermostats, etc.). There’s a vague idea that agenty systems pursue consistent goals in a wide variety of environments, and that various characteristics are necessary for this flexible goal-oriented behavior.

But once we get into the nitty-gritty, it turns out we don’t really have a full mathematical formalization of these intuitions. We lack a characterization of agents.

To date, the closest we’ve come to characterizing agents in general are the coherence theorems underlying Bayesian inference and utility maximization. A wide variety of theorems with a wide variety of different assumptions all point towards agents which perform Bayesian inference and choose their actions to maximize expected utility. In this framework, an agent is characterized by two pieces:

  • A probabilistic world-model

  • A utility function

The Bayesian utility characterization of agency neatly captures many of our intuitions of agency: the importance of accurate beliefs about the environment, the difference between things which do and don’t consistently pursue a goal (or approximately pursue a goal, or sometimes pursue a goal…), the importance of updating on new information, etc.
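
To make those two pieces concrete, here’s a minimal sketch in Python of a Bayesian utility maximizer. The states, observations, likelihoods, and payoffs are toy numbers made up purely for illustration; the point is just the shape of the characterization: update a probabilistic world-model on evidence, then pick the action with highest expected utility.

```python
import numpy as np

# Toy Bayesian-utility agent. The two characterizing pieces:
#   (1) a probabilistic world-model (prior + likelihood, updated by Bayes)
#   (2) a utility function over (action, state) pairs
# All states, observations, and numbers are made up for illustration.

STATES = ["rain", "sun"]
OBSERVATIONS = ["wet_ground", "dry_ground"]
ACTIONS = ["umbrella", "no_umbrella"]

prior = np.array([0.3, 0.7])            # P(state)
likelihood = np.array([[0.9, 0.1],      # P(observation | rain)
                       [0.2, 0.8]])     # P(observation | sun)

utility = {("umbrella", "rain"):  1.0, ("umbrella", "sun"): -0.2,
           ("no_umbrella", "rain"): -1.0, ("no_umbrella", "sun"):  0.5}

def posterior(obs: str) -> np.ndarray:
    """World-model piece: Bayesian update on an observation."""
    j = OBSERVATIONS.index(obs)
    unnormalized = prior * likelihood[:, j]
    return unnormalized / unnormalized.sum()

def choose_action(obs: str) -> str:
    """Utility piece: pick the action maximizing expected utility."""
    post = posterior(obs)
    expected_utility = lambda a: sum(p * utility[(a, s)] for p, s in zip(post, STATES))
    return max(ACTIONS, key=expected_utility)

print(choose_action("wet_ground"))  # -> umbrella
print(choose_action("dry_ground"))  # -> no_umbrella
```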

Sadly, for purposes of AGI alignment, the standard Bayesian utility characterization is incomplete at best. Some example issues include:

One way to view agent foundations research is that it seeks a characterization of agents which resolves problems like the first two above. We want the same sort of benefits offered by the Bayesian utility characterization, but in a wider and more realistic range of agenty systems.

Characterizing Real-World Agents

We want to characterize agency. We have a bunch of real-world systems which display agency to varying degrees. One obvious strategy is to go study and characterize those real-world agenty systems.

Concretely, what would this look like?

Well, let’s set aside the shortcomings of the standard Bayesian utility characterization for a moment, and imagine applying it to a real-world system—a financial market, for instance. We have various coherence theorems saying that agenty systems must implement Bayesian utility maximization, or else allow arbitrage. We have a strong prior that financial markets don’t allow arbitrage (except perhaps very small arbitrage on very short timescales). So, financial markets should have a Bayesian utility function, right? Obvious next step: pick an actual market and try to figure out its world-model and utility function.
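
To make the arbitrage clause concrete, here’s a toy money-pump in Python: an agent with circular preferences will pay a small fee for every swap to something it prefers, so a trader can cycle it forever. The items and the fee are made-up illustrations, not the statement of any actual coherence theorem.

```python
# Toy money-pump: an agent with circular preferences (A > B > C > A) pays a
# small fee for every swap to something it prefers, so a trader can cycle it
# and extract unbounded money. Items and fee are illustrative, not real assets.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (x, y): agent strictly prefers x to y
FEE = 1.0

def preferred_swap(holding: str) -> str:
    """Find something the agent strictly prefers to what it currently holds."""
    return next(x for (x, y) in prefers if y == holding)

def run_pump(start: str, rounds: int) -> float:
    """Offer a strictly preferred item each round; agent pays FEE per accepted swap."""
    holding, extracted = start, 0.0
    for _ in range(rounds):
        holding = preferred_swap(holding)   # the agent happily "upgrades"...
        extracted += FEE                    # ...and the trader pockets the fee
    return extracted

print(run_pump("B", rounds=30))   # -> 30.0, growing without bound in `rounds`
```

If the preferences were acyclic (i.e., consistent with some utility function), the pump would eventually run out of strictly preferred swaps to offer, and this kind of unbounded extraction would be impossible.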

I tried this, and it didn’t work. Turns out markets don’t have a utility function, in general (in this context, it’s called a “representative agent”).

Ok, but markets are still inexploitable and still seem agenty, so where did it go wrong? Can we generalize Bayesian utility to characterize systems which are agenty like markets? This was the line of inquiry which led to “Why Subagents?”. The upshot: for systems with internal state (including markets), the standard utility maximization characterization generalizes to a multi-agent committee characterization.
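
To give a flavor of the committee characterization, here’s a toy sketch of my own (not code from “Why Subagents?”; the utilities and the money weight are made up): two subagents, where a trade goes through only if neither objects. The resulting preferences come out incomplete rather than indifferent, something no single utility function reproduces, yet nothing can money-pump the committee.

```python
# Toy "committee of subagents": a trade is accepted only if every subagent
# weakly prefers the result and at least one strictly prefers it.
# The utilities and the money weight are made-up illustrations.

def u1(item: str, money: float) -> float:
    return {"apple": 1.0, "banana": 0.0}[item] + 0.01 * money

def u2(item: str, money: float) -> float:
    return {"apple": 0.0, "banana": 1.0}[item] + 0.01 * money

SUBAGENTS = [u1, u2]

def committee_accepts(current, offered) -> bool:
    """Trade goes through only if no subagent is made worse off and someone gains."""
    gains = [u(*offered) - u(*current) for u in SUBAGENTS]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)

# Neither direction of a slightly sweetened apple <-> banana trade is accepted:
print(committee_accepts(("apple", 0.0), ("banana", 0.5)))   # False: u1 objects
print(committee_accepts(("banana", 0.0), ("apple", 0.5)))   # False: u2 objects

# A single utility maximizer that was merely *indifferent* between the items
# would grab the extra $0.50 in either direction, so this is genuine
# incompleteness of preferences, not indifference:
print(committee_accepts(("apple", 0.0), ("apple", 0.5)))    # True: everyone gains
```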

This is an example of a general strategy:

  • Start with some characterization of agency—don’t worry if it’s not perfect yet

  • Apply it to a real-world agenty system—specifically, try to back out the characterizing properties, e.g. the probabilistic world-model and utility function in the case of a Bayesian utility characterization

  • If successful, great! We’ve gained a useful theoretical tool for an interesting real-world system.

  • If unsuccessful, first check whether the failure corresponds to a situation where the system actually doesn’t act very agenty—if so, then that actually supports our characterization of agency, and again tells us something interesting about a real-world system.

  • Otherwise, we’ve found a real-world case where our characterization of agency fails. Look at the system’s actual internal behavior to see where it differs from the assumptions of our characterization, and then generalize the characterization to handle this kind of system.

Note that the last step, generalizing the characterization, still needs to maintain the structure of a characterization of agency. For example, prospect theory does a fine job predicting the choices of humans, but it isn’t a general characterization of effective goal-seeking behavior. There’s no reason to expect prospect-theory-like behavior to be universal for effective goal-seeking systems. The coherence theorems of Bayesian utility, on the other hand, provide fairly general conditions under which Bayesian induction and expected utility maximization are an optimal goal-seeking strategy—and therefore “universal”, at least within the conditions assumed. Although the Bayesian utility framework is incomplete at best, that’s still the kind of thing we’re looking for: a characterization which should apply to all effective goal-seeking systems.

Some examples of (hypothetical) projects which follow this general strategy:

  • Look up the kinetic equations governing chemotaxis in E. coli. Either extract an approximate probabilistic world-model and utility function from the equations, find a suboptimality in the bacteria’s behavior, or identify a loophole and expand the characterization of agency.

  • Pick a financial market. Using whatever data you can obtain, either extract (not necessarily unique) utility functions and world models of the component agents, find an arbitrage opportunity, or identify a new loophole and expand the characterization of agency. (A toy version of this extraction step is sketched below.)

  • Start with the weights from a neural network trained on a task in OpenAI Gym. Either extract a probabilistic world model and utility function from the weights, find a strategy which dominates the NN’s strategy, or identify a loophole and expand the characterization of agency.

… and so forth.
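
To give a flavor of the “extract a utility function” step in its simplest form, here is a toy sketch: it simulates choices from an agent governed by a softmax over a linear utility (the features, data, and choice model are all assumptions made up for illustration), then recovers the utility weights by maximum likelihood on the observed choices. The real projects above would substitute chemotaxis kinetics, market data, or network weights, and would have to grapple seriously with non-uniqueness.

```python
import numpy as np

# Toy version of the "extraction" step: simulate an agent whose choices follow
# a softmax over a linear utility, then recover the utility weights from the
# observed choices by maximum likelihood. The features, data, and choice model
# are all assumptions made up for illustration.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])          # hidden utility weights generating the data

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Observed behavior: 500 choice sets of 3 options (each a 2-d feature vector).
choice_sets, choices = [], []
for _ in range(500):
    options = rng.normal(size=(3, 2))
    choice_sets.append(options)
    choices.append(rng.choice(3, p=softmax(options @ true_w)))

# "Extract the utility function": gradient ascent on the choice log-likelihood.
w = np.zeros(2)
learning_rate = 0.5
for _ in range(500):
    grad = np.zeros(2)
    for options, c in zip(choice_sets, choices):
        p = softmax(options @ w)
        grad += options[c] - p @ options   # d log P(choice) / dw for one choice
    w += learning_rate * grad / len(choices)

print("recovered utility weights:", w)   # should land near true_w = [2, -1]
```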

Why Would We Want to Do This?

Characterization of real-world agenty systems has a lot of advantages as a general research strategy.

First and foremost: when working on mathematical theory, it’s easy to get lost in abstraction and lose contact with the real world. One can end up pushing symbols around ad nauseam, without any idea which way is forward. The easiest counter to this failure mode is to stay grounded in real-world applications. Just as a rationalist lets reality guide beliefs, a theorist lets the problems, properties and intuitions of the real world guide the theory.

Second, when attempting to characterize real-world agenty systems, one is very likely to make some kind of forward progress. If the characterization works, then we’ve learned something useful about an interesting real-world system. If it fails, then we’ve identified a hole in our characterization of agency—and we have an example on hand to guide the construction of a new characterization.

Third, characterization of real-world agenty systems is directly relevant to alignment: the alignment problem itself basically amounts to characterizing the wants and ontologies of humans. This isn’t the only problem relevant to FAI—tiling and stability and subagent alignment and the like are separate—but it is basically the whole “alignment with humans” part. Characterizing e.g. the wants and ontology of an E. coli seems like a natural stepping-stone.

One could object that real-world agenty systems lack some properties which are crucial to the design of aligned AGI—most notably reflection and planned self-modification. A theory developed by looking only at real-world agents will therefore likely be incomplete. On the other hand, you don’t figure out general relativity without figuring out Newtonian gravitation first. Our understanding of agency is currently so woefully poor that we don’t even understand real-world systems, so we might as well start with that and reap all the advantages listed above. Once that’s figured out, we should expect it to pave the way to the final theory: just as general relativity has to reproduce Newtonian gravity in the limit of low speed and low energy, more advanced characterizations of agency should reproduce more basic characterizations under the appropriate conditions. The subagents characterization, for example, reproduces the utility characterization in cases where the agenty system has no internal state. It all adds up to normality—new theories must be consistent with the old, at least to the extent that the old theories work.

Finally, a note on relative advantage. As a strategy, characterizing real-world agenty systems leans heavily on domain knowledge in areas like biology, machine learning, economics, and neuroscience/psychology, along with the math involved in any agency research. That’s a pretty large Pareto skill frontier, and I’d bet that it’s pretty underexplored. That means there’s a lot of opportunity for new, large contributions to the theory, if you have the domain knowledge or are willing to put in the effort to acquire it.