Embedded Curiosities

A final word on curiosity and intellectual puzzles:

I described an embedded agent, Emmy, and said that I don’t understand how she evaluates her options, models the world, models herself, or decomposes and solves problems.

In the past, when researchers have talked about motivations for working on problems like these, they’ve generally focused on the motivation from AI risk. AI researchers want to build machines that can solve problems in the general-purpose fashion of a human, and dualism is not a realistic framework for thinking about such systems. In particular, it’s an approximation that’s especially prone to breaking down as AI systems get smarter. When people figure out how to build general AI systems, we want those researchers to be in a better position to understand their systems, analyze their internal properties, and be confident in their future behavior.

This is the motivation for most researchers today who are working on things like updateless decision theory and subsystem alignment. We care about basic conceptual puzzles which we think we need to figure out in order to achieve confidence in future AI systems, and not have to rely quite so much on brute-force search or trial and error.

But the arguments for why we may or may not need particular conceptual insights in AI are pretty long. I haven’t tried to wade into the details of that debate here. Instead, I’ve been discussing a particular set of research directions as an intellectual puzzle, and not as an instrumental strategy.

One downside of discussing these problems as instrumental strategies is that it can lead to some misunderstandings about why we think this kind of work is so important. With the “instrumental strategies” lens, it’s tempting to draw a direct line from a given research problem to a given safety concern. But it’s not that I’m imagining real-world embedded systems being “too Bayesian” and this somehow causing problems, if we don’t figure out what’s wrong with current models of rational agency. It’s certainly not that I’m imagining future AI systems being written in second-order logic! In most cases, I’m not trying at all to draw direct lines between research problems and specific AI failure modes.

What I’m instead thinking about is this: We sure do seem to be working with the wrong basic concepts today when we try to think about what agency is, as seen by the fact that these concepts don’t transfer well to the more realistic embedded framework.

If AI developers in the future are still working with these confused and incomplete basic concepts as they try to actually build powerful real-world optimizers, that seems like a bad position to be in. And it seems like the research community is unlikely to figure most of this out by default in the course of just trying to develop more capable systems. Evolution certainly figured out how to build human brains without “understanding” any of this, via brute-force search.

Embedded agency is my way of trying to point at what I think is a very important and central place where I feel confused, and where I think future researchers risk running into confusions too.

There’s also a lot of excellent AI alignment research that’s being done with an eye toward more direct applications; but I think of that safety research as having a different type signature than the puzzles I’ve talked about here.


Intellectual curiosity isn’t the ultimate reason we privilege these research directions. But there are some practical advantages to orienting toward research questions from a place of curiosity at times, as opposed to only applying the “practical impact” lens to how we think about the world.

When we apply the curiosity lens to the world, we orient toward the sources of confusion preventing us from seeing clearly: the blank spots in our map, the flaws in our lens. It encourages re-checking assumptions and attending to blind spots, which is a helpful psychological counterpoint to the “instrumental strategy” lens, since the latter is more vulnerable to the urge to lean on whatever shaky premises we have on hand so we can get to more solidity and closure in our early thinking.

Embedded agency is an organizing theme behind most, if not all, of our big curiosities. It seems like a central mystery underlying many concrete difficulties.