Embedded Curiosities

A final word on curiosity, and intellectual puzzles:

I described an embedded agent, Emmy, and said that I don’t understand how she evaluates her options, models the world, models herself, or decomposes and solves problems.

In the past, when researchers have talked about motivations for working on problems like these, they’ve generally focused on the motivation from AI risk. AI researchers want to build machines that can solve problems in the general-purpose fashion of a human, and dualism is not a realistic framework for thinking about such systems. In particular, it’s an approximation that’s especially prone to breaking down as AI systems get smarter. When people figure out how to build general AI systems, we want those researchers to be in a better position to understand their systems, analyze their internal properties, and be confident in their future behavior.

This is the motivation for most researchers today who are working on things like updateless decision theory and subsystem alignment. We care about basic conceptual puzzles which we think we need to figure out in order to achieve confidence in future AI systems, and not have to rely quite so much on brute-force search or trial and error.

But the arguments for why we may or may not need particular conceptual insights in AI are pretty long. I haven’t tried to wade into the details of that debate here. Instead, I’ve been discussing a particular set of research directions as an intellectual puzzle, and not as an instrumental strategy.

One downside of discussing these problems as instrumental strategies is that it can lead to some misunderstandings about why we think this kind of work is so important. With the “instrumental strategies” lens, it’s tempting to draw a direct line from a given research problem to a given safety concern. But it’s not that I’m imagining real-world embedded systems being “too Bayesian” and this somehow causing problems, if we don’t figure out what’s wrong with current models of rational agency. It’s certainly not that I’m imagining future AI systems being written in second-order logic! In most cases, I’m not trying at all to draw direct lines between research problems and specific AI failure modes.

What I’m instead thinking about is this: We sure do seem to be working with the wrong basic concepts today when we try to think about what agency is, as seen by the fact that these concepts don’t transfer well to the more realistic embedded framework.

If AI developers in the future are still working with these confused and incomplete basic concepts as they try to actually build powerful real-world optimizers, that seems like a bad position to be in. And it seems like the research community is unlikely to figure most of this out by default in the course of just trying to develop more capable systems. Evolution certainly figured out how to build human brains without “understanding” any of this, via brute-force search.

Embedded agency is my way of trying to point at what I think is a very important and central place where I feel confused, and where I think future researchers risk running into confusions too.

There’s also a lot of excellent AI alignment research that’s being done with an eye toward more direct applications; but I think of that safety research as having a different type signature than the puzzles I’ve talked about here.

Intellectual curiosity isn’t the ultimate reason we privilege these research directions. But there are some practical advantages to orienting toward research questions from a place of curiosity at times, as opposed to only applying the “practical impact” lens to how we think about the world.

When we apply the curiosity lens to the world, we orient toward the sources of confusion preventing us from seeing clearly: the blank spots in our map, the flaws in our lens. It encourages re-checking assumptions and attending to blind spots, which is helpful as a psychological counterpoint to our “instrumental strategy” lens, the latter being more vulnerable to the urge to lean on whatever shaky premises we have on hand so we can get to more solidity and closure in our early thinking.

Embedded agency is an organizing theme behind most, if not all, of our big curiosities. It seems like a central mystery underlying many concrete difficulties.