Reducing Agents: When abstractions break

Epistemic Effort: A month of dwelling on this idea, 12-16 hours of writ­ing to ex­plore the idea, and 2-5 hours reread­ing old LW stuff.

In the past few months, I’ve been noticing more things that lead me to believe there’s something incomplete about how I think about beliefs, motives, and agents. There’s been one too many instances of me wondering, “Yeah, but what do you really believe?” or “Is that what you really want?”

This post is the first in a series where I’m going to apply More Dakka to a lot of LessWrong ideas I was already familiar with, but hadn’t quite connected the dots on.

Here are the main points:

  1. Agents are an ab­strac­tion for mak­ing pre­dic­tions more quickly.

  2. In what con­texts does this ab­strac­tion break down?

  3. What model should be used in places where it does break down?

Relevant LW posts (will also be linked throughout):

Re­duc­tion­ism 101

Blue Min­i­miz­ing Robot

Adap­ta­tion-Ex­ecu­tors, Not Fit­ness-Maximizers


Ab­strac­tion is awe­some. Be­ing able to make qual­ity high level ab­strac­tions is like a su­per power that bends the uni­verse to your will. Think of an ab­strac­tion as a model. A given ab­strac­tion has its on­tolog­i­cally ba­sic build­ing blocks defined, as well as the rules that gov­ern their in­ter­ac­tions.

Imagine a universe where 2x6 Lego bricks are the base level of reality. They connect to each other just like they do in our universe, but they do so axiomatically, not because of friction or anything. In this universe, we might make a higher level abstraction by defining a handful of multi-brick structures to be ontologically basic. You lose some of the resolution of having 2x6s as your basis, but it also doesn’t take as long to make large things.

That’s the fundamental trade-off of abstractions. Each time you hop up a layer, you lose some detail (and thus, model accuracy) but you gain simplicity in computation. You could talk about a complex system like a computer on the quark level, but the computational time would be silly. Same for the atom or molecule layer. There’s a sparkle of hope when you get to the level where transistors are basic. Now you can quickly talk about tons of cool stuff, but talking about an entire computer is still out of your reach. Hop up to logic gates. Hop up to basic components like adders, multiplexers, and flip-flops. Now we’ve reached a level where you could actually design a useful piece of hardware that does something. Hop up to registers and ALUs. Make the awesome leap to having a 16-bit CPU that can be programmed in assembly. Keep going all the way up until it’s possible to say, “Did you see Kevin’s skiing photos on Facebook?”
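To make a few of those hops concrete, here is a minimal sketch that starts one rung above transistors. It assumes a single NAND gate is ontologically basic and builds upward from there; every function name is made up for this illustration, and real hardware design involves a lot that this ignores (timing, fan-out, and so on).

```python
# Layer 0 (assumed basic for this sketch): one NAND gate on bits 0/1.
def nand(a: int, b: int) -> int:
    return 1 - (a & b)

# Layer 1: familiar logic gates, each defined only in terms of NAND.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    return and_(or_(a, b), nand(a, b))

# Layer 2: a basic component. From here on nobody mentions NAND.
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out

# Layer 3: a ripple-carry adder over little-endian lists of bits.
def add_bits(xs: list[int], ys: list[int]) -> list[int]:
    carry, out = 0, []
    for a, b in zip(xs, ys):
        bit, carry = full_adder(a, b, carry)
        out.append(bit)
    return out + [carry]

# Layer 4: plain integers. The user of add() never sees a bit.
def to_bits(n: int, width: int) -> list[int]:
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

def add(x: int, y: int, width: int = 16) -> int:
    return from_bits(add_bits(to_bits(x, width), to_bits(y, width)))
```

Each layer only talks to the one directly below it. Someone calling `add(1234, 4321)` can treat addition as basic, at the cost of not noticing anything that goes wrong at the gate level.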

Each time we hopped a layer of abstraction, we decided to simplify our models by ignoring some details. Luckily for us, many brave souls have pledged their lives to studying the gaps between layers of abstractions. There’s someone who works on how to make transistors closer to ontologically basic things. On the flip side, there’s someone with the job of knowing how transistors really work, so they can design circuits where it truly doesn’t matter that transistors aren’t basic. The rest of us can just hang out and work on our own level of abstraction, carefree and joyful.

For something like computers, lots of smart people have put lots of thought into each layer of abstraction. This is not to say that computer engineering abstractions are the best they can be, but to make the point that you don’t get good abstractions for free. Most possible next-level abstractions you could make would suck. You only get a nice abstraction when you put in hard work. And even then, abstractions are always leaky (except maybe in math, where you declare your model to have nothing to do with reality).

The thing that’s really nice about engineering abstractions is that they are normally completely specified. Even if you don’t know how the IEEE defines floating point arithmetic, there is a canonical version of “what things mean”. In engineering, looking up definitions is often a totally valid and useful way to resolve arguments.

When an abstraction is under-specified, there’s lots of wiggle room concerning how something is supposed to work, and everyone fills in the blanks with their own intuition. It’s totally acceptable for parts of your abstraction to be under-specified. The C programming language declares the outcome of certain situations to be undefined. C does not specify what happens when you try to dereference a null pointer. You do get problems when you don’t realize that parts of your abstraction are under-specified. Language is a great example of a useful abstraction that is under-specified, yet to the untrained it feels completely specified, and that difference leads to all sorts of silly arguments.

Agents and Ghosts

I’m no his­to­rian, but from what I’ve gath­ered most philoso­phers for most of his­tory have mod­eled peo­ple as hav­ing some sort of soul. There is some ethe­real other thing which is one’s soul, and it is the source of you, your de­ci­sions, and your con­scious­ness. There is this ma­chine which is your body, and some Ghost that in­hab­its it and makes it do things.

Even though it’s less com­mon to think Ghosts are part of re­al­ity, we still model our­selves and oth­ers as hav­ing Ghosts, which isn’t the most helpful. Ghosts are so un­der-speci­fied that they shift al­most all of the ex­plana­tory bur­den to one’s in­tu­ition. Ghosts do not help ex­plain any­thing, be­cause they can stretch as much as one’s in­tu­ition can, which is a lot.

Lucky for us, peo­ple have since made bet­ter ab­strac­tions than just the ba­sic Ghost. The de­ci­sion the­ory no­tion of an Agent does a pretty good job of cap­tur­ing the im­por­tant parts of “A thing that thinks and de­cides”. Agents have be­liefs about the world, some way to value world states, some way of gen­er­at­ing ac­tions, and some way to choose be­tween them (if there are any mod­els of agents that are differ­ent let me know in the com­ments).
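As a rough illustration of those four parts, here is a minimal sketch of an expected-utility chooser. The `Agent` class, its field names, and the umbrella numbers are all invented for this example; it is one simple way to cash out the description above, not a canonical decision-theory formalism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    # Beliefs: for each action, a probability distribution over outcomes.
    beliefs: Callable[[str], Dict[str, float]]
    # Values: how good each outcome (world state) is.
    utility: Callable[[str], float]
    # Some way of generating candidate actions.
    generate_actions: Callable[[], List[str]]

    def expected_utility(self, action: str) -> float:
        return sum(p * self.utility(outcome)
                   for outcome, p in self.beliefs(action).items())

    # Some way to choose between actions: take the highest expected utility.
    def choose(self) -> str:
        return max(self.generate_actions(), key=self.expected_utility)


# Hypothetical numbers for a commuter deciding about an umbrella.
FORECAST = {
    "take umbrella": {"dry, carrying umbrella": 1.0},
    "leave umbrella": {"dry, hands free": 0.7, "soaked": 0.3},
}
VALUES = {"dry, carrying umbrella": 0.8, "dry, hands free": 1.0, "soaked": 0.0}

commuter = Agent(
    beliefs=lambda action: FORECAST[action],
    utility=lambda outcome: VALUES[outcome],
    generate_actions=lambda: list(FORECAST),
)
choice = commuter.choose()  # "take umbrella": 0.8 beats 0.7
```

Notice how little is in there. Everything interesting about a real person has to be squeezed into `beliefs`, `utility`, or `generate_actions`, which is exactly where the abstraction gets asked to do a lot of work.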

Again, we are well versed in re­duc­tion­ism and know that there are no agents in the ter­ri­tory. They are a use­ful ab­strac­tion which we use to pre­dict what peo­ple do. We use it all the time, and it of­ten works to great suc­cess. It seems to be a ma­jor load bear­ing ab­strac­tion in our tool kit for com­pre­hend­ing the world.

The rest of this series is a sustained meditation on two questions, ones that are vital to ask any time one asks an abstraction to do a lot of work:

  1. In what contexts does the Agent abstraction break down?

  2. When it breaks down, what model do we use in­stead?

The rest of this post is go­ing to be some primer ex­am­ples of the Agent ab­strac­tion break­ing down.

The Blue Min­i­miz­ing Robot

Re­mem­ber the Blue Min­i­miz­ing Robot? (Scott’s se­quence was a strong primer for my thoughts here)

Imag­ine a robot with a tur­ret-mounted cam­era and laser. Each mo­ment, it is pro­grammed to move for­ward a cer­tain dis­tance and perform a sweep with its cam­era. As it sweeps, the robot con­tin­u­ously an­a­lyzes the av­er­age RGB value of the pix­els in the cam­era image; if the blue com­po­nent passes a cer­tain thresh­old, the robot stops, fires its laser at the part of the world cor­re­spond­ing to the blue area in the cam­era image, and then con­tinues on its way.

It’s tempt­ing to look at that robot and go, “Aha! It’s a blue min­i­miz­ing robot.” Now you can model the robot as an agent with goals and go about mak­ing pre­dic­tions. Yet time and time again, the robot fails to achieve the goal of min­i­miz­ing blue.

In fact, there are many ways to sub­vert this robot. What if we put a lens over its cam­era which in­verts the image, so that white ap­pears as black, red as green, blue as yel­low, and so on? The robot will not shoot us with its laser to pre­vent such a vi­o­la­tion (un­less we hap­pen to be wear­ing blue clothes when we ap­proach) - its en­tire pro­gram was de­tailed in the first para­graph, and there’s noth­ing about re­sist­ing lens al­ter­a­tions. Nor will the robot cor­rect it­self and shoot only at ob­jects that ap­pear yel­low—its en­tire pro­gram was de­tailed in the first para­graph, and there’s noth­ing about cor­rect­ing its pro­gram for new lenses. The robot will con­tinue to zap ob­jects that reg­ister a blue RGB value; but now it’ll be shoot­ing at any­thing that is yel­low.

Maybe you conclude that the robot is just a Dumb Agent™. It wants to minimize blue, but it just isn’t clever enough to figure out how. But as Scott points out, the key error with such an analysis is to even model the robot as an agent in the first place. The robot’s code is all that’s needed to fully predict how the robot will operate in all future scenarios. If you were in the business of anticipating the actions of such robots, you’d best forget about trying to model it as an agent and just use the source code.
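Here is a hedged sketch of what "just use the source code" could look like. Scott's post describes the robot only in prose, so the threshold value, the function names, and the four-pixel "images" below are all assumptions made up for illustration.

```python
BLUE_THRESHOLD = 128  # hypothetical cutoff on a 0-255 scale

Pixel = tuple[int, int, int]  # (red, green, blue)


def average_rgb(pixels: list[Pixel]) -> tuple[float, float, float]:
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def robot_step(pixels: list[Pixel]) -> str:
    """The robot's whole decision rule. Nothing here mentions goals."""
    _, _, blue = average_rgb(pixels)
    return "fire" if blue > BLUE_THRESHOLD else "move on"


def inverting_lens(pixels: list[Pixel]) -> list[Pixel]:
    """A lens that inverts every channel, so blue shows up as yellow."""
    return [(255 - r, 255 - g, 255 - b) for r, g, b in pixels]


blue_wall = [(10, 20, 240)] * 4
yellow_wall = [(245, 235, 15)] * 4

robot_step(blue_wall)                    # "fire"
robot_step(yellow_wall)                  # "move on"
robot_step(inverting_lens(blue_wall))    # "move on": blue is now safe
robot_step(inverting_lens(yellow_wall))  # "fire": yellow gets zapped
```

A "blue minimizer" model gets the last two calls wrong. Reading `robot_step` gets all four right, because there is no line in it that could notice a lens, let alone object to one.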

The Con­nect 4 VNM Robot

I’ve got a Connect 4 playing robot that beats you 37 times in a row. You conclude it’s a robot whose goal is to win at Connect 4. I even let you peek at the source code, and aha! It’s explicitly encoded as a VNM agent using a minimax algorithm. Clearly this can safely be modeled as an expected utility maximizer with the goal of whooping you at Connect 4, right?

Well, depends on what counts as safely. If the ICC (International Connect 4 Committee) declares that winning at Connect 4 is actually defined by getting 5 in a row, my robot is going to start losing games to you. Wait, but isn’t it cheating to just say we are redefining what winning is? Okay, maybe. Instead of redefining winning, let’s run interference. Every time my robot is about to place a piece, you block the top of the board (but only for a few seconds). My robot will let go of its piece, not realizing it never made a move. Arg! If only the robot were smart enough to wait until you stopped blocking the board, then it could have achieved its true goal of winning at Connect 4!

Ex­cept this robot doesn’t have any such goal. The robot is only code, and even though it’s do­ing a faith­ful recre­ation of a VNM agent, it’s still not a Con­nect 4 win­ning robot. Un­til you make an Agent model that is at least as com­plex as the source code, I can put the robot in a con­text where your Agent model will make an in­cor­rect pre­dic­tion.
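To see the 5-in-a-row failure in miniature, here is a sketch of just the "did I win?" test that a minimax search would call at its leaves. This is not the robot's full search, and the function names and board encoding are assumptions for illustration. The point is that the number 4 lives in the code, not in anything resembling a goal.

```python
def has_run(board: list[list[str]], player: str, length: int) -> bool:
    """True if `player` has `length` pieces in a line, in any direction."""
    rows, cols = len(board), len(board[0])
    directions = ((0, 1), (1, 0), (1, 1), (1, -1))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in directions:
                cells = [(r + i * dr, c + i * dc) for i in range(length)]
                if all(0 <= rr < rows and 0 <= cc < cols
                       and board[rr][cc] == player
                       for rr, cc in cells):
                    return True
    return False


def robot_thinks_it_won(board: list[list[str]]) -> bool:
    # The robot's terminal test: the 4 is hard-coded.
    return has_run(board, "R", 4)


def committee_says_won(board: list[list[str]], player: str,
                       required: int = 5) -> bool:
    # The (fictional) ICC's new rule: 5 in a row.
    return has_run(board, player, required)


# "R" is the robot, "Y" is you, "." is empty. Bottom row is last.
board = [list(row) for row in (
    ".......",
    ".......",
    ".......",
    ".......",
    "......Y",
    "RRRR.YY",
)]

robot_thinks_it_won(board)       # True: it stops trying here
committee_says_won(board, "R")   # False: the game is still on
```

An Agent model that says "wants to win at Connect 4" predicts the robot keeps fighting after the rule change. The code predicts it happily stops at four, and the code is right.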

“So what?” you might ask. What if we don’t care about ev­ery pos­si­ble con­text? Why can’t we use an Agent model and only put the robot in con­texts where we know the ab­strac­tion works? We ab­solutely can do that. We just want to make sure we never for­get that this model breaks down in cer­tain places, and we’d also like to know ex­actly where and how it will break down.

Adaptation-Executors, Not Fitness-Maximizers

Things get harder when we talk about hu­mans. We can’t yet “use the source code” to make pre­dic­tions. At first glance, us­ing Agents might seem like a perfect fit. We want things, we be­lieve things, and we have in­tel­li­gence. You can even look at evolu­tion and go, “Aha! Peo­ple are fit­ness max­i­miz­ers!” But then you no­tice weird things like the fact that hu­mans eat cook­ies.

Eliezer has already tack­led that idea.

No hu­man be­ing with the de­liber­ate goal of max­i­miz­ing their alle­les’ in­clu­sive ge­netic fit­ness, would ever eat a cookie un­less they were starv­ing. But in­di­vi­d­ual or­ganisms are best thought of as adap­ta­tion-ex­ecu­tors, not fit­ness-max­i­miz­ers.

Adap­ta­tion ex­ecu­tors, not fit­ness-max­i­miz­ers.

Adap­ta­tion ex­ecu­tors, not fit­ness-max­i­miz­ers.

Adap­ta­tion ex­ecu­tors, not fit­ness-max­i­miz­ers.

Re­peat that 5 more times ev­ery morn­ing upon wak­ing, and then thrice more at night be­fore go­ing to bed. I’ve cer­tainly been mut­ter­ing it to my­self for the last month that I’ve been dwelling on this post. Even if you’ve already read the Se­quences, give that chunk an­other read through.

Re­but­tal: Maybe fit­ness isn’t the goal. Maybe we should model hu­mans as Agents who want cook­ies.

We could, but that doesn’t work ei­ther. More from Scott:

If there is a cookie in front of me and I am on a diet, I may feel an ego dys­tonic temp­ta­tion to eat the cookie—one some­one might at­tribute to the “un­con­scious”. But this isn’t a prefer­ence—there’s not some lobe of my brain try­ing to steer the uni­verse into a state where cook­ies get eaten. If there were no cookie in front of me, but a red but­ton that tele­ported one cookie from the store to my stom­ach, I would have no urge what­so­ever to press the but­ton; if there were a green but­ton that re­moved the urge to eat cook­ies, I would feel no hes­i­ta­tion in press­ing it, even though that would steer away from the state in which cook­ies get eaten. If you took the cookie away, and then dis­tracted me so I for­got all about it, when I re­mem­bered it later I wouldn’t get up­set that your ac­tion had de­creased the num­ber of cook­ies eaten by me. The urge to eat cook­ies is not sta­ble across changes of con­text, so it’s just an urge, not a prefer­ence.

Like with the blue minimizing robot, it’s tempting to resort to using a Dumb Agent™ model. Maybe you really do have a preference for cookies, but there is a counter-preference for staying on your diet. Maybe proximity to cookies increases how much you value the cookie world-state. There are all sorts of weird ways you could specify your Dumb Agent™ to produce human cookie behavior. But please, don’t.

I can’t ap­peal to “Just use the source code” any­more, but hope­fully, I’m get­ting across the point that it’s at least a lit­tle bit sus­pi­cious that we (I) want to con­form all hu­man be­hav­ior to the Agent Ab­strac­tion.

So if we aren’t agents, what are we?

Hopefully that last sentence triggered a strong reflex. Remember, it’s not a question of whether or not we are agents. We are quarks/whatever-is-below, all hail reductionism. We are trying to get a better understanding of when the Agent abstraction breaks down, and what alternative models to use when things do break down.

This post’s main in­tent was to mo­ti­vate this ex­plo­ra­tion, and put to rest any fears that I am naively try­ing to ex­plain away agents, be­liefs, and mo­tives.

Next Post: What are difficult parts of in­tel­li­gence that the Agent ab­strac­tion glosses over?