Abstraction, Evolution and Gears

Meta: this project is wrapping up for now. This is the second of probably several posts dumping my thought-state as of this week.

It is an empirical fact that we can predict the day-to-day behavior of the world around us—positions of trees or buildings, trajectories of birds or cars, color of the sky and ground, etc—without worrying about the details of plasmas roiling in any particular far-away star. We can predict the behavior of a dog without having to worry about positions of individual molecules in its cells. We can predict the behavior of reinforced concrete without having to check it under a microscope or account for the flaps of butterfly wings a thousand kilometers away.

Our universe abstracts well: it decomposes into high-level objects whose internal details are approximately independent of far-away objects, given all of their high-level summary information.

It didn't have to be this way. We could imagine a universe which looks like a cryptographic hash function, where most bits are tightly entangled with most other bits and any prediction of anything requires near-perfect knowledge of the whole system state. But empirically, our universe does not look like that.

Given that we live in a universe amenable to abstraction, what sorts of agents should we expect to evolve? What can we say about agency structure and behavior in such a universe? This post comes at the question from a few different angles, looking at different properties I expect evolved agents to display in abstraction-friendly universes.

Convergent Instrumental Goals

The basic idea of abstraction is that any variable X is surrounded by lots of noisy unobserved variables, which mediate its interactions with the rest of the universe. Anything "far away" from X—i.e. anything outside of those noisy intermediates—can only "see" some abstract summary information f(X). Anything more than a few microns from a transistor on a CPU will only be sensitive to the transistor's on/off state, not its exact voltage; the gravitational forces on far-apart stars depend only on their total mass, momentum and position, not on the roiling of plasmas.

One consequence: if an agent's goals do not explicitly involve things close to X, then the agent cares only about controlling f(X). If an agent does not explicitly care about exact voltages on a CPU, then it will care only about controlling the binary states (and ultimately, the output of the computation). If an agent does not explicitly care about plasmas in far-away stars, then it will care only about the total mass, momentum and position of those stars. This holds for any goal which does not explicitly care about the low-level details of X or the things nearby X.

Noisy intermediates Z mask all information about X except the summary f(X). So, if an agent's objective only explicitly depends on far-away variables Y, then the agent only wants to control f(X), not necessarily all of X.
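As a toy illustration (everything here is invented for the example): take X to be a list of point masses, let f(X) be their total mass and center of mass, and let a far-away observable see X only through a noisy intermediate that depends on the summary alone. Two different micro-states with the same summary are then indistinguishable from far away:

```python
import random

def summary(x):
    # hypothetical high-level summary f(X): (total mass, center of mass)
    total = sum(x)
    com = sum(i * m for i, m in enumerate(x)) / total
    return (total, com)

def far_away_observable(x, rng):
    # the noisy intermediate Z depends on X only through f(X), so anything
    # on the far side of Z can only "see" the summary
    total, com = summary(x)
    z = total + rng.gauss(0, 0.1)  # noise in the intermediates
    return z * com

# two distinct low-level states with the same high-level summary
x1 = [1.0, 2.0, 3.0]
x2 = [0.5, 3.0, 2.5]
assert x1 != x2 and summary(x1) == summary(x2)

# same summary -> identical far-away behavior
# (same noise stream in both runs, so the comparison is exact)
rng1, rng2 = random.Random(0), random.Random(0)
y1 = [far_away_observable(x1, rng1) for _ in range(1000)]
y2 = [far_away_observable(x2, rng2) for _ in range(1000)]
assert y1 == y2
```

Nothing far away can distinguish x1 from x2, so an agent whose goal lives on the far side of Z has no reason to prefer one over the other.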

This sounds like instrumental convergence: any goal which does not explicitly care about things near X itself will care only about controlling f(X), not all of X. Agents with different goals will compete to control the same things: high-level behaviors f(X), especially those with far-reaching effects.

Natural next question: does all instrumental convergence work this way?

Typical intuition for instrumental convergence is something like "well, having lots of resources increases one's action space, so a wide variety of agents will try to acquire resources in order to increase their action space". Re-wording that as an abstraction argument: "an agent's accessible action space 'far away' from now (i.e. far in the future) depends mainly on what resources it acquires, and is otherwise mostly independent of specific choices made right now".

That may sound surprising at first, but imagine a strategic video game (I picture Starcraft). There's a finite world-map, so over a long-ish time horizon I can get my units wherever I want them; their exact positions don't matter to my long-term action space. Likewise, I can always tear down my buildings and reposition them somewhere else; that's not free, but the long-term effect of such actions is just having fewer resources. Similarly, on a long time horizon, I can build/lose whatever units I want, at the cost of resources. It's ultimately just the resources which restrict my action space, over a long time horizon.
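A minimal sketch of that intuition (the unit costs and position labels are made up): if movement is free over long horizons, then the set of army compositions eventually reachable from a game state depends only on the resource budget, not the position.

```python
# toy "strategy game" state: (position, minerals)
COSTS = {"marine": 1, "tank": 3}  # hypothetical unit costs

def long_run_armies(state):
    # over a long horizon, movement is free, so the reachable set of
    # (marines, tanks) compositions depends only on the resource budget
    _, minerals = state
    armies = set()
    for marines in range(minerals // COSTS["marine"] + 1):
        remaining = minerals - marines * COSTS["marine"]
        for tanks in range(remaining // COSTS["tank"] + 1):
            armies.add((marines, tanks))
    return armies

# different positions, same resources -> same long-run action space
assert long_run_armies(("top-left", 5)) == long_run_armies(("bottom-right", 5))
# more resources -> strictly larger long-run action space
assert long_run_armies(("anywhere", 5)) < long_run_armies(("anywhere", 8))
```

Here resources play exactly the role of f(X): the summary of the current state which the far-future action space depends on.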

(More generally, I think mediating-long-term-action-space is part of how we intuitively decide what to call "resources" in the first place.)

Coming from a different angle, we could compare to TurnTrout's formulation of convergent instrumental goals in MDPs. Those results are similar to the argument above in that agents tend to pursue states which maximize their long-term action space. We could formally define an abstraction on MDPs in which X is the current state, and f(X) summarizes the information about the current state relevant to the far-future action space. In other words, two states X with the same long-run action space will have the same f(X). "Power", as TurnTrout defined it, would be an increasing function of f(X): larger long-run action spaces mean more power. Presumably agents would tend to seek states with large f(X).
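A toy version of that abstraction, on a hypothetical six-state deterministic MDP (this illustrates the idea, not TurnTrout's actual formalism): take f(X) to be the set of states reachable in one or more steps, and "power" to be an increasing function of it.

```python
# a tiny deterministic MDP, written as state -> set of possible next states
# (the transition structure is invented, purely for illustration)
T = {
    0: {1, 2},
    1: {3},
    2: {3},
    3: {3, 4, 5},
    4: {4},
    5: {5},
}

def f(s):
    # summary f(X): the set of states reachable in one or more steps,
    # i.e. the long-run action space available from s
    frontier = set(T[s])
    reachable = set(frontier)
    while frontier:
        frontier = {t for u in frontier for t in T[u]} - reachable
        reachable |= frontier
    return frozenset(reachable)

def power(s):
    # an increasing function of f(X): more long-run options, more power
    return len(f(s))

# states 1 and 2 differ in low-level detail but share the same summary
assert f(1) == f(2) == frozenset({3, 4, 5})
# state 0 keeps more options open, so it has more power than the sinks
assert power(0) > power(3) > power(4)
```

An agent with almost any goal over this MDP would prefer to stay out of the absorbing states 4 and 5 for as long as its goal allows, which is the convergent-instrumental-goal behavior in miniature.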


Fun fact: biological systems are highly modular, at multiple different scales. This can be quantified and verified statistically, e.g. by mapping out protein networks and algorithmically partitioning them into parts, then comparing the connectivity of the parts. It can also be seen more qualitatively in everyday biological work: proteins have subunits which retain their function when fused to other proteins, receptor circuits can be swapped out to make bacteria follow different chemical gradients, manipulating specific genes can turn a fly's antennae into legs, organs perform specific functions, etc, etc.
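The partition-and-compare idea can be sketched with Newman-Girvan modularity on a toy graph (the "protein network" here is invented: two tightly-knit triangles joined by a single edge). A partition that respects the modules scores much higher than one that cuts across them:

```python
def modularity(edges, partition):
    # Newman-Girvan modularity: Q = sum over blocks of (e_c/m - (d_c/2m)^2),
    # where e_c = edges inside the block, d_c = total degree of the block,
    # m = total number of edges
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for block in partition:
        e_c = sum(1 for u, v in edges if u in block and v in block)
        d_c = sum(degree[n] for n in block)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q

# two tightly-connected "complexes" joined by a single interaction
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]   # partition along the real module boundary
bad = [{0, 1, 3, 4}, {2, 5}]    # partition cutting across the modules
assert modularity(edges, good) > modularity(edges, bad)
```

Real analyses search over partitions to maximize Q; the statistical claim is that biological networks admit partitions with much higher Q than comparable random graphs do.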

One leading theory for why modularity evolves is "modularly varying goals": essentially, modularity in the organism evolves to match modular requirements from the environment. For instance, animals need to breathe, eat, move, and reproduce. A new environment might have different food or require different motions, independent of respiration or reproduction—or vice versa. Since these requirements vary more-or-less independently in the environment, animals evolve modular systems to deal with them: digestive tract, lungs, etc. This has been tested in simple simulated evolution experiments, and it works.

In short: modularity of the organism evolves to match modularity of the environment.

… and modularity of the environment is essentially abstraction-friendliness. The idea of abstraction is that the environment consists of high-level components whose low-level structure is independent (given the high-level summaries) for any far-apart components. That's modularity.

Coming from an entirely different direction, we could talk about the good regulator theorem from control theory: any maximally successful and maximally simple regulator of a system must be a model of the system it regulates. Again, this suggests that modular environments should evolve modular "regulators", e.g. organisms or agents.
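A (very) degenerate toy in the spirit of the Conant-Ashby theorem (the setup is invented): a system takes one of three states, a regulator observes the state and must choose an action that cancels it, and brute-force search over all deterministic policies finds that the unique optimum encodes the system state exactly, i.e. the successful regulator carries a model of the system.

```python
from itertools import product

# the system takes states {0, 1, 2}; regulation succeeds on state s iff
# the regulator's action a satisfies (s + a) % 3 == 0
STATES = (0, 1, 2)

def payoff(policy):
    # fraction of system states the policy regulates successfully
    return sum((s + policy[s]) % 3 == 0 for s in STATES) / len(STATES)

# enumerate every deterministic policy (observation -> action)
policies = [dict(zip(STATES, acts)) for acts in product(STATES, repeat=3)]
best = max(policies, key=payoff)

# the unique fully-successful regulator is a bijective image of the
# system state: its internal action "mirrors" the system
assert payoff(best) == 1.0
assert best == {0: 0, 1: 2, 2: 1}
```

The theorem's content is the general version of this: maximal success plus maximal simplicity forces the regulator's state to be a deterministic mapping of the system's state.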

I expect that the right formalization of these ideas would yield a theorem saying that evolution in abstraction-friendly environments tends to produce modularity reflecting the modular structure of the environment. Or, to put it differently: evolution in abstraction-friendly environments tends to produce (implicit) world-models whose structure matches the structure of the world.


Finally, we can ask what happens when one modular component of the world is itself an evolved agent modelling the world. What would we expect this agent's model of itself to look like?

I don't have much to say yet about what this would look like, but it would be very useful to have. It would give us a grounded, empirically-testable outside-view correctness criterion for things like embedded world models and embedded decision theory. Ultimately, I hope that it will get at Scott's open question "Does agent-like behavior imply agent-like architecture?", at least for evolved agents specifically.