Sequence introduction: non-agent and multiagent models of mind

A typical paradigm by which people tend to think of themselves and others is as consequentialist agents: entities who can be usefully modeled as having beliefs and goals, and who then act according to their beliefs in order to achieve their goals.

This is often a useful model, but it doesn't quite capture reality. It's a bit of a fake framework. Or in computer science terms, you might call it a leaky abstraction.

An abstraction in the computer science sense is a simplification which tries to hide the underlying details of a thing, letting you think in terms of the simplification rather than the details. To the extent that the abstraction actually succeeds in hiding the details, this makes things a lot simpler. But sometimes the abstraction inevitably leaks, as the simplification fails to predict some of the actual behavior that emerges from the details; in that situation you need to actually know the underlying details, and be able to think in terms of them.
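To make that concrete, here is a toy illustration in Python (my own example, not one from the original discussion): a Python list presents the abstraction "a sequence you can add items to", but the underlying array layout leaks through as soon as you care about where you insert.

```python
import timeit

# A Python list abstracts away memory layout: "just a sequence of items".
# The abstraction leaks when performance matters: appending at the end is
# cheap, but inserting at the front forces every element to be shifted,
# a detail the "sequence" abstraction was supposed to hide.

def append_at_end(n):
    xs = []
    for i in range(n):
        xs.append(i)      # amortized O(1)

def insert_at_front(n):
    xs = []
    for i in range(n):
        xs.insert(0, i)   # O(n) per insert -- the underlying array leaks through

n = 10_000
print("append at end:  ", timeit.timeit(lambda: append_at_end(n), number=3))
print("insert at front:", timeit.timeit(lambda: insert_at_front(n), number=3))
```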

Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander's Blue-Minimizing Robot Sequence. At the same time, I do not think it has been fully internalized yet, and many foundational posts on LW go wrong by being premised on the assumption that humans are agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they attempted to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn't actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases are the most natural place for irrationality to emerge from, so it made sense to focus on them the most.

Just knowing that an abstraction leaks isn't enough to improve your thinking, however. To do better, you need to know the actual underlying details in order to build a better model. In this sequence, I will aim to elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us get past the old paradigm.

One particular family of models that I will be discussing is that of multi-agent theories of mind. Here the claim is not that we literally have multiple personalities. Rather, my approach will be similar in spirit to the one in Subagents Are Not A Metaphor:

Here are the parts composing my technical definition of an agent:
1. Values
This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
2. World-Model
Degenerate case: stateless world model consisting of just sense inputs.
3. Search Process
Causal decision theory is a search process. “From a fixed list of actions, pick the most positively reinforced” is another. Degenerate case: lookup table from world model to actions.
Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.
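To make this definition concrete, here is a minimal sketch (my own illustration, not code from the quoted post) of a thermostat written as an agent in exactly this sense: degenerate values, a world model consisting of just sense inputs, and a lookup-table search process.

```python
# A thermostat as a (degenerate) agent under the quoted definition:
# values, a world model, and a search process, each in their simplest form.

class Thermostat:
    def __init__(self, target_temp):
        # Values: a single preference, "keep the temperature near the target".
        self.target_temp = target_temp

    def world_model(self, sensed_temp):
        # World model (degenerate case): stateless, just the current sense input.
        return {"temp": sensed_temp}

    def search(self, model):
        # Search process (degenerate case): a lookup table from world model to action.
        if model["temp"] < self.target_temp:
            return "heat_on"
        return "heat_off"

    def act(self, sensed_temp):
        return self.search(self.world_model(sensed_temp))

agent = Thermostat(target_temp=21.0)
print(agent.act(18.5))  # -> "heat_on"
print(agent.act(23.0))  # -> "heat_off"
```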

This is a model that can be applied naturally to a wide range of entities, as seen from the fact that thermostats qualify. And the reason why we tend to automatically think of people, or thermostats, as agents is that our brains have evolved to naturally model things in terms of this kind of intentional stance; it's a way of thought that comes natively to us.

Given that we want to learn to think about humans in a new way, we should look for ways to map the new way of thinking into a native mode of thought. One of my tactics will be to look for parts of the mind that look like they could literally be agents (as in the above technical definition of an agent), so that we can replace our intuitive one-agent model with intuitive multi-agent models without needing to make trade-offs between intuitiveness and truth. This will still be a leaky simplification, but hopefully it will be a more fine-grained leaky simplification, so that overall we'll be more accurate.

My model of what our subagents look like draws upon a number of different sources, including neuroscience, psychotherapy, and meditation, so in the process of sketching out my model I will be covering a number of them in turn. To give you a rough idea of what I'm trying to do, here's a summary of some upcoming content.

Published posts:

Book summary: Consciousness and the Brain. One of the fundamental building blocks of much of consciousness research is Global Workspace Theory (GWT). This could be described as a component of a multiagent model, focusing on the way in which different agents exchange information with one another. One elaboration of GWT, which focuses on how it might be implemented in the brain, is the Global Neuronal Workspace (GNW) model in neuroscience. Consciousness and the Brain is a 2014 book that summarizes some of the research and basic ideas behind GNW, so summarizing the main content of that book looks like a good place to start our discussion and to get a neuroscientific grounding before we get more speculative.
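As a rough illustration of the workspace idea (a toy sketch of my own, not the GNW model from the book): many specialized processors run in parallel, the most salient content wins access to the workspace, and the winner is then broadcast back to all of the processors.

```python
# Toy sketch of a global workspace: specialist processes compete for access,
# and the winning content is broadcast back to every specialist.

def visual_specialist(broadcast):
    return ("saw movement", 0.4)

def threat_specialist(broadcast):
    # Reacts strongly if the last broadcast mentioned movement.
    salience = 0.9 if broadcast and "movement" in broadcast else 0.1
    return ("possible danger", salience)

def memory_specialist(broadcast):
    return ("this place was safe yesterday", 0.3)

specialists = [visual_specialist, threat_specialist, memory_specialist]
broadcast = None

for step in range(3):
    # Each specialist proposes content with a salience score...
    proposals = [s(broadcast) for s in specialists]
    # ...and the most salient proposal wins the workspace and is broadcast globally.
    broadcast, _ = max(proposals, key=lambda p: p[1])
    print(f"step {step}: workspace = {broadcast!r}")
```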

Building up to an Internal Family Systems (IFS) model. One theoretical approach for modeling humans as being composed of interacting parts is that of Internal Family Systems. In my experience and that of several other people in the rationalist community, it's very effective for this purpose. However, having its origins in therapy, its theoretical model may seem rather unscientific and woo-y. This personally put me off the theory for a long time, as I thought that it sounded fake, and it gave me a strong sense of “my mind isn't split into parts like that”.

In this post, I construct a mechanistic sketch of how a mind might work, drawing on the kinds of mechanisms that have already been demonstrated in contemporary machine learning, and then end up with a model that pretty closely resembles the IFS one.

Subagents, introspective awareness, and blending. In this post, I extend the model of mind that I've been building up in previous posts to explain some things about change blindness, not knowing whether you are conscious, forgetting most of your thoughts, and mistaking your thoughts and emotions for objective facts, while also connecting it with the theory in the meditation book The Mind Illuminated.

Subagents, akrasia, and coherence in humans. We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one you are currently executing, you will switch to that better strategy. For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and of the situations in which they fail to do so.

My conclusion is that we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.
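As a toy sketch of that conclusion (my own illustration, not a claim about the brain's actual algorithm): the system adopts a new behavior only if the weighted vote for “the new behavior is better” is high enough and no sufficiently heavily weighted subagent vetoes it.

```python
# Toy model of the stated conditions for switching behaviors:
# the mind-system switches when its weighted confidence that the new behavior
# is better is high enough, and no heavily weighted subagent (e.g. a protector)
# is confident that the new behavior is bad.

def adopts_new_behavior(subagents, switch_threshold=0.6,
                        veto_weight=2.0, veto_confidence=0.8):
    total_weight = sum(w for w, _ in subagents)
    weighted_support = sum(w * p for w, p in subagents) / total_weight

    # A single high-weight subagent that is confident the change is bad blocks it.
    vetoed = any(w >= veto_weight and p <= (1 - veto_confidence)
                 for w, p in subagents)

    return weighted_support >= switch_threshold and not vetoed

# Each subagent: (weight, probability it assigns to "the new behavior is better").
mostly_agree = [(1.0, 0.9), (1.0, 0.8), (1.0, 0.7)]
protector_blocks = [(1.0, 0.9), (1.0, 0.8), (3.0, 0.1)]  # heavy protector says "bad"

print(adopts_new_behavior(mostly_agree))      # True
print(adopts_new_behavior(protector_blocks))  # False
```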

Integrating disagreeing subagents. In the previous post, I suggested that akrasia involves subagent disagreement; in other words, different parts of the brain having differing ideas on what the best course of action is. The existence of such conflicts raises the question: how does one resolve them?

In this post I discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why this doesn't always happen.

Subagents, neural Turing machines, thought selection, and blindspots. In my summary of Consciousness and the Brain, I briefly mentioned that one of the functions of consciousness is to carry out artificial serial operations; in other words, to implement a production system (equivalent to a Turing machine) in the brain.

While I did not go into very much detail about this model in that post, I've used it in later articles. For instance, in Building up to an Internal Family Systems model, I used a toy model where different subagents cast votes to modify the contents of consciousness. One may conceptualize this as equivalent to the production system model, where different subagents implement different production rules which compete to modify the contents of consciousness.
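Here is a minimal sketch of that toy model, under my own simplifying assumptions: each subagent is a production rule that fires when the current contents of consciousness match its condition, and the eligible rules bid for which modification happens next.

```python
# Toy production system: subagents are condition -> action rules that compete
# to modify the contents of consciousness, with the highest bid winning.

consciousness = {"hungry", "at_desk"}

subagents = [
    # (name, condition, bid, content_added, content_removed)
    ("body-monitor", {"hungry"},      0.7, {"urge_to_eat"},          set()),
    ("work-focus",   {"at_desk"},     0.5, {"thought_of_deadline"},  set()),
    ("eater",        {"urge_to_eat"}, 0.9, {"plan_get_snack"},       {"hungry", "urge_to_eat"}),
]

for step in range(3):
    # Only rules whose condition matches the current contents may fire.
    eligible = [a for a in subagents if a[1] <= consciousness]
    if not eligible:
        break
    # The highest-bidding rule wins the competition and rewrites consciousness.
    name, _, _, added, removed = max(eligible, key=lambda a: a[2])
    consciousness = (consciousness - removed) | added
    print(f"step {step}: {name} fired -> {sorted(consciousness)}")
```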

In this post, I will flesh out the model a bit more, as well as apply it to a few other examples, such as emotion suppression, internal conflict, and blind spots.

Near-term posts (partially already written):

A non-mysterious explanation of the Three Marks of Existence. If being an agent is a leaky abstraction, then one way of characterizing insight meditation would be as a technique for finding and staring at the places where the abstraction does leak. Here, I offer a model of insight meditation as a way to witness some of the processes by which the experience of being an agent is constructed, helping dissolve the kinds of confusions that make us think we are agents in the first place.

One way of carving up the space of things that you'll find by doing insight meditation is by what some Buddhist schools call the Three Marks of Existence: no-self, impermanence, and unsatisfactoriness. Here, I try to sketch out an explanation of the kinds of things that these marks are pointing to, how they underlie a more accurate model of human psychology than the folk intuition does, and how witnessing them might be expected to transform one's expectations.

Farther out (sketched out but not as extensively planned/written yet):

The game theory of rationality and cooperation in a multiagent world. Multi-agent models have a natural connection to Elephant in the Brain-style dynamics: our brains doing things for purposes of which we are unaware. Furthermore, there can be strong incentives to continue systematic self-deception and not integrate conflicting beliefs. For instance, if a mind has subagents which think that specific beliefs are dangerous to hold or express, then they will work to keep subagents holding those beliefs from coming into conscious awareness.

“Dangerous beliefs” might be ones that touch upon political topics, but they might also be ones of a more personal nature. For instance, someone may have an identity as being “good at X”, and then want to rationalize away any contradictory evidence, including evidence suggesting that they were wrong on a topic related to X. Or it might be something even more subtle.

These are a few examples of how rationality work has to happen on two levels at once: to debug some beliefs (individual level), people need to be in a community where holding various kinds of beliefs is actually safe (social level). But in order for the community to be safe for holding those beliefs (social level), people within the community also need to work on themselves so as to deal with their own subagents that would cause them to attack people with the “wrong” beliefs (individual level). This kind of work also seems to be necessary for fixing “politics being the mind-killer” and for collaborating on issues such as existential risk across sharp value differences; but the need to carry out the work on many levels at once makes it challenging, especially since the current environment incentivizes many (sub)agents to sabotage any attempt at this.

(This topic area is also related to that stuff Valentine has been saying about Omega.)

AI alignment and multiagent models: submind values and the default human ontology. In a recent post, Wei Dai mentioned that “the only apparent utility function we have seems to be defined over an ontology very different from the fundamental ontology of the universe”. I agree, and I think it's worth emphasizing that the difference is not just “we tend to think in terms of classical physics but actually the universe runs on particle physics”. Unless they've been specifically trained to do so, people don't usually think of their values in terms of classical physics, either. That's something that's learned on top of the default ontology.

The ontology that our values are defined over, I think, shatters into a thousand shards of disparate models held by different subagents with different priorities. It is mostly something like “predictions of receiving sensory data that has previously been classified as good or bad, with the predictions formed by pattern matching against past streams of sensory data”. Things like intuitive physics simulators feed into these predictions, but I suspect that even intuitive physics is not the ontology over which our values are defined; clusters of sensory experiences are that ontology, with intuitive physics being a tool for predicting how to get those experiences. This is the same sense in which you might use your knowledge of social dynamics to figure out how to get into situations which have made you feel loved in the past, but your knowledge of social dynamics is not the same thing as the experience of being loved.


This sequence is part of research done for, and supported by, the Foundational Research Institute.