Integrating disagreeing subagents

In my previous post, I suggested that akrasia involves subagent disagreement—or in other words, different parts of the brain having differing ideas on what the best course of action is. The existence of such conflicts raises the question: how does one resolve them?

In this post I will discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why this doesn't always happen on its own.

A word on interpreting "subagents"

The frame that I've had so far is that of the brain being composed of different subagents with conflicting beliefs. On the other hand, one could argue that the subagent interpretation isn't strictly necessary for many of the examples that I bring up in this post. One could just as well view my examples as talking about a single agent with conflicting beliefs.

The distinction between these two frames isn't always entirely clear. In "Complex Behavior from Simple (Sub)Agents", mordinamael presents a toy model where an agent has different goals. Moving to different locations will satisfy the different goals to a varying extent. The agent generates a list of possible moves and picks the one which brings some goal closest to being satisfied.
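
As a rough illustration of that selection rule, here is a minimal sketch in Python. It is my own reconstruction rather than mordinamael's actual code, and the goal locations and move set are invented:

```python
import math

# Hypothetical goal locations, agent position, and move set (all invented).
# Each candidate move is scored by how close it would bring the nearest goal
# to being satisfied; the agent picks the best-scoring move.
goals = {"food": (5.0, 0.0), "warmth": (-3.0, 4.0), "social": (0.0, -6.0)}
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def best_move(position):
    def score(move):
        new_pos = (position[0] + move[0], position[1] + move[1])
        # Distance to whichever goal this move brings closest to satisfaction.
        return min(distance(new_pos, loc) for loc in goals.values())
    return min(moves, key=score)

print(best_move((0.0, 0.0)))  # (1, 0): a step toward "food", the nearest goal
```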

Is this a unified agent, or one made up of several subagents?

One could argue for either interpretation. On the one hand, mordinamael's post frames the goals as subagents, and they are in a sense competing with each other. On the other hand, the subagents arguably don't make the final decision themselves: they just report expected outcomes, and then a central mechanism picks a move based on their reports.

This resembles the neuroscience model I discussed in my last post, where different subsystems in the brain submit various action "bids" to the basal ganglia. Various mechanisms then pick a winning bid based on criteria such as how relevant the subsystem's concerns are for the current situation, and how accurate the different subsystems have historically been in their predictions.

Likewise, in extending the model from Consciousness and the Brain for my toy version of the Internal Family Systems model, I postulated a system where various subagents vote for different objects to become the content of consciousness. In that model, the winner was determined by a system which adjusted the vote weights of the different subagents based on various factors.
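
A minimal sketch of this kind of weighted-bid selection is below. The subagent names, initial weights, and the multiplicative update rule are my own illustrative assumptions, not details taken from the models being described:

```python
# Invented subagent names and weights; the update rule is one simple possibility.
weights = {"hunger": 1.0, "safety": 1.0, "curiosity": 1.0}

def select_action(bids):
    # bids maps each subagent to the action it votes for; the action with the
    # highest total weighted support wins.
    totals = {}
    for agent, action in bids.items():
        totals[action] = totals.get(action, 0.0) + weights[agent]
    return max(totals, key=totals.get)

def update_weights(bids, chosen_action, outcome_was_good):
    # Strengthen subagents whose winning bid led to a good outcome, weaken them
    # after a bad one, so that historically accurate bidders gain influence.
    factor = 1.1 if outcome_was_good else 0.9
    for agent, action in bids.items():
        if action == chosen_action:
            weights[agent] *= factor

bids = {"hunger": "eat", "safety": "eat", "curiosity": "explore"}
chosen = select_action(bids)                         # "eat": weight 2.0 beats 1.0
update_weights(bids, chosen, outcome_was_good=True)
print(chosen, weights)                               # "hunger" and "safety" grow to 1.1
```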

So, subagents, or just an agent with different goals?

Here I would draw an analogy to parliamentary decision-making. In a sense, a parliament as a whole is an agent: various members of parliament cast their votes, with "the voting system" then "making the final choice" based on the votes that have been cast, and the outcome reflects the overall judgment of the parliament as a whole. At the same time, for understanding and predicting how the parliament will actually vote in different situations, it is important to model how the individual MPs influence and broker deals with each other.

Likewise, the subagent frame seems most useful when a person's goals interact in such a way that applying the intentional stance—thinking in terms of the beliefs and goals of the individual subagents—is useful for modeling the overall interactions of the subagents.

For example, in my toy Internal Family Systems model, I noted that reinforcement learning subagents might end up forming something like alliances. Suppose that a robot has a choice between making cookies, poking its finger at a hot stove, or daydreaming. It has three subagents: "cook" wants the robot to make cookies, "masochist" wants to poke the robot's finger at the stove, and "safety" wants the robot to not poke its finger at the stove.

By default, "safety" is indifferent between "make cookies" and "daydream", and might cast its votes at random. But when it votes for "make cookies", then that tends to avert "poke at stove" more reliably than voting for "daydream" does, as "make cookies" is also being voted for by "cook". Thus its tendency to vote for "make cookies" in this situation gets reinforced.

We can now apply the intentional stance to this situation, and say that "safety" has "formed an alliance" with "cook", as it correctly "believes" that this will avert masochistic actions. If the subagents are also aware of each other and can predict each other's actions, then the intentional stance gets even more useful.

Of course, we could just as well apply the purely mechanistic explanation and end up with the same predictions. But the intentional explanation often seems easier for humans to reason with, and helps highlight salient considerations.
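
To make the mechanistic version concrete, here is a small simulation sketch of the alliance dynamic described above. The vote-counting rule, the reinforcement increment, and the starting weights are all invented for illustration:

```python
import random

# Toy simulation of the cookie robot. "cook" always votes to make cookies,
# "masochist" always votes to poke the stove, and "safety" starts out
# indifferent between "make cookies" and "daydream". Whenever the stove is
# averted, safety's vote from that round is reinforced. Voting with "cook"
# averts the stove every time, while voting "daydream" creates a three-way tie
# that the masochist sometimes wins, so safety drifts toward "make cookies".
ACTIONS = ["make cookies", "poke at stove", "daydream"]
safety_prefs = {"make cookies": 1.0, "daydream": 1.0}

def safety_vote():
    total = sum(safety_prefs.values())
    return "make cookies" if random.uniform(0, total) < safety_prefs["make cookies"] else "daydream"

def pick_winner(votes):
    counts = {a: votes.count(a) for a in ACTIONS}
    best = max(counts.values())
    return random.choice([a for a, c in counts.items() if c == best])  # ties broken at random

for _ in range(1000):
    votes = ["make cookies", "poke at stove", safety_vote()]  # cook, masochist, safety
    if pick_winner(votes) != "poke at stove":
        safety_prefs[votes[2]] += 0.1  # reinforce whatever safety voted for this round

print(safety_prefs)  # "make cookies" ends up with a much larger weight than "daydream"
```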

Integrating beliefs, naturally or with techniques

In any case, regardless of whether we are talking about subagents with conflicting beliefs or just conflicting goals, it still seems like many of our problems arise from some kind of internal disagreement. I will use the term "integration" for anything that acts to resolve such conflicts, and discuss a few examples of things which can be usefully thought of as integration.

In these examples, I am again going to rely on the basic observation from Consciousness and the Brain: that when some subsystem in the brain manages to elevate a mental object into the content of consciousness, multiple subsystems will synchronize their processing around that object. Assuming that the conditions are right, this will allow for the integration of otherwise conflicting beliefs or behaviors.

Why do we need to explicitly integrate beliefs, rather than this happening automatically? One answer is that trying to integrate all beliefs would be infeasible; as CronoDAS notes:

GEB has a section on this.
In order to not compartmentalize, you need to test if your beliefs are all consistent with each other. If your beliefs are all statements in propositional logic, consistency checking becomes the Boolean Satisfiability Problem, which is NP-complete. If your beliefs are statements in predicate logic, then consistency checking becomes PSPACE-complete, which is even worse than NP-complete.
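
To see why this blows up, here is a brute-force consistency check over a few toy propositional beliefs (the example beliefs are mine, not CronoDAS's). With n atomic propositions there are 2^n truth assignments to examine:

```python
from itertools import product

# Brute-force consistency check over toy propositional beliefs (invented here).
# With n atomic propositions there are 2**n truth assignments to examine, which
# is why checking all of one's beliefs against each other doesn't scale.
beliefs = [
    lambda v: v["rain"] or not v["wet"],   # "if the ground is wet, it rained"
    lambda v: v["wet"],                    # "the ground is wet"
    lambda v: not v["rain"],               # "it did not rain"
]

def consistent(beliefs, variables):
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(b(assignment) for b in beliefs):
            return True  # found an assignment that makes all beliefs true
    return False

print(consistent(beliefs, ["rain", "wet"]))  # False: these three beliefs conflict
```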

Rather than constantly trying to integrate every possible belief and behavior, the brain instead tries to integrate beliefs at times when it notices contradictions. Of course, sometimes we do realize that there are contradictions, but still don't automatically integrate the subagents. Then we can use various techniques for making integration more effective. How come integration isn't more automatic?

One reason is that integration requires the right conditions, and while the brain has mechanisms for getting those conditions right, integration is still a nontrivial skill. As an analogy, most children learn the basics of talking and running on their own, but they can still explicitly study rhetoric or running techniques to boost their native competencies far above their starting level. Likewise, everyone natively does some integration on their own, but people can also use explicit techniques which make them much better at it.

Resisting belief integration

Lack of skill isn't the full answer for why we don't always automatically update, however. Sometimes it seems as if the mind actively resists updating.

One of the issues that commonly comes up in Internal Family Systems therapy is that parts of the mind want to keep some old belief frozen, because if the truth were known, it would change the person's behavior in an undesired way. For example, if someone believes that they have a good reason not to abandon their friend, then a part of the mind which values not abandoning the friend in question might resist having this belief re-evaluated. The part may then need to be convinced that knowing the truth only leaves open the option of abandoning the friend; it doesn't compel it.

Note that this isn’t nec­es­sar­ily true. If there are other sub­agents which suffi­ciently strongly hold the opinion that the friend should be aban­doned, and the sub­agent-which-val­ues-the-friend is only man­ag­ing to pre­vent that by hang­ing on to a spe­cific be­lief, then read­just­ing that be­lief might re­move the only con­straint which was pre­vent­ing the anti-friend coal­i­tion from dump­ing the friend. Thus from the point of view of the sub­agent which is re­sist­ing the be­lief up­date, the up­date would com­pel an aban­don­ment of the friend. In such a situ­a­tion, ad­di­tional in­ter­nal work may be nec­es­sary be­fore the sub­agent will agree to let the be­lief re­vi­sion pro­ceed.

More generally, subagents may be incentivized to resist belief updating for at least three different reasons (this list is not intended to be exhaustive):

  1. The subagent is trying to pursue or maintain a goal, and predicts that revising some particular belief would make the person less motivated to pursue or maintain the goal.

  2. The subagent is trying to safeguard the person's social standing, and predicts that not understanding or integrating something will be safer, give the person an advantage in negotiation, or be otherwise socially beneficial. For instance, different subagents holding conflicting beliefs allows a person to verbally believe in one thing while still not acting accordingly—even actively changing their verbal model so as to avoid falsifying the invisible dragon in the garage.

  3. Evaluating a belief would require activating a memory of a traumatic event that the belief is related to, and the subagent is trying to keep that memory suppressed as part of an exile-protector dynamic.

Here’s an al­ter­nate way of look­ing at the is­sue, which doesn’t use the sub­agent frame. So far I have been mostly talk­ing about in­te­grat­ing be­liefs rather than goals, but hu­mans don’t seem to have a clear value/​be­lief dis­tinc­tion. As Stu­art Arm­strong dis­cusses in his mAIry’s room ar­ti­cle, for hu­mans sim­ply re­ceiv­ing sen­sory in­for­ma­tion of­ten also rewires some of their val­ues. Now, Mark Lipp­man sug­gests that try­ing to op­ti­mize a com­pli­cated net­work of be­liefs and goals means that fur­ther­ing one goal may hurt other goals, so the sys­tem needs to have checks in place to en­sure that one goal is not pur­sued in a way which dis­pro­por­tionately harms the achieve­ment of other goals.

For example, most people wouldn't want to spend the rest of their lives doing nothing but shooting up heroin, even if they knew for certain that this maximized the achievement of their "experience pleasure" goal. If someone offered them the chance to experience just how pleasurable heroin feels—giving them more accurate emotion-level predictions of the experience—they might quite reasonably refuse, fearing that making this update would make them more inclined to take heroin. Eliezer once noted that if someone offered him a pill which simulated the joy of scientific discovery, he would make sure never to take it.

Suppose that a system has a network of beliefs and goals, and it does something like predicting how various actions—not only their effects on the external world, but also their effects on the belief/goal network itself—might influence its goal achievement. If it resists actions which reduce the probability of achieving its current goals, then this might produce dynamics which look like subagents trying to achieve their goals at the expense of the other subagents.

For instance, Eliezer's refusal to take the pill might be framed as a subagent valuing scientific discovery trying to block a subagent valuing happiness from implementing an action which would make the happiness subagent's bids for motor system access stronger. Alternatively, it might be framed as the overall system putting value on actually making scientific discoveries, and refusing to self-modify in a way which it predicted would hurt this goal. (You might note that this has some interesting similarities to things like the Cake or Death problem in AI alignment.)
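
Here is one toy sketch of this dynamic, in which candidate actions, including a self-modifying one, are scored by how well the current goals are predicted to be satisfied afterwards. The goal weights and predicted outcomes are invented numbers rather than anything from the sources discussed:

```python
# Toy sketch of resisting a value-changing action: candidate actions, including
# one that would rewire the system's own motivations, are scored by how well
# the *current* goals are predicted to be satisfied afterwards.
current_goals = {"scientific_discovery": 0.7, "pleasure": 0.3}

predicted_outcomes = {
    # Predicted goal satisfaction after taking each action (invented numbers).
    "take_joy_pill":       {"scientific_discovery": 0.0, "pleasure": 1.0},
    "keep_doing_research": {"scientific_discovery": 0.8, "pleasure": 0.4},
}

def value(action):
    # Evaluate the predicted outcome using the goal weights the system has now.
    outcome = predicted_outcomes[action]
    return sum(current_goals[g] * outcome[g] for g in current_goals)

best = max(predicted_outcomes, key=value)
print(best, {a: round(value(a), 2) for a in predicted_outcomes})
# keep_doing_research wins: 0.68 versus the pill's 0.30
```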

In any case, integration is not always straightforward. Even if the system does detect a conflict between its subagents, it may have a reason to avoid resolving it.

Having reviewed some potential barriers to integration, let us move on to different ways in which conflicts can be detected and integrated.

Ways to integrate conflicting subagents

Cognitive Behavioral Therapy

Scott Alexander has an old post where he quotes this excerpt from the cognitive behavioral therapy book When Panic Attacks:

I asked Walter how he was thinking and feeling about the breakup with Paul. What was he telling himself? He said "I feel incredibly guilty and ashamed, and it seems like it must have been my fault. Maybe I wasn't skillful enough, attractive enough, or dynamic enough. Maybe I wasn't there for him emotionally. I feel like I must have screwed up. Sometimes I feel like a total fraud. Here I am, a marriage and family therapist, and my own relationship didn't even work out. I feel like a loser. A really, really big loser." [...]
I thought the Double Standard Technique might help because Walter seemed to be a warm and compassionate individual. I asked what he'd say to a dear friend who'd been rejected by someone he'd been living with for eight years. I said "Would you tell him that there's something wrong with him, that he screwed up his life and flushed it down the toilet for good?"
Walter looked shocked and said he'd never say something like that to a friend. I suggested we try a role-playing exercise so that he could tell me what he would say to a friend who was in the same predicament […]
Therapist (role-playing patient's friend): Walter, there's another angle I haven't told you about. What you don't understand is that I'm impossible to live with and be in a relationship with. That's the real reason I feel so bad, and that's why I'll be alone for the rest of my life.
Patient (role-playing as if therapist is his friend who just had a bad breakup): Gosh, I'm surprised to hear you say that, because I've known you for a long time and never felt that way about you. In fact, you've always been warm and open, and a loyal friend. How in the world did you come to the conclusion that you were impossible to be in a relationship with?
Therapist (continuing role-play): Well, my relationship with [my boyfriend] fell apart. Doesn't that prove I'm impossible to be in a relationship with?
Patient (continuing role-play): In all honesty, what you're saying doesn't make a lot of sense. In the first place, your boyfriend was also involved in the relationship. It takes two to tango. And in the second place, you were involved in a reasonably successful relationship with him for eight years. So how can you claim that you're impossible to live with?
Therapist (continuing role-play): Let me make sure I've got this right. You're saying that I was in a reasonably successful relationship for eight years, so it doesn't make much sense to say that I'm impossible to live with or impossible to be in a relationship with?
Patient (continuing role-play): You've got it. Crystal clear.
At that point, Walter's face lit up, as if a lightbulb had suddenly turned on in his brain, and we both started laughing. His negative thoughts suddenly seemed absurd to him, and there was an immediate shift in his mood…after Walter put the lie to his negative thoughts, I asked him to rate how he was feeling again. His feeling of sadness fell all the way from 80% to 20%. His feelings of guilt, shame, and anxiety fell all the way to 10%, and his feelings of hopelessness dropped to 5%. The feelings of loneliness, embarrassment, frustration, and anger disappeared completely.

At the time, Scott expressed confusion about how just telling someone that their beliefs aren't rational would be enough to transform the beliefs. But that wasn't really what happened. Walter was asked whether he'd say something harsh to a friend, and he said no, but that alone wasn't enough to improve his condition. What did help was putting him in a position where he had to really think through the arguments for why the harsh judgment was irrational in order to convince his friend, and then, after having formulated the arguments once, become convinced by them himself.

In terms of our framework, we might say that a part of Walter's mind contained a model which output a harsh judgment of himself, while another part contained a model which would output a much less harsh judgment of someone else who was in otherwise identical circumstances. Just bringing up the existence of this contradiction wasn't enough to change it: it caused the contradiction to be noticed, but didn't activate the relevant models extensively enough for their contents to be reprocessed.

But when Walter had to role-play a situation where he thought of himself as actually talking with a depressed friend, that required him to more fully activate the non-judgmental model and apply it to the relevant situation. This caused him to blend with the model, taking its perspective as the truth. When that perspective was then propagated to the self-critical model, the easiest way for the mind to resolve the conflict was simply to alter the model producing the self-critical thoughts.

Note that this kind of result wasn't guaranteed to happen: Walter's self-critical model might have had a reason for why these cases were actually different, and pointing out that reason would have been another way for the contradiction to be resolved. In the example case, however, it seemed to work.

Mental contrasting

Another example of activating two conflicting mental models and forcing an update that way comes from Gabriele Oettingen's book Rethinking Positive Thinking. Oettingen is a psychologist who has studied combining a mental imagery technique known as "mental contrasting" with trigger-action planning.

It is worth noting that this book has come under some heavy criticism and may be based on cherry-picked studies. However, in the book this particular example is presented only as an anecdote, without any attempt to cite particular studies in its support. I present it because I've personally found the technique to be useful, and because it feels like a nice concise explanation of the kind of integration that often works:

Try this exercise for yourself. Think about a fear you have about the future that is vexing you quite a bit and that you know is unjustified. Summarize your fear in three to four words. For instance, suppose you're a father who has gotten divorced and you share custody with your ex-wife, who has gotten remarried. For the sake of your daughter's happiness, you want to become friendly with her stepfather, but you find yourself stymied by your own emotions. Your fear might be "My daughter will become less attached to me and more attached to her stepfather." Now go on to imagine the worst possible outcome. In this case, it might be "I feel distanced from my daughter. When I see her she ignores me, but she eagerly spends time with her stepfather." Okay, now think of the positive reality that stands in the way of this fear coming true. What in your actual life suggests that your fear won't really come to pass? What's the single key element? In this case, it might be "The fact that my daughter is extremely attached to me and loves me, and it's obvious to anyone around us." Close your eyes and elaborate on this reality.
Now take a step back. Did the exercise help? I think you'll find that by being reminded of the positive reality standing in the way, you will be less transfixed by the anxious fantasy. When I conducted this kind of mental contrasting with people in Germany, they reported that the experience was soothing, akin to taking a warm bath or getting a massage. "It just made me feel so much calmer and more secure," one woman told me. "I sense that I am more grounded and focused."
Mental contrasting can produce results with both unjustified fears as well as overblown fears rooted in a kernel of truth. If as a child you suffered through a couple of painful visits to the dentist, you might today fear going to get a filling replaced, and this fear might become so terrorizing that you put off taking care of your dental needs until you just cannot avoid it. Mental contrasting will help you in this case to approach the task of going to the dentist. But if your fear is justified, then mental contrasting will confirm this, since there is nothing preventing your fear from coming true. The exercise will then help you to take preventive measures or avoid the impending danger altogether.

As in the CBT example, first one mental model (the one predicting losing the daughter's love) is activated and intentionally blended with, after which an opposing one is activated, forcing integration. And as in Walter's example, this is not guaranteed to resolve the conflict in a more reassuring way: the mind can also resolve the conflict by determining that the fear actually is justified.

Internal Double Crux / Internal Family Systems

On some occasions a single round of mental contrasting, or the Walter CBT technique, might be enough. In those cases, there were two disagreeing models, and bringing the disagreement into consciousness was enough to reject one of them entirely. But it is not always so clear-cut; sometimes there are subagents which disagree, both of which actually have some valid points.

For instance, someone might have a subagent which wants the person to do socially risky things, and another subagent which wants to play things safe. Neither is unambiguously wrong: on the one hand, some things are so risky that you should never try to do them. On the other hand, never doing anything which others might disapprove of is not going to lead to a particularly happy life, either.

In that case, one may need to actively facilitate a dialogue between the subagents, such as in the CFAR technique of Internal Double Crux (description, discussion and example, example as applied to dieting), iterating it for several rounds until both subagents come to agreement. The CBT and mental contrasting examples above might be considered special cases of an IDC session, where agreement was reached within a single round of discussion.

More broadly, IDC itself can be considered a special case of applying Internal Family Systems, which includes facilitating conversations between mutually opposing subagents as one of its techniques.

Self-concept editing

In the summer of 2017, I found Steve Andreas's book Transforming Your Self, and applied its techniques to fixing a number of issues in my self-concepts which had contributed to my depression and anxiety. Effects from this work which have lasted include no longer having generalized feelings of shame, no longer needing constant validation to avoid such feelings of shame, no longer being motivated by a desire to prove to myself that I'm a good person, and no longer having obsessive escapist fantasies, among other things.

I wrote an article at the time that described the work. The model in Transforming Your Self is that I might have a self-concept such as "I am kind". That self-concept is made up of memories of times when I either was kind (examples of the concept), or times when I was not (counterexamples). In a healthy self-concept, both examples and counterexamples are integrated together: you might have memories of how you are kind in general, but also memories of not being very kind at times when you were e.g. under a lot of stress. This allows you both to know your general tendency and to prepare for situations where you know that you won't be very kind.

The book’s model also holds that some­times a per­son’s coun­terex­am­ples might be split off from their ex­am­ples. This leads to an un­sta­ble self-con­cept: ei­ther your sub­con­scious at­ten­tion is fo­cused on the ex­am­ples and to­tally ig­nores the coun­terex­am­ples, in which case you feel good and kind, or it swings to the coun­terex­am­ples and to­tally ig­nores the ex­am­ples, in which case you feel like a ter­rible hor­rible per­son with no re­deem­ing qual­ities. You need a con­stant stream of ex­ter­nal val­i­da­tion and ev­i­dence in or­der to keep your at­ten­tion an­chored on the ex­am­ples; the mo­ment it ceases, your at­ten­tion risks swing­ing to the coun­terex­am­ples again.

While I didn’t have the con­cept back then, what I did could also be seen as in­te­grat­ing true but dis­agree­ing per­spec­tives be­tween two sub­agents. There was one sub­agent which held mem­o­ries of times when I had acted in what it thought of as a bad way, and was us­ing feel­ings of shame to mo­ti­vate me to make up for those ac­tions. Another sub­agent was then re­act­ing to it by mak­ing me do more and more things which I could use to prove to my­self and oth­ers that I was in­deed a good per­son. (This de­scrip­tion roughly fol­lows the fram­ing and con­cep­tu­al­iza­tion of self-es­teem and guilt/​shame in the IFS book Free­dom from your In­ner Critic.)

Under the sociometer theory of self-esteem, self-esteem is an internal evaluation of one's worth as a partner to others. With this kind of an interpretation, it makes sense to have subagents acting in the ways that I described: if you have done things that your social group would judge you for, then it becomes important to do things which prove your worth and make them forgive you.

This then becomes a special case of an IFS exile/protector dynamic. Under that formulation, the splitting off of the counterexamples and the lack of updating actually serves a purpose. The subagent holding the memories of doing shameful things doesn't want to stop generating the feelings of shame until it has received sufficient evidence that the "prove your worth" behavior has actually become unnecessary.

One of the techniques from Transforming Your Self that I used to fix my self-concept was integrating the examples and counterexamples by adding qualifiers to the counterexamples: "when I was a child, and my executive control wasn't as developed, I didn't always act as kindly as I could have". Under the belief framing, this allowed my memories to be integrated in a way which showed that my selfishness as a child was no longer evidence of me being horrible in general. Under the subagent framing, this communicated to the shame-generating subagent that the things that I did as a child would no longer be held against me, and that it was safe to relax.

Another technique mentioned in Transforming Your Self, which I did not personally need to use, was translating the concerns of subagents into a common language. For instance, someone's positive self-concept examples might be in the form of mental images, with their negative counterexamples being in the form of a voice which reminds them of their failures. In that case, they might translate the inner speech into mental imagery by visualizing what the voice is saying, turning both the examples and counterexamples into mental images that can then be combined. This brings us to…

Translating into a common language

Eliezer presents an example of two different framings eliciting conflicting behavior in his "Circular Altruism" post:

Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
Most people choose option 1. [...] If you present the options this way:
1. 100 people die, with certainty.
2. 90% chance no one dies; 10% chance 500 people die.
Then a majority choose option 2. Even though it's the same gamble. You see, just as a certainty of saving 400 lives seems to feel so much more comfortable than an unsure gain, so too, a certain loss feels worse than an uncertain one.

In my previous post, I presented a model where the subagents which are most strongly activated by the situation are the ones that get access to the motor system. If you are hungry and have a meal in front of you, the possibility of eating is the most salient and valuable feature of the situation. As a result, subagents which want you to eat get the most decision-making power. On the other hand, if this is a restaurant in Jurassic Park and a velociraptor suddenly charges through the window, then the dangerous aspects of the situation become the most salient, letting the subagents which want you to flee get the most decision-making power.

Eliezer’s ex­pla­na­tion of the sav­ing lives dilemma is that in the first fram­ing, the cer­tainty of sav­ing 400 lives is salient, whereas in the sec­ond ex­pla­na­tion the cer­tainty of los­ing 100 lives is salient. We can in­ter­pret this in similar terms as the “eat or run” dilemma: the ac­tion which gets cho­sen, de­pends on which fea­tures are the most salient and how those fea­tures ac­ti­vate differ­ent sub­agents (or how those fea­tures high­light differ­ent pri­ori­ties, if we are not us­ing the sub­agent frame).

Suppose that you are someone who was tempted to choose option 1 when you were presented with the first framing, and option 2 when you were presented with the second framing. It is now pointed out to you that these are actually exactly equivalent. You realize that it would be inconsistent to prefer one option over the other just depending on the framing. Furthermore, and maybe even more crucially, realizing this makes both the "certainty of saving 400 lives" and "certainty of losing 100 lives" features become equally salient. That puts the relevant subagents (priorities) on more equal terms, as they are both activated to the same extent.

What happens next depends on what the relative strengths of those subagents (priorities) are otherwise, and whether you happen to know about expected value. Maybe you consider the situation and one of the two subagents (priorities) happens to be stronger, so you decide to pick the same option, either the sure outcome or the gamble, under both framings. Alternatively, the conflicting priorities may be resolved by introducing the rule that "when detecting this kind of a dilemma, convert both options into an expected value of lives saved, and pick the option with the higher value".

Converting the options into expected values provides a common basis on which two otherwise equally salient options can be evaluated and chosen between. Another way of looking at it is that this brings in a third kind of consideration/subagent (knowledge of the decision-theoretically optimal choice) in order to resolve the tie.
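
For this particular dilemma, the conversion is a one-line computation; a minimal sketch:

```python
# The two framings describe the same gamble over 500 lives; converting both
# options into expected lives saved puts them on a single common scale.
def expected_lives_saved(outcomes):
    # outcomes is a list of (probability, lives_saved) pairs
    return sum(p * saved for p, saved in outcomes)

option_1 = [(1.0, 400)]            # "save 400 with certainty" == "100 die with certainty"
option_2 = [(0.9, 500), (0.1, 0)]  # "90% no one dies" == "90% chance of saving all 500"

print(expected_lives_saved(option_1))  # 400.0
print(expected_lives_saved(option_2))  # 450.0, the higher expected value
```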

Urge propagation

"CFAR and Harvard Effective Altruism" is a video of a lecture given by former CFAR instructors Valentine Smith and Duncan Sabien. In Valentine's part of the lecture, he describes a few motivational techniques which work by mentally reframing the contents of an experience.

The first example involves having a $50 parking ticket, which—unless paid within 30 days—will accrue an additional $90 penalty. This kind of a thing tends to feel ughy to deal with, causing an inclination to avoid thinking about it—while also being aware of the need to do something about it. Something along the lines of two different subagents which are both trying to avoid pain using opposite methods—one by not thinking about unpleasant things, another by doing things which stop future unpleasantness.

Val’s sug­gested ap­proach in­volves not­ing that if you in­stead had a cheque for $90, which would ex­pire in 30 days, then that would not cause such a dis­in­cli­na­tion. Rather, it would feel ac­tively pleas­ant to cash it in and get the money.

The structure of the "parking ticket" and "cheque" scenarios is equivalent, in that in both cases you can take an action which leaves you $90 better off after 30 days. If you notice this, then it may be possible for you to re-interpret the action of paying off the parking ticket as something that gains you money, maybe by something like literally looking at it and imagining it as a cheque that you can cash in, until cashing it in starts feeling actively pleasant.
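
In numbers, using the figures given above, the equivalence looks like this:

```python
# The two scenarios in numbers: in both cases, acting within 30 days leaves you
# exactly $90 better off than not acting, which is why the "cheque" reframe
# reflects a real fact about the world rather than being an arbitrary trick.
ticket_if_paid_now = -50
ticket_if_ignored  = -50 - 90      # the original fine plus the late penalty
cheque_if_cashed   = 90
cheque_if_expired  = 0

print(ticket_if_paid_now - ticket_if_ignored)  # 90: value of acting on the ticket
print(cheque_if_cashed - cheque_if_expired)    # 90: value of acting on the cheque
```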

Val emphasizes that this is not just an arbitrary motivational hack: it's important that your reframe is actually bringing in real facts from the world. You don't want to just imagine the parking ticket as a ticking time bomb, or as something else which it actually isn't. Rather, you want to do a reframe which integrates both perspectives, while also highlighting the features which will help fix the conflict.

One description of what happens here would be that once the pain-avoiding subagent notices that paying the parking ticket can feel like a net gain, and that its being a net gain is actually describing a real fact about the world, then it can drop its objection and you can proceed to take action. The other way of looking at it is that, as with expected value, you are introducing a common currency—the future impact on your finances—which allows the salient features from both subagents' perspectives to be integrated and then resolved.

Val’s sec­ond ex­am­ple in­volves a case where he found him­self not do­ing push-ups like he had in­tended to. When ex­am­in­ing the rea­son why not, he no­ticed that the push-ups felt phys­i­cally un­pleas­ant: they in­volved sweat­ing, pant­ing, and a burn­ing sen­sa­tion, and this caused a feel­ing of aver­sion.

Part of how he solved the issue was by realizing that his original goal for getting exercise was to live longer and be in better health. The unpleasant physical sensations were a sign that he was pushing his body hard enough that the push-ups would actually be useful for this goal. He could then create a mental connection between the sensations and his goal of being healthier and living longer: the sensations started feeling like something positive, since they were an indication of progress.

Besides being an example of creating a common representation between the subagents, this can also be viewed as doing a round of Internal Double Crux, something like:

Exercise subagent: We should exercise.
Optimizer subagent: That feels unpleasant and costs a lot of energy, we would have the energy to do more things if we didn't exercise.
Exercise subagent: That's true. But the feelings of unpleasantness are actually a sign of us getting more energy in the long term.
Optimizer subagent: Oh, you're right! Then let's exercise, that furthers my goals too.

(There’s also a bunch of other good stuff in the video that I didn’t de­scribe here, you may want to check it out if you haven’t already done so.)

Exposure Therapy

So far, most of the examples have assumed that the person already has all the information necessary for solving the internal disagreement. But sometimes additional information might be required.

The prototypical use of exposure therapy is for phobias. Someone might have a phobia of dogs, while at the same time feeling that their fear is irrational, so they decide to get therapy for their phobia.

How the therapy typically proceeds is by exposing the person to their fear in increments that are as small as possible. For instance, a page by Anxiety Canada offers this list of steps that someone might use for exposing themselves to dogs:

Step 1: Draw a dog on a piece of paper.
Step 2: Read about dogs.
Step 3: Look at photos of dogs.
Step 4: Look at videos of dogs.
Step 5: Look at dogs through a closed window.
Step 6: Then through a partly-opened window, then open it more and more.
Step 7: Look at them from a doorway.
Step 8: Move further out the doorway; then further etc.
Step 9: Have a helper bring a dog into a nearby room (on a leash).
Step 10: Have the helper bring the dog into the same room, still on a leash.

The ideal is that each step is enough to make you feel a little scared, but not so scared that it would retraumatize you or otherwise make you feel horrible about what happened.

In a sense, exposure therapy involves one part of the mind thinking that the situation is safe, and another part of the mind thinking that the situation is unsafe, and the contradiction being resolved by testing it. If someone feels nervous about looking at a photo of a dog, it implies that a part of their mind thinks that seeing a photo of a dog means they are potentially in danger. (In terms of the machine learning toy model from my IFS post, it means that a fear model is activated, which predicts the current state to be dangerous.)

By looking at photos sufficiently many times, and then afterwards noting that everything is okay, the nervous subagent gets information about having been wrong, and updates its model. Over time, and as the person goes forward in steps, the nervous subagent can eventually conclude that it had overgeneralized from the original trauma, and that dogs in general aren't that dangerous after all.
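
As a very rough sketch of what such an update could look like in that toy model, here is a danger estimate being nudged toward the observed outcome after each safe exposure; the starting value and learning rate are arbitrary:

```python
# Very rough sketch of the fear model updating during exposure: the subagent's
# predicted probability of danger is nudged toward the observed outcome after
# each safe exposure. The starting estimate and learning rate are arbitrary.
danger_estimate = 0.9   # "seeing a dog photo means I'm probably in danger"
learning_rate = 0.2

for _ in range(10):
    observed_harm = 0.0  # each exposure ends safely
    danger_estimate += learning_rate * (observed_harm - danger_estimate)

print(round(danger_estimate, 3))  # roughly 0.097 after ten safe exposures
```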

As in the CBT example, one can view this as activating conflicting models and the mind then fixing the conflict by updating the models. In this case, the conflict is between the frightened subagent's prediction that seeing the dog is a sign of danger, and another subagent's later assessment that everything turned out to be fine.

Conclusion to integration methods

I have considered here a number of ways of integrating subagent conflicts. Here are a few key principles that are used in them:

  • Selectively blending with subagents/beliefs to make disagreements between them more apparent. Used in the Cognitive Behavioral Therapy and mental contrasting cases. Also used in a somewhat different form in exposure therapy, where you are partially blended with a subagent that thinks that the situation is dangerous, while getting disagreeing information from the rest of the world.

  • Facilitating a dialogue between subagents "from the outside". Used in Internal Double Crux and Internal Family Systems. In a sense, the next bullet can also be viewed as a special case of this.

    • Combining aspects of the conflicting perspectives into a whole which allows for resolution. Used in self-concept editing, Eliezer's altruism example, and urge propagation.

  • Collecting additional information which allows for the disagreement to be resolved. Used in exposure therapy.

I believe that we have evolved to use all of these spontaneously, without necessarily realizing what it is that we are doing.

For example, many people have the experience of it being useful to talk to a friend about their problems, weighing the pros and cons of different options. Frequently just getting to talk about it helps clarify the issue, even if the friend doesn't say anything (or even if they are a rubber duck). Probably not coincidentally, if you are talking about the conflicting feelings that you have in your mind, then you are frequently doing something like an informal version of Internal Double Crux: representing all the sides of a dilemma until you have reached a conclusion and integrated the different perspectives.

To the extent that they are effective, various schools of therapy and self-improvement—ranging from CBT to IDC to IFS—are formalized methods for making such integration more effective.