Compartmentalization in epistemic and instrumental rationality

Re­lated to: Hu­mans are not au­to­mat­i­cally strate­gic, The mys­tery of the haunted ra­tio­nal­ist, Striv­ing to ac­cept, Tak­ing ideas seriously

I ar­gue that many tech­niques for epistemic ra­tio­nal­ity, as taught on LW, amount to tech­niques for re­duc­ing com­part­men­tal­iza­tion. I ar­gue fur­ther that when these same tech­niques are ex­tended to a larger por­tion of the mind, they boost in­stru­men­tal, as well as epistemic, ra­tio­nal­ity.

Imag­ine try­ing to de­sign an in­tel­li­gent mind.

One prob­lem you’d face is de­sign­ing its goal.

Every time you de­signed a goal-in­di­ca­tor, the mind would in­crease ac­tion pat­terns that hit that in­di­ca­tor[1]. Amongst these re­in­forced ac­tions would be “wire­head­ing pat­terns” that fooled the in­di­ca­tor but did not hit your in­tended goal. For ex­am­ple, if your crea­ture gains re­ward from in­ter­nal in­di­ca­tors of sta­tus, it will in­crease those in­di­ca­tors—in­clud­ing by such meth­ods as sur­round­ing it­self with peo­ple who agree with it, or con­vinc­ing it­self that it un­der­stood im­por­tant mat­ters oth­ers had missed. It would be hard-wired to act as though “be­liev­ing makes it so”.
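A toy sketch may make the wireheading worry concrete. It assumes a very simple value learner (epsilon-greedy, one state, two actions) that is rewarded only by the indicator; the action names, reward numbers, and the `train` function are all hypothetical illustrations, not a model of any real mind.

```python
import random

def train(episodes=2000, epsilon=0.1, lr=0.1, seed=0):
    """Epsilon-greedy learner that is rewarded by the indicator, not the goal."""
    rng = random.Random(seed)
    # Assumed numbers: both actions raise the indicator, only one advances the goal.
    indicator_reward = {"do_real_work": 0.6, "fool_indicator": 1.0}
    true_progress = {"do_real_work": 1.0, "fool_indicator": 0.0}
    value = {action: 0.0 for action in indicator_reward}
    total_progress = 0.0
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(value))      # occasional exploration
        else:
            action = max(value, key=value.get)    # otherwise pick the best-valued action
        reward = indicator_reward[action]         # the learner only ever sees this
        value[action] += lr * (reward - value[action])
        total_progress += true_progress[action]   # what the designer actually wanted
    return value, total_progress

value, progress = train()
print(value, progress)
```

Under these assumptions the learner ends up valuing the indicator-fooling action more highly, and most episodes contribute nothing to the intended goal.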

A sec­ond prob­lem you’d face is prop­a­gat­ing ev­i­dence. When­ever your crea­ture en­coun­ters some new ev­i­dence E, you’ll want it to up­date its model of “events like E”. But how do you tell which events are “like E”? The soup of hy­pothe­ses, in­tu­ition-frag­ments, and other pieces of world-model is too large, and its pro­cess­ing too limited, to up­date each be­lief af­ter each piece of ev­i­dence. Even ab­sent wire­head­ing-driven ten­den­cies to keep re­ward­ing be­liefs iso­lated from threat­en­ing ev­i­dence, you’ll prob­a­bly have trou­ble with ac­ci­den­tal com­part­men­tal­iza­tion (where the crea­ture doesn’t up­date rele­vant be­liefs sim­ply be­cause your heuris­tics for what to up­date were im­perfect).
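One way to picture accidental compartmentalization is as an update that runs out of processing budget before it reaches every relevant belief. The sketch below assumes beliefs form a graph of "if this changes, reconsider that" links and that updating is a breadth-first walk with a fixed budget; the `propagate` function, the link structure, and the example beliefs are illustrative assumptions.

```python
from collections import deque

def propagate(links, start, budget):
    """Breadth-first update from `start`, stopping once `budget` beliefs are updated."""
    updated = []
    seen = {start}
    queue = deque([start])
    while queue and len(updated) < budget:
        belief = queue.popleft()
        updated.append(belief)
        for neighbor in links.get(belief, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return updated

# Hypothetical belief links.
links = {
    "there are no ghosts": ["ghost stories are fiction", "haunted houses are safe"],
    "haunted houses are safe": ["no need to fear this creaking stair"],
}
all_beliefs = set(links) | {b for targets in links.values() for b in targets}

updated = propagate(links, "there are no ghosts", budget=2)
stale = all_beliefs - set(updated)
print(updated, stale)
```

With a budget of two, the abstract belief and one neighbor get updated, while the beliefs that would actually change behavior in the haunted house are never reached.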

Evolu­tion, AFAICT, faced just these prob­lems. The re­sult is a fa­mil­iar set of ra­tio­nal­ity gaps:

I. Ac­ci­den­tal compartmentalization

a. Belief compartmentalization: We often fail to propagate changes to our abstract beliefs (and we often make predictions using un-updated, specialized components of our soup of world-model). Thus, learning modus tollens in the abstract doesn’t automatically change your answer to the Wason card test. Learning about conservation of energy doesn’t automatically change your fear when a bowling ball is hurtling toward you. Understanding there aren’t ghosts doesn’t automatically change your anticipations in a haunted house. (See Will’s excellent post Taking ideas seriously for further discussion).

b. Goal com­part­men­tal­iza­tion: We of­ten fail to prop­a­gate in­for­ma­tion about what “los­ing weight”, “be­ing a skil­led thinker”, or other goals would con­cretely do for us. We also fail to prop­a­gate in­for­ma­tion about what spe­cific ac­tions could fur­ther these goals. Thus (ab­sent the con­crete vi­su­al­iza­tions recom­mended in many self-help books) our goals fail to pull our be­hav­ior, be­cause al­though we ver­bally know the con­se­quences of our ac­tions, we don’t vi­su­al­ize those con­se­quences on the “near-mode” level that prompts emo­tions and ac­tions.

c. Failure to flush garbage: We of­ten con­tinue to work to­ward a sub­goal that no longer serves our ac­tual goal (cre­at­ing what Eliezer calls a lost pur­pose). Similarly, we of­ten con­tinue to dis­cuss, and care about, con­cepts that have lost all their moor­ings in an­ti­ci­pated sense-ex­pe­rience.

II. Reinforced compartmentalization

Type 1: Dis­torted re­ward sig­nals. If X is a re­in­forced goal-in­di­ca­tor (“I have sta­tus”; “my mother ap­proves of me”[2]), think­ing pat­terns that bias us to­ward X will be re­in­forced. We will learn to com­part­men­tal­ize away anti-X in­for­ma­tion.

The problem is not just conscious wishful thinking; it is a sphexish, half-alien mind that distorts your beliefs by reinforcing motives, angles of approach or analysis, choices of reading material or discussion partners, etc. so as to bias you toward X, and to compartmentalize away anti-X information.

Im­pair­ment to epistemic ra­tio­nal­ity:

  • “[com­plex rea­son­ing]… and so my past views are cor­rect!” (if I value “hav­ing ac­cu­rate views”, and so I’m re­in­forced for be­liev­ing my views ac­cu­rate)

  • “… and so my lat­est origi­nal the­ory is im­por­tant and worth fo­cus­ing my ca­reer on!” (if I value “do­ing high-qual­ity re­search”)

  • “… and so the optimal way to contribute to the world is for me to continue in exactly my present career...” (if I value both my present career and “being a utilitarian”)

  • “… and so my friends’ politics is correct.” (if I value both “telling the truth” and “being liked by my friends”)

Im­pair­ment to in­stru­men­tal ra­tio­nal­ity:

  • “… and so the two-fin­gered typ­ing method I’ve used all my life is effec­tive, and isn’t worth chang­ing” (if I value “us­ing effec­tive meth­ods” and/​or avoid­ing difficulty)

  • “… and so the ar­gu­ment was all his fault, and I was blame­less” (if I value “treat­ing my friends eth­i­cally”)

  • “… and so it’s be­cause they’re rot­ten peo­ple that they don’t like me, and there’s noth­ing I might want to change in my so­cial habits.”

  • “… and so I don’t care about dat­ing any­how, and I have no rea­son to risk ap­proach­ing some­one.”

Type 2: “Ugh fields”, or “no thought zones”. If we have a large amount of anti-X information cluttering up our brains, we may avoid thinking about X at all, since considering X tends to reduce compartmentalization and send us pain signals. Sometimes, this involves not-acting in entire domains of our lives, lest we be reminded of X.

Im­pair­ment to epistemic ra­tio­nal­ity:

Im­pair­ment to in­stru­men­tal ra­tio­nal­ity:

  • Many of us avoid learn­ing new skills (e.g., tak­ing a dance class, or prac­tic­ing so­cial ban­ter), be­cause prac­tic­ing them re­minds us of our non-com­pe­tence, and sends pain sig­nals.

  • The longer we’ve avoided paying a bill, starting a piece of writing, cleaning out the garage, etc., the harder it may be to think about the task at all (if we feel pain about having avoided it).

  • The more we care about our perfor­mance on a high-risk task, the harder it may be to start work­ing on it (so that the high­est value tasks, with the most un­cer­tain out­comes, are those we leave to the last minute de­spite the ex­pected im­pact of such pro­cras­ti­na­tion);

  • We may avoid mak­ing plans for death, dis­ease, break-up, un­em­ploy­ment, or other un­pleas­ant con­tin­gen­cies.

Type 3: Wire­head­ing pat­terns that fill our lives, and pre­vent other thoughts and ac­tions. [3]

Im­pair­ment to epistemic ra­tio­nal­ity:

  • We of­ten spend our think­ing time re­hears­ing rea­sons why our be­liefs are cor­rect, or why our the­o­ries are in­ter­est­ing, in­stead of think­ing new thoughts.

Im­pair­ment to in­stru­men­tal ra­tio­nal­ity:

  • We of­ten take ac­tions to sig­nal to our­selves that we have par­tic­u­lar goals, in­stead of act­ing to achieve those goals. For ex­am­ple, we may go through the mo­tions of study­ing or work­ing, and feel good about our dili­gence, while pay­ing lit­tle at­ten­tion to the re­sults.

  • We of­ten take ac­tions to sig­nal to our­selves that we already have par­tic­u­lar skills, in­stead of act­ing to ac­quire those skills. For ex­am­ple, we may pre­fer to play games against folks we of­ten beat, re­quest cri­tiques from those likely to praise our abil­ities, re­hearse yet more pro­jects in our do­mains of ex­ist­ing strength, etc.

Strate­gies for re­duc­ing com­part­men­tal­iza­tion:

A huge por­tion of both Less Wrong and the self-help and busi­ness liter­a­tures amounts to tech­niques for in­te­grat­ing your thoughts—for bring­ing your whole mind, with all your in­tel­li­gence and en­ergy, to bear on your prob­lems. Many fall into the fol­low­ing cat­e­gories, each of which boosts both epistemic and in­stru­men­tal ra­tio­nal­ity:

1. Some­thing to pro­tect (or, as Napoleon Hill has it, definite ma­jor pur­pose[4]): Find an ex­ter­nal goal that you care deeply about. Vi­su­al­ize the goal; re­mind your­self of what it can do for you; in­te­grate the de­sire across your mind. Then, use your de­sire to achieve this goal, and your knowl­edge that ac­tual in­quiry and effec­tive ac­tions can help you achieve it, to re­duce wire­head­ing temp­ta­tions.

2. Trans­late ev­i­dence, and goals, into terms that are easy to un­der­stand. It’s more painful to re­mem­ber “Aunt Jane is dead” than “Aunt Jane passed away” be­cause more of your brain un­der­stands the first sen­tence. There­fore use sim­ple, con­crete terms, whether you’re say­ing “Aunt Jane is dead” or “Damn, I don’t know calcu­lus” or “Light bends when it hits wa­ter” or “I will earn a mil­lion dol­lars”. Work to up­date your whole web of be­liefs and goals.

3. Re­duce the emo­tional gra­di­ents that fuel wire­head­ing. Leave your­self lines of re­treat. Re­cite the lita­nies of Gendlin and Tarski; vi­su­al­ize their mean­ing, con­cretely, for the task or ugh field bend­ing your thoughts. Think through the painful in­for­ma­tion; no­tice the ex­pected up­date, so that you need not fear fur­ther thought. On your to-do list, write con­crete “next ac­tions”, rather than vague goals with no clear steps, to make the list less scary.

4. Be aware of com­mon pat­terns of wire­head­ing or com­part­men­tal­iza­tion, such as failure to ac­knowl­edge sunk costs. Build habits, and per­haps iden­tity, around cor­rect­ing these pat­terns.

I sus­pect that if we fol­low up on these par­allels, and learn strate­gies for de­com­part­men­tal­iz­ing not only our far-mode be­liefs, but also our near-mode be­liefs, our mod­els of our­selves, our cu­ri­os­ity, and our near- and far-mode goals and emo­tions, we can cre­ate a more pow­er­ful ra­tio­nal­ity—a ra­tio­nal­ity for the whole mind.

[1] As­sum­ing it’s a re­in­force­ment learner, tem­po­ral differ­ence learner, per­cep­tual con­trol sys­tem, or similar.

[2] We re­ceive re­ward/​pain not only from “prim­i­tive re­in­forcers” such as smiles, sugar, warmth, and the like, but also from many long-term pre­dic­tors of those re­in­forcers (or pre­dic­tors of pre­dic­tors of those re­in­forcers, or...), such as one’s LW karma score, one’s num­ber the­ory prowess, or a spe­cific per­son’s es­teem. We prob­a­bly wish to re­gard some of these learned re­in­forcers as part of our real prefer­ences.

[3] Ar­guably, wire­head­ing gives us fewer long-term re­ward sig­nals than we would achieve from its ab­sence. Why does it per­sist, then? I would guess that the an­swer is not so much hy­per­bolic dis­count­ing (al­though this does play a role) as lo­cal hill-climb­ing be­hav­ior; the sim­ple, par­allel sys­tems that fuel most of our learn­ing can’t see how to get from “avoid think­ing about my bill” to “gen­uinely re­lax, af­ter pay­ing my bill”. You, though, can see such paths—and if you search for such im­prove­ments and vi­su­al­ize the re­wards, it may be eas­ier to re­duce wire­head­ing.

[4] I’m not recom­mend­ing Napoleon Hill. But even this un­usu­ally LW-un­friendly self-help book seems to get most points right, at least in the linked sum­mary. You might try read­ing the sum­mary as an ex­er­cise in rec­og­niz­ing mostly-ac­cu­rate state­ments when ex­pressed in the en­emy’s vo­cab­u­lary.
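The local hill-climbing guess in footnote [3] can be sketched in a few lines. The example assumes a one-dimensional chain of states with made-up reward numbers, and a learner that only moves to an adjacent state when that state is immediately more rewarding; `hill_climb`, the state names, and the rewards are all hypothetical.

```python
def hill_climb(reward, start):
    """Move to an adjacent state only while doing so strictly increases reward."""
    pos = start
    while True:
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(reward)]
        best = max(neighbors, key=lambda p: reward[p])
        if reward[best] <= reward[pos]:
            return pos  # no neighbor is better: a local optimum
        pos = best

states = [
    "avoid thinking about my bill",
    "open the bill",
    "pay the bill",
    "genuinely relax",
]
reward = [2, -3, -1, 5]  # assumed values: the path to relaxing dips before it rises

stuck_at = hill_climb(reward, 0)
best_overall = max(range(len(reward)), key=lambda p: reward[p])
print(states[stuck_at], "|", states[best_overall])
```

The greedy climber never leaves the first state because the next step looks worse, whereas a search over the whole chain finds the higher-reward endpoint. That is the sense in which a planner who can see the path has an advantage over the simple parallel learners.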