“Flinching away from truth” is often about *protecting* the epistemology

Related to: Leave a line of retreat; Categorizing has consequences.

There’s a story I like, about this little kid who wants to be a writer. So she writes a story and shows it to her teacher.

“You misspelt the word ‘ocean’”, says the teacher.

“No I didn’t!”, says the kid.

The teacher looks a bit apologetic, but persists: “‘Ocean’ is spelt with a ‘c’ rather than an ‘sh’; this makes sense, because the ‘e’ after the ‘c’ changes its sound…”

“No I didn’t!” interrupts the kid.

“Look,” says the teacher, “I get it that it hurts to notice mistakes. But that which can be destroyed by the truth should be! You did, in fact, misspell the word ‘ocean’.”

“I did not!” says the kid, whereupon she bursts into tears, and runs away and hides in the closet, repeating again and again: “I did not misspell the word! I can too be a writer!”.

I like to imagine the inside of the kid’s head as containing a single bucket that houses three different variables that are initially all stuck together:

Original state of the kid’s head: [diagram: a single bucket holding all three variables at once]

The goal, if one is seeking actual true beliefs, is to separate out each of these variables into its own separate bucket, so that the “is ‘oshun’ spelt correctly?” variable can update to the accurate state of “no”, without simultaneously forcing the “Am I allowed to pursue my writing ambition?” variable to update to the inaccurate state of “no”.

Desirable state (requires somehow acquiring more buckets): [diagram: each variable in its own separate bucket]

The trouble is, the kid won’t necessarily acquire enough buckets by trying to “grit her teeth and look at the painful thing”. A naive attempt to “just refrain from flinching away, and form true beliefs, however painful” risks introducing a more important error than her current spelling error: mistakenly believing she must stop working toward being a writer, since the bitter truth is that she spelled ‘oshun’ incorrectly.

State the kid might accidentally land in, if she naively tries to “face the truth”: [diagram: the spelling belief updates to “no”, dragging the writing-ambition variable down with it]

(You might take a moment, right now, to name the cognitive ritual the kid in the story *should* do (if only she knew the ritual). Or to name what you think you’d do if you found yourself in the kid’s situation—and how you would notice that you were at risk of a “buckets error”.)
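The bucket picture can also be sketched as a toy program (this is my illustration, not anything from the story; the class and variable names are invented):

```python
# Toy illustration of a "buckets error": one variable secretly stores the
# answers to several distinct questions, so updating one forces the others.

class OneBucketKid:
    """All three questions share a single bucket."""
    def __init__(self):
        # One flag stands in for: spelled it right? good writer? may pursue writing?
        self.bucket = True

    def learn_spelling_was_wrong(self):
        # Updating the spelling belief clobbers the writing ambition too.
        self.bucket = False


class ManyBucketsKid:
    """Each question gets its own bucket, so updates stay local."""
    def __init__(self):
        self.spelled_ocean_correctly = True
        self.allowed_to_pursue_writing = True

    def learn_spelling_was_wrong(self):
        self.spelled_ocean_correctly = False  # only this belief updates


kid = ManyBucketsKid()
kid.learn_spelling_was_wrong()
print(kid.spelled_ocean_correctly, kid.allowed_to_pursue_writing)  # False True
```

In the one-bucket version there is simply no way to record “I misspelt the word” without also recording “I can’t be a writer”; acquiring more buckets means acquiring more variables.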

More examples:

It seems to me that bucket errors are actually pretty common, and that many (most?) mental flinches are in some sense attempts to avoid bucket errors. The following examples are slightly-fictionalized composites of things I suspect happen a lot (except the “me” ones; those are just literally real):

Diet: Adam is on a diet with the intent to lose weight. Betty starts to tell him about some studies suggesting that the diet he is on may cause health problems. Adam complains: “Don’t tell me this! I need to stay motivated!”

One interpretation, as diagramed above: Adam is at risk of accidentally equating the two variables, and accidentally *assuming* that the studies imply that the diet must stop being viscerally motivating. He semi-consciously perceives that this risks error, and so objects to having the information come in and potentially force the error.

Pizza purchase: I was trying to save money. But I also wanted pizza. So I found myself tempted to buy the pizza *really quickly* so that I wouldn’t be able to notice that it would cost money (and, thus, so I would be able to buy the pizza).

On this narration: It wasn’t *necessarily* a mistake to buy pizza today. Part of me correctly perceived this “not necessarily a mistake to buy pizza” state. Part of me also expected that the rest of me wouldn’t perceive this, and that, if I started thinking it through, I might get locked into the no-pizza state even if pizza was better. So it tried to ‘help’ by buying the pizza *really quickly, before I could think and get it wrong*. [1]

On the particular occasion about the pizza (which happened in 2008, around the time I began reading Eliezer’s LW Sequences), I actually managed to notice that the “rush to buy the pizza before I could think” process was going on. So I tried promising myself that, if I still wanted the pizza after thinking it through, I would get the pizza. My resistance to thinking it through vanished immediately. [2]

To briefly give several more examples, without diagrams (you might see if you can visualize how a buckets diagram might go in these):

  • Carol is afraid to notice a potential flaw in her startup, lest she lose the ability to try full force on it.

  • Don finds himself reluctant to question his belief in God, lest he be forced to conclude that there’s no point to morality.

  • As a child, I was afraid to allow myself to actually consider giving some of my allowance to poor people, even though part of me wanted to do so. My fear was that if I allowed the “maybe you should give away your money, because maybe everyone matters evenly and you should be consequentialist” theory to fully boot up in my head, I would end up having to give away *all* my money, which seemed bad.

  • Eleanore believes there is no important existential risk, and is reluctant to think through whether that might not be true, in case it ends up hijacking her whole life.

  • Fred does not want to notice how much smarter he is than most of his classmates, lest he stop respecting them and treating them well.

  • Gina has mixed feelings about pursuing money—she mostly avoids it—because she wants to remain a “caring person”, and she has a feeling that becoming strategic about money would somehow involve giving up on that.

It seems to me that in each of these cases, the person has an arguably worthwhile goal that they might somehow lose track of (or might accidentally lose the ability to act on) if they think some *other* matter through—arguably because of a deficiency of mental “buckets”.

Moreover, “buckets errors” aren’t just thingies that affect thinking in prospect—they also get actually made in real life. It seems to me that one rather often runs into adults who decided they weren’t allowed to like math after failing a quiz in 2nd grade; or who gave up on meaning for a couple years after losing their religion; or who otherwise make some sort of vital “buckets error” that distorts a good chunk of their lives. (Although of course this is mostly guesswork, and it is hard to know actual causality.)

How I try to avoid “buckets errors”:

I basically just try to do the “obvious” thing: when I notice I’m averse to taking in “accurate” information, I ask myself what would be bad about taking in that information.[3] Usually, I get a concrete answer, like “If I notice I could’ve saved all that time, I’ll have to feel bad”, or “if AI timelines are maybe-near, then I’d have to rethink all my plans”, or what have you.

Then, I remember that I can consider each variable separately. For example, I can think about whether AI timelines are maybe-near; and if they are, I can always decide to not-rethink my plans anyhow, if that’s actually better. I mentally list out all the decisions that *don’t* need to be simultaneously forced by the info; and I promise myself that I can take the time to get these other decisions not-wrong, even after considering the new info.

Finally, I check to see if taking in the information is still aversive. If it is, I keep trying to disassemble the aversiveness into component lego blocks until it isn’t. Once it isn’t aversive, I go ahead and think it through bit by bit, like with the pizza.

This is a change from how I used to think about flinches: I used to be moralistic, and to feel disapproval when I noticed a flinch, and to assume the flinch had no positive purpose. I therefore used to try to just grit my teeth and think about the painful thing, without first “factoring” the “purposes” of the flinch, as I do now. But I think my new ritual is better, at least now that I have enough introspective skill that I can generally finish this procedure in finite time, and can still end up going forth and taking in the info a few minutes later.

(Eliezer once described what I take to be a similar ritual for avoiding bucket errors, as follows: When deciding which apartment to rent (he said), one should first do out the math, and estimate the number of dollars each would cost, the number of minutes of commute time times the rate at which one values one’s time, and so on. But at the end of the day, if the math says the wrong thing, one should do the right thing anyway.)

[1]: As an analogy: sometimes, while programming, I’ve had the experience of:

  1. Writing a program I think is maybe-correct;

  2. Inputting 0 as a test-case, and knowing ahead of time that the output should be, say, “7”;

  3. Seeing instead that the output was “5”; and

  4. Being really tempted to just add a “+2” into the program, so that this case will be right.

This edit is the wrong move, but not because of what it does to MyProgram(0) — MyProgram(0) really is right. It’s the wrong move because it maybe messes up the program’s *other* outputs.
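To make the footnote’s scenario concrete (the function and its bug are invented for illustration): suppose the program is meant to compute x*x + 7, so input 0 should give 7, but a bug makes it give 5. The tempting “+2” patch makes the one case we looked at come out right, while quietly changing every other output:

```python
# Intended behavior: x*x + 7 (so input 0 should give 7).
def my_program(x):
    return x * x + x + 5  # bug: should be `x * x + 7`; my_program(0) == 5

# The tempting "flinch" fix: bolt on a +2 so the visible case passes.
def patched_program(x):
    return my_program(x) + 2

print(patched_program(0))  # 7  -- the test case we stared at now looks right
print(patched_program(3))  # 19 -- but the intended answer was 3*3 + 7 == 16
```

The patch is evaluated against one data point, so it looks like a fix; the damage only shows up on inputs we weren’t currently staring at.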

Similarly, changing up my beliefs about how my finances should work in order to get a pizza on a day when I want one *might* help with getting the right answer today about the pizza — it isn’t clear — but it’d risk messing up other, future decisions.

The problem with rationalization and mental flinches, IMO, isn’t so much the “intended” action that the rationalization or flinch accomplishes in the moment, but the mess it leaves of the code afterward.

[2] To be a bit more nitpicky about this: the principle I go for in such cases isn’t actually “after thinking it through, do the best thing”. It’s more like “after thinking it through, do the thing that, if reliably allowed to be the decision-criterion, will allow information to flow freely within my head”.

The idea here is that my brain is sometimes motivated to achieve certain things; and if I don’t allow that attempted achievement to occur in plain sight, I incentivize my brain to sneak around behind my back and twist up my code base in an attempt to achieve those things. So, I try not to do that.

This is one reason it seems bad to me when people try to take “maximize all human well-being, added evenly across people, without taking myself or my loved ones as special” as their goal. (Or any other fake utility function.)

[3] To describe this “asking” process more concretely: I sometimes do this as follows: I concretely visualize a ‘magic button’ that will cause me to take in the information. I reach toward the button, and tell my brain I’m really going to press it when I finish counting down, unless there are any objections (“3… 2… no objections, right?… 1…”). Usually I then get a bit of an answer — a brief flash of worry, or a word or image or association.

Sometimes the thing I get is already clear, like “if I actually did the forms wrong, and I notice, I’ll have to redo them”. Then all I need to do is separate it into buckets (“How about if I figure out whether I did them wrong, and then, if I don’t want to redo them, I can always just not?”).

Other times, what I get is more like a quick nonverbal flash, or a feeling of aversion without knowing why. In such cases, I try to keep “feeling near” the aversion. I might for example try thinking of different guesses (“Is it that I’d have to redo the forms?… no… Is it that it’d be embarrassing?… no…”). The idea here is to see if any of the guesses “resonate” a bit, or cause the feeling of aversiveness to become temporarily a bit more vivid-feeling.

For a more detailed version of these instructions, and more thoughts on how to avoid bucket errors in general (under different terminology), you might want to check out Eugene Gendlin’s audiobook “Focusing”.