Dark Arts of Rationality

Today, we’re going to talk about Dark rationalist techniques: productivity tools which seem incoherent, mad, and downright irrational. These techniques include:

  1. Willful Inconsistency

  2. Intentional Compartmentalization

  3. Modifying Terminal Goals

I expect many of you are already up in arms. It seems obvious that consistency is a virtue, that compartmentalization is a flaw, and that one should never modify their terminal goals.

I claim that these ‘obvious’ objections are incorrect, and that all three of these techniques can be instrumentally rational.

In this article, I’ll promote the strategic cultivation of false beliefs and condone mindhacking on the values you hold most dear. Truly, these are Dark Arts. I aim to convince you that sometimes, the benefits are worth the price.

Changing your Terminal Goals

In many games there is no “absolutely optimal” strategy. Consider the Prisoner’s Dilemma. The optimal strategy depends entirely upon the strategies of the other players. Entirely.

Intuitively, you may believe that there are some fixed “rational” strategies. Perhaps you think that even though complex behavior is dependent upon other players, there are still some constants, like “Never cooperate with DefectBot”. DefectBot always defects against you, so you should never cooperate with it. Cooperating with DefectBot would be insane. Right?

Wrong. If you find yourself on a playing field where everyone else is a TrollBot (players who cooperate with you if and only if you cooperate with DefectBot) then you should cooperate with DefectBots and defect against TrollBots.

Consider that. There are playing fields where you should cooperate with DefectBot, even though that looks completely insane from a naïve viewpoint. Optimality is not a feature of the strategy, it is a relationship between the strategy and the playing field.
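
If it helps to see the arithmetic, here is a minimal sketch of that playing field in Python. The payoff numbers (standard Prisoner’s Dilemma values) and the names PAYOFF and total_score are my own, purely for illustration:

    # Hypothetical one-shot payoffs: both cooperate -> 3, both defect -> 1,
    # I defect while they cooperate -> 5, I cooperate while they defect -> 0.
    PAYOFF = {("C", "C"): 3, ("D", "D"): 1, ("D", "C"): 5, ("C", "D"): 0}

    def total_score(move_vs_defectbot, n_trollbots=10):
        """Score one pass through a field of one DefectBot plus n TrollBots."""
        # DefectBot always defects, no matter what I do.
        score = PAYOFF[(move_vs_defectbot, "D")]
        # TrollBots cooperate with me if and only if I cooperated with DefectBot.
        trollbot_move = "C" if move_vs_defectbot == "C" else "D"
        # Against each TrollBot, defecting is my best response either way.
        score += n_trollbots * PAYOFF[("D", trollbot_move)]
        return score

    print(total_score("D"))  # "never cooperate with DefectBot": 1 + 10*1 = 11
    print(total_score("C"))  # "cooperate with DefectBot":       0 + 10*5 = 50

With those made-up numbers, the “insane” move of cooperating with DefectBot wins by a wide margin, and the gap only grows as the field fills with TrollBots.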

Take this lesson to heart: in certain games, there are strange playing fields where the optimal move looks completely irrational.

I’m here to convince you that life is one of those games, and that you occupy a strange playing field right now.


Here’s a toy example of a strange playing field, which illustrates the fact that even your terminal goals are not sacred:

Imagine that you are completely self-consistent and have a utility function. For the sake of the thought experiment, pretend that your terminal goals are distinct, exclusive, orthogonal, and clearly labeled. You value your goals being achieved, but you have no preferences about how they are achieved or what happens afterwards (unless the goal explicitly mentions the past/future, in which case achieving the goal puts limits on the past/future). You possess at least two terminal goals, one of which we will call A.

Omega descends from on high and makes you an offer. Omega will cause your terminal goal A to become achieved over a certain span of time, without any expenditure of resources. As a price of taking the offer, you must switch out terminal goal A for terminal goal B. Omega guarantees that B is orthogonal to A and all your other terminal goals. Omega further guarantees that you will achieve B using less time and resources than you would have spent on A. Any other concerns you have are addressed via similar guarantees.

Clearly, you should take the offer. One of your terminal goals will be achieved, and while you’ll be pursuing a new terminal goal that you (before the offer) don’t care about, you’ll come out ahead in terms of time and resources which can be spent achieving your other goals.
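
For concreteness, here is the same dominance argument as a toy bit of bookkeeping. The budget and cost figures are invented; Omega’s offer specifies none of them, only the inequalities:

    # Toy accounting of Omega's offer, evaluated by your *current* goals.
    # All of these numbers are made up for illustration.
    budget = 200       # resources you have available
    cost_of_A = 100    # what achieving A would normally cost you
    cost_of_B = 60     # Omega guarantees B costs less than A would have

    # Decline: you pursue A yourself.
    decline = {"A_achieved": True, "left_for_other_goals": budget - cost_of_A}

    # Accept: Omega achieves A for free; you pursue B instead.
    accept = {"A_achieved": True, "left_for_other_goals": budget - cost_of_B}

    print(decline)  # {'A_achieved': True, 'left_for_other_goals': 100}
    print(accept)   # {'A_achieved': True, 'left_for_other_goals': 140}
    # By the lights of your pre-trade goals, accepting dominates: A still gets
    # achieved, and more resources remain for every goal you currently hold.

The particular numbers don’t matter; any numbers satisfying Omega’s guarantees give the same ordering.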

So the optimal move, in this scenario, is to change your terminal goals.

There are times when the optimal move of a rational agent is to hack its own terminal goals.

You may find this counter-intuitive. It helps to remember that “optimality” depends as much upon the playing field as upon the strategy.

Next, I claim that such scenarios are not restricted to toy games where Omega messes with your head. Humans encounter similar situations on a day-to-day basis.


Humans often find themselves in a position where they should modify their terminal goals, and the reason is simple: our thoughts do not have direct control over our motivation.

Unfortunately for us, our “motivation circuits” can distinguish between terminal and instrumental goals. It is often easier to put in effort, experience inspiration, and work tirelessly when pursuing a terminal goal as opposed to an instrumental goal. It would be nice if this were not the case, but it’s a fact of our hardware: we’re going to do X more if we want to do X for its own sake as opposed to when we force X upon ourselves.

Consider, for example, a young woman who wants to be a rockstar. She wants the fame, the money, and the lifestyle: these are her “terminal goals”. She lives in some strange world where rockstardom is wholly dependent upon merit (rather than social luck and network effects), and decides that in order to become a rockstar she has to produce really good music.

But here’s the problem: She’s a human. Her conscious decisions don’t directly affect her motivation.

In her case, it turns out that she can make better music when “Make Good Music” is a terminal goal as opposed to an instrumental goal.

When “Make Good Music” is an instrumental goal, she schedules practice time on a sitar and grinds out the hours. But she doesn’t really like it, so she cuts corners whenever akrasia comes knocking. She lacks inspiration and spends her spare hours dreaming of stardom. Her songs are shallow and trite.

When “Make Good Music” is a terminal goal, music pours forth, and she spends every spare hour playing her sitar: not because she knows that she “should” practice, but because you couldn’t pry her sitar from her cold dead fingers. She’s not “practicing”, she’s pouring out her soul, and no power in the ’verse can stop her. Her songs are emotional, deep, and moving.

It’s obvious that she should adopt a new terminal goal.

Ideally, we would be just as motivated to carry out instrumental goals as we are to carry out terminal goals. In reality, this is not the case. As a human, your motivation system does discriminate between the goals that you feel obligated to achieve and the goals that you pursue as ends unto themselves.

As such, it is sometimes in your best interest to modify your terminal goals.


Mind the terminology, here. When I speak of “terminal goals” I mean actions that feel like ends unto themselves. I am speaking of the stuff you wish you were doing when you’re doing boring stuff, the things you do in your free time just because they are fun, the actions you don’t need to justify.

This seems like the obvious meaning of “terminal goals” to me, but some of you may think of “terminal goals” as something more akin to self-endorsed, morally sound end-values in some consistent utility function. I’m not talking about those. I’m not even convinced I have any.

Both types of “terminal goal” are susceptible to strange playing fields in which the optimal move is to change your goals, but it is only the former type of goal — the actions that are simply fun, that need no justification — which I’m suggesting you tweak for instrumental reasons.


I’ve largely refrained from goal-hacking, personally. I bring it up for a few reasons:

  1. It’s the easiest Dark Side technique to justify. It helps break people out of the mindset where they think optimal actions are the ones that look rational in a vacuum. Remember, optimality is a feature of the playing field. Sometimes cooperating with DefectBot is the best strategy!

  2. Goal hacking segues nicely into the other Dark Side techniques which I use frequently, as you will see shortly.

  3. I have met many people who would benefit from a solid bout of goal-hacking.

I’ve crossed paths with many a confused person who (without any explicit thought on their part) had really silly terminal goals. We’ve all met people who are acting as if “Acquire Money” is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them “but what would you do if money was no issue and you had a lot of time”, all you get is a blank stare.

Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we’ve all met people who act as if “Have a Job” is a terminal value, and who then seem aimless and undirected after finding employment.

These people could use some goal hacking. You can argue that Acquire Money and Have a Job aren’t “really” terminal goals, to which I counter that many people don’t know their ass from their elbow when it comes to their own goals. Goal hacking is an important part of becoming a rationalist and/or improving mental health.

Goal-hacking in the name of consistency isn’t really a Dark Side power. This power is only Dark when you use it like the musician in our example, when you adopt terminal goals for instrumental reasons. This form of goal hacking is less common, but can be very effective.

I recently had a personal conversation with Alexei, who is earning to give. He noted that he was not entirely satisfied with his day-to-day work, and mused that perhaps goal-hacking (making “Do Well at Work” an end unto itself) could make him more effective, generally happier, and more productive in the long run.

Goal-hacking can be a powerful technique, when correctly applied. Remember, you’re not in direct control of your motivation circuits. Sometimes, strange though it seems, the optimal action involves fooling yourself.

You don’t get good at programming by sitting down and forcing yourself to practice for three hours a day. I mean, I suppose you could get good at programming that way. But it’s much easier to get good at programming by loving programming, by being the type of person who spends every spare hour tinkering on a project. Because then it doesn’t feel like practice, it feels like fun.

This is the power that you can harness, if you’re willing to tamper with your terminal goals for instrumental reasons. As rationalists, we would prefer to dedicate to instrumental goals the same vigor that is reserved for terminal goals. Unfortunately, we find ourselves on a strange playing field where goals that feel justified in their own right win the lion’s share of our attention.

Given this strange playing field, goal-hacking can be optimal.

You don’t have to completely mangle your goal system. Our aspiring musician from earlier doesn’t need to destroy her “Become a Rockstar” goal in order to adopt the “Make Good Music” goal. If you can successfully convince yourself to believe that something instrumental is an end unto itself (i.e. terminal), while still believing that it is instrumental, then more power to you.

This is, of course, an instance of Intentional Compartmentalization.

Intentional Compartmentalization

As soon as you endorse modifying your own terminal goals, Intentional Compartmentalization starts looking like a pretty good idea. If Omega offers to achieve A at the price of dropping A and adopting B, the ideal move is to take the offer after finding a way to not actually care about B.

A consistent agent cannot do this, but I have good news for you: You’re a human. You’re not consistent. In fact, you’re great at being inconsistent!

You might expect it to be difficult to add a new terminal goal while still believing that it’s instrumental. You may also run into strange situations where holding an instrumental goal as terminal directly contradicts other terminal goals.

For example, our aspiring musician might find that she makes even better music if “Become a Rockstar” is not among her terminal goals.

This means she’s in trouble: She either has to drop “Become a Rockstar” and have a better chance at actually becoming a rockstar, or she has to settle for a decreased chance that she’ll become a rockstar.

Or, rather, she would have to settle for one of these choices — if she wasn’t human.

I have good news! Humans are really really good at being inconsistent, and you can leverage this to your advantage. Compartmentalize! Maintain goals that are “terminal” in one compartment, but which you know are “instrumental” in another, then simply never let those compartments touch!

This may sound completely crazy and irrational, but remember: you aren’t actually in control of your motivation system. You find yourself on a strange playing field, and the optimal move may in fact require mental contortions that make epistemic rationalists shudder.

Hopefully you never run into this particular problem (holding contradictory goals in “terminal” positions), but this illustrates that there are scenarios where compartmentalization works in your favor. Of course we’d prefer to have direct control of our motivation systems, but given that we don’t, compartmentalization is a huge asset.

Take a moment and let this sink in before moving on.

Once you realize that compartmentalization is OK, you are ready to practice my second Dark Side technique: Intentional Compartmentalization. It has many uses outside the realm of goal-hacking.

See, motivation is a fickle beast. And, as you’ll remember, your conscious choices are not directly attached to your motivation levels. You can’t just decide to be more motivated.

At least, not directly.

I’ve found that certain beliefs — beliefs which I know are wrong — can make me more productive. (On a related note, remember that religious organizations are generally more coordinated than rationalist groups.)

It turns out that, under these false beliefs, I can tap into motivational reserves that are otherwise unavailable. The only problem is, I know that these beliefs are downright false.

I’m just kidding, that’s not actually a problem. Compartmentalization to the rescue!

Here are a couple of example beliefs that I keep locked away in my mental compartments, bound up in chains. Every so often, when I need to be extra productive, I don my protective gear and enter these compartments. I never fully believe these things — not globally, at least — but I’m capable of attaining “local belief”, of acting as if I hold these beliefs. This, it turns out, is enough.

Nothing is Beyond My Grasp

We’ll start off with a tame belief, something that is soundly rooted in evidence outside of its little compartment.

I have a global belief, outside all my compartments, that nothing is beyond my grasp.

Others may understand things more easily than I do, or faster than I do. People smarter than myself grok concepts with less effort than I. It may take me years to wrap my head around things that other people find trivial. However, there is no idea that a human has ever had that I cannot, in principle, grok.

I believe this with moderately high probability, just based on my own general intelligence and the fact that brains are so tightly clustered in mind-space. It may take me a hundred times the effort to understand something, but I can still understand it eventually. Even things that are beyond the grasp of a meager human mind, I will one day be able to grasp after I upgrade my brain. Even if there are limits imposed by reality, I could in principle overcome them if I had enough computing power. Given any finite idea, I could in theory become powerful enough to understand it.

This belief, itself, is not compartmentalized. What is compartmentalized is the certainty.

Inside the compartment, I believe that Nothing is Beyond My Grasp with 100% confidence. Note that this is ridiculous: there’s no such thing as 100% confidence. At least, not in my global beliefs. But inside the compartments, while we’re in la-la land, it helps to treat Nothing is Beyond My Grasp as raw, immutable fact.

You might think that it’s sufficient to believe Nothing is Beyond My Grasp with very high probability. If that’s the case, you haven’t been listening: I don’t actually believe Nothing is Beyond My Grasp with an extraordinarily high probability. I believe it with moderate probability, and then I have a compartment in which it’s a certainty.

It would be nice if I never needed to use the compartment, if I could face down technical problems and incomprehensible lingo and being really out of my depth with a relatively high confidence that I’m going to be able to make sense of it all. However, I’m not in direct control of my motivation. And it turns out that, through some quirk in my psychology, it’s easier to face down the oppressive feeling of being in way over my head if I have this rock-solid “belief” that Nothing is Beyond My Grasp.

This is what the compartments are good for: I don’t actually believe the things inside them, but I can still act as if I do. That ability allows me to face down challenges that would be difficult to face down otherwise.

This compartment was largely constructed with the help of The Phantom Tollbooth: it taught me that there are certain impossible tasks you can do if you think they’re possible. It’s not always enough to know that if I believe I can do a thing, then I have a higher probability of being able to do it. I get an extra boost from believing I can do anything.

You might be surprised about how much you can do when you have a mental compartment in which you are unstoppable.

My Willpower Does Not Deplete

Here’s another: My Willpower Does Not Deplete.

Ok, so my willpower actually does deplete. I’ve been writing about how it does, and discussing methods that I use to avoid depletion. Right now, I’m writing about how I’ve acknowledged the fact that my willpower does deplete.

But I have this compartment where it doesn’t.

Ego depletion is a funny thing. If you don’t believe in ego depletion, you suffer less ego depletion. This does not eliminate ego depletion.

Knowing this, I have a compartment in which My Willpower Does Not Deplete. I go there often, when I’m studying. It’s easy, I think, for one to begin to feel tired, and say “oh, this must be ego depletion, I can’t work anymore.” Whenever my brain tries to go there, I wheel this bad boy out of his cage. “Nope”, I respond, “My Willpower Does Not Deplete”.

Surprisingly, this often works. I won’t force myself to keep working, but I’m pretty good at preventing mental escape attempts via “phantom akrasia”. I don’t allow myself to invoke ego depletion or akrasia to stop being productive, because My Willpower Does Not Deplete. I have to actually be tired out, in a way that doesn’t trigger the My Willpower Does Not Deplete safeguards. This doesn’t let me keep going forever, but it prevents a lot of false alarms.

In my experience, the strong version (My Willpower Does Not Deplete) is much more effective than the weak version (My Willpower is Not Depleted Yet), even though it’s more wrong. This probably says something about my personality. Your mileage may vary. Keep in mind, though, that the effectiveness of your mental compartments may depend more on the motivational content than on the degree of falsehood.

Anything is a Placebo

Placebos work even when you know they are placebos.

This is the sort of madness I’m talking about, when I say things like “you’re on a strange playing field”.

Knowing this, you can easily activate the placebo effect manually. Feeling sick? Here’s a freebie: drink more water. It will make you feel better.

No? It’s just a placebo, you say? Doesn’t matter. Tell yourself that water makes it better. Put that in a nice little compartment, save it for later. It doesn’t matter that you know what you’re doing: your brain is easily fooled.

Want to be more productive, be healthier, and exercise more effectively? Try using Anything is a Placebo! Pick something trivial and non-harmful and tell yourself that it helps you perform better. Put the belief in a compartment in which you act as if you believe the thing. Cognitive dissonance doesn’t matter! Your brain is great at ignoring cognitive dissonance. You can “know” you’re wrong in the global case, while “believing” you’re right locally.

For bonus points, try combining objectives. Are you constantly underhydrated? Try believing that drinking more water makes you more alert!

Brains are weird.


Truly, these are the Dark Arts of instrumental rationality. Epistemic rationalists recoil in horror as I advocate intentionally cultivating false beliefs. It goes without saying that you should use this technique with care. Remember to always audit your compartmentalized beliefs through the lens of your actual beliefs, and be very careful not to let incorrect beliefs leak out of their compartments.

If you think you can achieve similar benefits without “fooling yourself”, then by all means, do so. I haven’t been able to find effective alternatives. Brains have been honing compartmentalization techniques for eons, so I figure I might as well re-use the hardware.

It’s important to reiterate that these techniques are necessary because you’re not actually in control of your own motivation. Sometimes, incorrect beliefs make you more motivated. Intentionally cultivating incorrect beliefs is surely a path to the Dark Side: compartmentalization only mitigates the damage. If you make sure you segregate the bad beliefs and acknowledge them for what they are then you can get much of the benefit without paying the cost, but there is still a cost, and the currency is cognitive dissonance.

At this point, you should be mildly uncomfortable. After all, I’m advocating something which is completely epistemically irrational. We’re not done yet, though.

I have one more Dark Side technique, and it’s worse.

Willful Inconsistency

I use Intentional Compartmentalization to “locally believe” things that I don’t “globally believe”, in cases where the local belief makes me more productive. In this case, the beliefs in the compartments are things that I tell myself. They’re like mantras that I repeat in my head, at the System 2 level. System 1 is fragmented and compartmentalized, and happily obliges.

Willful Inconsistency is the grown-up, scary version of Intentional Compartmentalization. It involves convincing System 1 wholly and entirely of something that System 2 does not actually believe. There’s no compartmentalization and no fragmentation. There’s nowhere to shove the incorrect belief when you’re done with it. It’s taken over the intuition, and it’s always on. Willful Inconsistency is about having gut-level intuitive beliefs that you explicitly disavow.

Your intuitions run the show whenever you’re not paying attention, so if you’re willfully inconsistent then you’re going to actually act as if these incorrect beliefs are true in your day-to-day life, unless you forcibly override your default actions. Ego depletion and distraction make you vulnerable to yourself.

Use this technique with caution.

This may seem insane even to those of you who took the previous suggestions in stride. That you must sometimes alter your terminal goals is a feature of the playing field, not the agent. The fact that you are not in direct control of your motivation system readily implies that tricking yourself is useful, and compartmentalization is an obvious way to mitigate the damage.

But why would anyone ever try to convince themselves, deep down at the core, of something that they don’t actually believe?

The answer is simple: specialization.

To illustrate, let me explain how I use willful inconsistency.

I have invoked Willful Inconsistency on only two occasions, and they were similar in nature. Only one instance of Willful Inconsistency is currently active, and it works like this:

I have completely and totally convinced my intuitions that unfriendly AI is a problem. A big problem. System 1 operates under the assumption that UFAI will come to pass in the next twenty years with very high probability.

You can imagine how this is somewhat motivating.

On the conscious level, within System 2, I’m much less certain. I solidly believe that UFAI is a big problem, and that it’s the problem that I should be focusing my efforts on. However, my error bars are far wider, my timespan is quite broad. I acknowledge a decent probability of soft takeoff. I assign moderate probabilities to a number of other existential threats. I think there are a large number of unknown unknowns, and there’s a non-zero chance that the status quo continues until I die (and that I can’t later be brought back). All this I know.

But, right now, as I type this, my intuition is screaming at me that the above is all wrong, that my error bars are narrow, and that I don’t actually expect the status quo to continue for even thirty years.

This is just how I like things.

See, I am convinced that building a friendly AI is the most important problem for me to be working on, even though there is a very real chance that MIRI’s research won’t turn out to be crucial. Perhaps other existential risks will get to us first. Perhaps we’ll get brain uploads and Robin Hanson’s emulation economy. Perhaps it’s going to take far longer than expected to crack general intelligence. However, after much reflection I have concluded that despite the uncertainty, this is where I should focus my efforts.

The problem is, it’s hard to translate that decision down to System 1.

Consider a toy scenario, where there are ten problems in the world. Imagine that, in the face of uncertainty and diminishing returns from research effort, I have concluded that the world should allocate 30% of resources to problem A, 25% to problem B, 10% to problem C, and 5% to each of the remaining problems.

Because specialization leads to massive benefits, it’s much more effective to dedicate 30% of researchers to working on problem A rather than having all researchers dedicate 30% of their time to problem A. So presume that, in light of these conclusions, I decide to dedicate myself to problem A.
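
To make the benefit of specializing concrete, here is a toy calculation. The superlinear exponent and the researcher count are my own assumptions, standing in for the depth gains and context-switching costs that drive specialization; nothing above pins them down:

    # A toy model of why specializing beats splitting time, assuming (purely for
    # illustration) that depth pays off superlinearly: a researcher who spends a
    # fraction f of their time on a problem produces f**1.5 units of progress on
    # it, because context-switching and shallow familiarity have costs.
    N = 100            # researchers in the world (hypothetical)
    share_for_A = 0.30

    # Everyone splits: each researcher gives 30% of their time to problem A.
    split_output = N * share_for_A ** 1.5

    # Specialization: 30% of researchers work on problem A full time.
    specialized_output = (N * share_for_A) * 1.0 ** 1.5

    print(round(split_output, 1))        # ~16.4 units of progress on A
    print(round(specialized_output, 1))  # 30.0 units of progress on A

Same total person-hours devoted to problem A either way; under this (made-up) model, the specialists simply get nearly twice as far.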

Here we have a problem: I’m supposed to specialize in problem A, but at the intuitive level problem A isn’t that big a deal. It’s only 30% of the problem space, after all, and it’s not really that much worse than problem B.

This would be no issue if I were in control of my own motivation system: I could put the blinders on and focus on problem A, crank the motivation knob to maximum, and trust everyone else to focus on the other problems and do their part.

But I’m not in control of my motivation system. If my intuitions know that there are a number of other similarly worthy problems that I’m ignoring, if they are distracted by other issues of similar scope, then I’m tempted to work on everything at once. This is bad, because output is maximized if we all specialize.

Things get especially bad when problem A is highly uncertain and unlikely to affect people for decades if not centuries. It’s very hard to convince the monkey brain to care about far-future vagaries, even if I’ve rationally concluded that those are where I should dedicate my resources.

I find myself on a strange playing field, where the optimal move is to lie to System 1.

Allow me to make that more concrete:

I’m much more motivated to do FAI research when I’m intuitively convinced that we have a hard 15 year timer until UFAI.

Explicitly, I believe UFAI is one possibility among many and that the timeframe should be measured in decades rather than years. I’ve concluded that it is my most pressing concern, but I don’t actually believe we have a hard 15 year countdown.

That said, it’s hard to overstate how useful it is to have a gut-level feeling that there’s a short, hard timeline. This “knowledge” pushes the monkey brain to go all out, no holds barred. In other words, this is the method by which I convince myself to actually specialize.

This is how I convince myself to deploy every available resource, to attack the problem as if the stakes were incredibly high. Because the stakes are incredibly high, and I do need to deploy every available resource, even if we don’t have a hard 15 year timer.

In other words, Willful Inconsistency is the technique I use to force my intuition to feel as if the stakes are as high as I’ve calculated them to be, given that my monkey brain is bad at responding to uncertain vague future problems. Willful Inconsistency is my counter to Scope Insensitivity: my intuition has difficulty believing the results when I do the multiplication, so I lie to it until it acts with appropriate vigor.

This is the final secret weapon in my motivational arsenal.

I don’t personally recommend that you try this technique. It can have harsh side effects, including feelings of guilt, intense stress, and massive amounts of cognitive dissonance. I’m able to do this in large part because I’m in a very good headspace. I went into this with full knowledge of what I was doing, and I am confident that I can back out (and actually correct my intuitions) if the need arises.

That said, I’ve found that cultivating a gut-level feeling that what you’re doing must be done, and must be done quickly, is an extraordinarily good motivator. It’s such a strong motivator that I seldom explicitly acknowledge it. I don’t need to mentally invoke “we have to study or the world ends”. Rather, this knowledge lingers in the background. It’s not a mantra, it’s not something that I repeat and wear thin. Instead, it’s this gut-level drive that sits underneath it all, that makes me strive to go faster unless I explicitly try to slow down.

This monkey-brain tunnel vision, combined with a long habit of productivity, is what keeps me Moving Towards the Goal.


Those are my Dark Side techniques: Willful Inconsistency, Intentional Compartmentalization, and Terminal Goal Modification.

I expect that these techniques will be rather controversial. If I may be so bold, I recommend that discussion focus on goal-hacking and intentional compartmentalization. I acknowledge that willful inconsistency is unhealthy and I don’t generally recommend that others try it. By contrast, both goal-hacking and intentional compartmentalization are quite sane and, indeed, instrumentally rational.

These are certainly not techniques that I would recommend CFAR teach to newcomers, and I remind you that “it is dangerous to be half a rationalist”. You can royally screw yourself over if you’re still figuring out your beliefs as you attempt to compartmentalize false beliefs. I recommend only using them when you’re sure of what your goals are and confident about the borders between your actual beliefs and your intentionally false “beliefs”.

It may be surprising that changing terminal goals can be an optimal strategy, and that humans should consider adopting incorrect beliefs strategically. At the least, I encourage you to remember that there are no absolutely rational actions.

Modifying your own goals and cultivating false beliefs are useful because we live in strange, hampered control systems. Your brain was optimized with no concern for truth, and optimal performance may require self-deception. I remind the uncomfortable that instrumental rationality is not about being the most consistent or the most correct, it’s about winning. There are games where the optimal move requires adopting false beliefs, and if you find yourself playing one of those games, then you should adopt false beliefs. Instrumental rationality and epistemic rationality can be pitted against each other.

We are fortunate, as humans, to be skilled at compartmentalization: this helps us work around our mental handicaps without sacrificing epistemic rationality. Of course, we’d rather not have the mental handicaps in the first place: but you have to work with what you’re given.

We are weird agents without full control of our own minds. We lack direct control over important aspects of ourselves. For that reason, it’s often necessary to take actions that may seem contradictory, crazy, or downright irrational.

Just remember this, before you condemn these techniques: optimality is as much an aspect of the playing field as of the strategy, and humans occupy a strange playing field indeed.