Reality-Revealing and Reality-Masking Puzzles

Tl;dr: I’ll try here to show how CFAR’s “art of rationality” has evolved over time, and what has driven that evolution.

In the course of this, I’ll introduce the distinction between what I’ll call “reality-revealing puzzles” and “reality-masking puzzles”—a distinction that I think is almost necessary for anyone attempting to develop a psychological art in ways that will help rather than harm. (And one I wish I’d had explicitly back when the Center for Applied Rationality was founded.)

I’ll also be trying to elaborate, here, on the notion we at CFAR have recently been tossing around about CFAR being an attempt to bridge between common sense and Singularity scenarios—an attempt to figure out how people can stay grounded in common sense and ordinary decency and humane values and so on, while also taking in (and planning actions within) the kind of universe we may actually be living in.


Arts grow from puzzles. I like to look at mathematics, or music, or ungodly things like marketing, and ask: What puzzles were its creators tinkering with that led them to leave behind these structures? (Structures now being used by other people, for other reasons.)

I picture arts like coral reefs. Coral polyps build shell-bits for their own reasons, but over time there accumulates a reef usable by others. Math built up like this—and math is now a powerful structure for building from. [Sales and Freud and modern marketing/self-help/sales etc. built up some patterns too—and our basic way of seeing each other and ourselves is now built partly in and from all these structures, for better and for worse.]

So let’s ask: What sort of reef is CFAR living within, and adding to? From what puzzles (what patterns of tinkering) has our “rationality” accumulated?

Two kinds of puzzles: “reality-revealing” and “reality-masking”

First, some background. Some puzzles invite a kind of tinkering that lets the world in and leaves you smarter. A kid whittling with a pocket knife is entangling her mind with bits of reality. So is a driver who notices something small about how pedestrians dart into streets, and adjusts accordingly. So also is the mathematician at her daily work. And so on.

Other puzzles (or other contexts) invite a kind of tinkering that has the opposite effect. They invite a tinkering that gradually figures out how to mask parts of the world from your vision. For example, some months into my work as a math tutor I realized I’d been unconsciously learning how to cue my students into acting like my words made sense (even when they didn’t). I’d learned to mask from my own senses the clues about what my students were and were not learning.

We’ll be referring to these puzzle-types a lot, so it’ll help to have a term for them. I’ll call these puzzles “good” or “reality-revealing” puzzles, and “bad” or “reality-masking” puzzles, respectively. Both puzzle-types appear abundantly in most folks’ lives, often mixed together. The same kid with the pocket knife who is busy entangling her mind with data about bark and woodchips and fine motor patterns (from the “good” puzzle of “how can I whittle this stick”) may simultaneously be busy tinkering with the “bad” puzzle of “how can I not-notice when my creations fall short of my hopes.”

(Even “good” puzzles can cause skill loss: a person who studies Dvorak may lose some of their QWERTY skill, and someone who adapts to the unselfconscious arguing of the math department may do worse for a while in contexts requiring tact. The distinction is that “good” puzzles do this only incidentally. Good puzzles do not invite a search for configurations that mask bits of reality. Whereas with me and my math tutees, say, there was a direct reward/conditioning response that happened specifically when the “they didn’t get it” signal was masked from my view. There was a small optimizer inside of me that was learning how to mask parts of the world from me, via feedback from the systems of mine it was learning to befuddle.)

Also, certain good puzzles (and certain bad ones!) allow unusually powerful accumulations across time. I’d list math, computer science, and the English language as examples of unusually powerful artifacts for improving vision. I’d list “sales and marketing skill” as an example of an unusually powerful artifact for impairing vision (the salesperson’s own vision, not just the customer’s).

The puzzles that helped build CFAR

Much of what I love about CFAR is linked to the puzzles we dwell near (the reality-revealing ones, I mean). And much of what gives me the shudders about CFAR comes from a reality-masking puzzle-set that’s been interlinked with these.

Eliezer created the Sequences after staring a lot at the AI alignment problem. He asked how a computer system could form a “map” that matches the territory; he asked how he himself could do the same. He asked, “Why do I believe what I believe?” and checked whether the mechanistic causal history that gave rise to his beliefs would have yielded different beliefs in a world where different things were true.

There’s a springing up into self-awareness that can come from this! A taking hold of our power as humans to see. A child’s visceral sense that of course we care and should care—freed from its learned hopelessness. And taking on the stars themselves with daring!

CFAR took these origins and worked to make at least parts of them accessible to some who bounced off the Sequences, or who wouldn’t have read the Sequences. We created feedback loops for practicing some of the core Sequences-bits in the context of folks’ ordinary lives rather than in the context of philosophy puzzles. If you take a person (even a rather good scientist) and introduce them to the questions about AI and the long-term future… often nothing much happens in their head except some random stuck nonsense intuitions (“AIs wouldn’t do that, because they’re our offspring. What’s for lunch?”). So we built a way to practice some of the core moves that alignment thinking needed. Especially, we built a way to practice having thoughts at all, in cases where standard just-do-what-the-neighbors-do strategies would tend to block them off.

For example:

  • Inner Simulator. (Your “beliefs” are what you expect to see happen—not what you “endorse” on a verbal level. You can practice tracking these anticipations in daily life! And making plans with them! And once you’ve seen that they’re useful for planning—well, you might try also having them in contexts like AI risk. Turns out you have beliefs even where you don’t have official “expertise” or credentials authorizing belief-creation! And you can dialog with them, and there’s sense there.)

  • Crux-Mapping; Double Crux. (Extends your ability to dialog with inner simulator-style beliefs. Lets you find in yourself a random opaque intuition about AI being [likely/unlikely/safe/whatever], and then query it via thought experiments until it is more made out of introspectable verbal reasoning. Lets two people with different intuitions collide them in verbal conversation.)

  • Goal Factoring and Units of Exchange. (Life isn’t multiple choice; you can name the good things and the bad things, and you can invest in seeking the alternatives with more of the good and less of the bad. For example, if you could save 4 months in a world where you were allowed to complete your PhD early, it may be worth more than several hours to scheme out how to somehow purchase permission from your advisor, since 4 months is worth rather more than several hours.)

  • Hamming Questions. (Some questions are worth a lot more than others. You want to focus at least some of your attention on the most important questions affecting your life, rather than just the random details in front of you. And you can just decide to do that on purpose, by using pen and paper and a timer!)[1]
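For readers who want the Units of Exchange arithmetic from the Goal Factoring bullet made concrete, here is a minimal sketch in Python. The 160-hours-per-month figure and the 10-hour scheming cost are my own illustrative assumptions, not numbers from CFAR’s class; the point is only that converting both sides into a common unit makes the comparison mechanical.

```python
# Illustrative sketch of "Units of Exchange": convert both the benefit and
# the cost of a scheme into one common unit (working hours), then compare.
# All numbers below are assumptions for illustration.

HOURS_PER_WORKING_MONTH = 160  # assumed: ~40 hours/week * 4 weeks

def hours_saved(months_saved: float) -> float:
    """Convert months saved into the common unit (working hours)."""
    return months_saved * HOURS_PER_WORKING_MONTH

# Finishing the PhD 4 months early vs. "several hours" spent scheming:
benefit = hours_saved(4)   # 640 working hours
cost = 10                  # a generous reading of "several hours"

# The scheme is worth attempting whenever the benefit, expressed in the
# common unit, exceeds the cost.
worth_it = benefit > cost  # True under these assumptions
```

Under these (made-up) numbers the scheme wins by a factor of 64, which is why the essay treats the conclusion as obvious once the units are aligned.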

Much good resulted from this—many loved the Sequences; many loved CFAR’s intro workshops; and a fair number who started there went into careers in AI alignment work and credited CFAR workshops as partially causal.

And still, as we did this, problems arose. AI risk is disorienting! Helping AI risk hit more people meant “helping” more people encounter something disorienting. And so we set to work on that as well. The thing I would say now about the reality-revealing puzzles that helped grow CFAR is that there were three, each closely linked with the others:

  1. Will AI at some point radically transform our lightcone? (How / why / with what details and intervention options?)

  2. How do we get our minds to make contact with problem (1)? And how do we think groundedly about such things, rather than having accidental nonsense-intuitions and sticking there?

  3. How do we stay human, and stay reliably in contact with what’s worth caring about (valuing honesty and compassion and hard work; having reliable friendships; being good people and good thinkers and doers), while still taking in how disorientingly different the future might be? (And while neither pretending that we have no shot at changing the future, nor that “what actions should I take to impact the future?” is a multiple choice question with nothing further to do, nor that any particular silly plan is more likely to work than it is?)

CFAR grew up around all three of these puzzles—but (2) played an especially large role over most of our history, and (3) has played an especially large role over the last year and (I think) will over the coming one.

I’d like to talk now about (3), and about the disorientation patterns that make (3) needed.

Disorientation patterns

To start with an analogous event: The process of losing a deeply held childhood religion can be quite disruptive to a person’s common sense and values. Let us take as examples the two commonsensical statements:

  • (A) It is worth getting out of bed in the morning; and,

  • (B) It is okay to care about my friends.

These two commonsensical statements are held by most religious people. They are actually also held by most atheists. Nevertheless, when a person loses their religion, they fairly often become temporarily unsure about whether these two statements (and various similar such statements) are true. That’s because somehow the person’s understanding of why statements (A) and (B) are true was often tangled up in (for example) Jehovah. And figuring out how to think about these things in the absence of their childhood religion (even in cases like this one where the statements should survive!) can require actual work. (This is particularly true because some things really are different given that Jehovah is false—and it can take work to determine which is which.)

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

  • People’s belief that they should have rest, free time, some money/time/energy to spend on objects of their choosing, abundant sleep, etc.

    • “It used to be okay to buy myself hot cocoa from time to time, because there used to be nothing important I could do with money. But now—should I never buy hot cocoa? Should I agonize freshly each time? If I do buy a hot cocoa does that mean I don’t care?”

  • People’s in-practice ability to “hang out”—to enjoy their friends, or the beach, in a “just being in the moment” kind of way.

    • “Here I am at the beach like my to-do list told me to be, since I’m a good EA who is planning not to burn out. I’ve got my friends, beer, guitar, waves: check. But how is it that I used to be able to enter ‘hanging out mode’? And why do my friends keep making meaningless mouth-noises that have nothing to do with what’s eventually going to happen to everyone?”

  • People’s understanding of whether commonsense morality holds, and of whether they can expect other folks in this space to also believe that commonsense morality holds.

    • “Given the vast cosmic stakes, surely doing the thing that is expedient is more important than, say, honesty?”

  • People’s in-practice tendency to have serious hobbies and to take a deep interest in how the world works.

    • “I used to enjoy learning mathematics just for the sake of it, and trying to understand history for fun. But it’s actually jillions of times higher value to work on [decision theory, or ML, or whatever else is pre-labeled as ‘AI risk relevant’].”

  • People’s ability to link in with ordinary institutions and take them seriously (e.g. to continue learning from their day job and caring about their colleagues’ progress and problems; to continue enjoying the dance club they used to dance at; to continue to take an interest in their significant other’s life and work; to continue learning from their PhD program; etc.)

    • “Here I am at my day job, meaninglessly doing nothing to help no one, while the world is at stake—how is it that before learning about the Singularity, I used to be learning skills and finding meaning and enjoying myself in this role?”

  • People’s understanding of what’s worth caring about, or what’s worth fighting for.

    • “So… ‘happiness’ is valuable, which means that I should hope we get an AI that tiles the universe with a single repeating mouse orgasm, right? … I wonder why imagining a ‘valuable’ future doesn’t feel that good/motivating to me.”

  • People’s understanding of when to use their own judgment and when to defer to others.

    • “AI risk is really really important… which probably means I should pick some random person at MIRI or CEA or somewhere and assume they know more than I do about my own career and future, right?”

My take is that many of these disorientation-bits are analogous to the new atheist’s disorientation discussed earlier. “Getting out of bed in the morning” and “caring about one’s friends” turn out to be useful for more reasons than Jehovah—but their derivation in the mind of that person was entangled with Jehovah. Honesty is analogously valuable for more reasons than its value as a local consumption good; and many of these reasons apply extra if the stakes are high. But the derivation of honesty that many folks were raised with does not survive the change in imagined surroundings—and so it needs to be thought through freshly.

Another part of the disorientation perhaps stems from emotional reeling in contact with the possibility of death (both one’s own death, and the death of the larger culture/tribe/species/values/life one has been part of).

And yet another part seems to me to stem from a set of “bad” puzzles that were inadvertently joined with the “good” puzzles involved in thinking through Singularity scenarios—“bad” puzzles that disable the mental immune systems that normally prevent updating in huge ways from weird and out-there claims. I’ll postpone this third part for a section and then return to it.

There is value in helping people with this disorientation; and much of this helping work is tractable

It seems not-surprising that people are disrupted in cases where they seriously, viscerally wonder “Hey, is everything I know and everything humanity has ever been doing to maybe-end, and also to maybe become any number of unimaginably awesome things? Also, am I personally in a position of possibly incredibly high leverage and yet also very high ambiguity with respect to all that?”

Perhaps it is more surprising that people in fact sometimes let this into their system 1’s at all. Many do, though, including many (but certainly not all!) of those I would consider highly effective. At least, I’ve had many, many conversations with people who seem viscerally affected by all this. Also, many people who tell me AI risk is “only abstract to [them]” still burst into tears or otherwise exhibit unambiguous strong emotion when asked certain questions—so I think people are sometimes more affected than they think.

An additional point is that many folks over the years have told me that they were choosing not to think much about Singularity scenarios lest such thinking destabilize them in various ways. I suspect that many who are in principle capable of doing useful technical work on AI alignment presently avoid the topic for such reasons. Also, many such folks have told me over the years that they found pieces at CFAR that allowed them to feel more confident in attempting such thinking, and that finding these pieces then caused them to go forth and attempt such thinking. (Alas, I know of at least one person who later reported that they had been inaccurate in revising this risk assessment! Caution seems recommended.)

Finally: people sometimes suggest to me that researchers could dodge this whole set of difficulties by simply reasoning about Singularity scenarios abstractly, while avoiding ever letting such scenarios get into their viscera. While I expect such attempts are in fact useful to some, I believe this method insufficient for two reasons. First, as noted, it seems to me that these topics sometimes get under people’s skin more than they intend or realize. Second, it seems to me that visceral engagement with the AI alignment problem is often helpful for the best scientific research—if a person is to work with a given “puzzle,” it is easier to do so when they can concretely picture the puzzle, including in their system 1. This is why mathematicians often take pains to “understand why a given theorem is true” rather than only to follow its derivation abstractly. This is why Richard Feynman took pains to picture the physics he was working with in the “make your beliefs pay rent in anticipated experiences” sense, and took pains to ensure that his students could link phrases such as “materials with an index of refraction” with examples such as “water.” I would guess that with AI alignment research, as elsewhere, it is easier to do first-rate scientific work when you have visceral models of what the terms, claims, and puzzles mean and how it all fits together.

In terms of the tractability of assisting with disorientation in such cases: it seems to me that simply providing contexts for people to talk to folks who’ve “been there before” can be pretty helpful. I believe various other concepts we have are also helpful, such as: familiarity with what bucket errors often look like for AI risk newcomers; discussion of the unilateralist’s curse; explanations of why hobbies and world-modeling and honesty still matter when the stakes are high. (Certainly participants sometimes say that these are helpful.) The assistance is partial, but there’s a decent iteration loop for tinkering away at it. We’ll also be trying some LessWrong posts on some of this in the coming year.

A cluster of “reality-masking” puzzles that also shaped CFAR

To what extent has CFAR’s art been shaped by reality-masking puzzles—tinkering loops that inadvertently disable parts of our ability to see? And how can we tell, and how can we reduce such loops? And what role have reality-masking puzzles played in the disruption that sometimes happens to folks who get into AI risk (in and out of CFAR)?

My guess is actually that a fair bit of this sort of reality-masking has occurred. (My guess is that the amount is “strategically significant” but not “utterly overwhelming.”) To name one of the more important dynamics:

Disabling pieces of the epistemic immune system

Folks arrive with piles of heuristics that help them avoid nonsense beliefs and rash actions. Unfortunately, many of these heuristics—including many of the generally useful ones—can “get in the way.” They “get in the way” of thinking about AI risk. They also “get in the way” of folks at mainline workshops thinking about changing jobs/relationships/life patterns etc. unrelated to AI risk. And so disabling them can sometimes help people acquire accurate beliefs about important things, and have more felt freedom to change their lives in ways they want.

Thus, the naive process of tinkering toward “really helping this person think about AI risk” (or “really helping this person consider their life options and make choices”) can lead to folks disabling parts of their epistemic immune system. (And unfortunately also thereby disabling their future ability to detect certain classes of false claims!)

For example, the Sequences make some effort to disable several such heuristics; similarly, CFAR workshops sometimes have the effect of disabling:

  • Taste as a fixed guide to which people/organizations/ideas to take in or to spit out. (People come in believing that certain things just “are” yucky. Then, we teach them how to “dialog” with their tastes… and they become more apt to sometimes-ignore previous “yuck” reactions.)

  • Antibodies that protect people from updating toward optimizing for a specific goal, rather than for a portfolio of goals. For example, entering participants will say things like “I know it’s not rational, but I also like to [activity straw vulcans undervalue].” And even though CFAR workshops explicitly warn against straw vulcanism, they also explicitly encourage people to work toward having goals that are more internally consistent, which sometimes has the effect of disabling the antibody which prevents people from suddenly re-conceptualizing most of their goal set as all being instrumental to/in service of some particular purportedly-paramount goal.

  • Folks’ tendency to take actions based on social roles (e.g., CFAR’s Goal-Factoring class used to explicitly teach people not to say “I’m studying for my exam because I’m a college student” or “I have to do it because it’s my job,” and to instead say “I’m studying for my exam in order to [cause outcome X]”).

Again, these particular shifts are not all bad; many of them have advantages. But I think their costs are easy to underestimate, and I’m interested in seeing whether we can get a “rationality” that causes less disablement of ordinary human patterns of functioning, while still helping people reason well in contexts where there aren’t good preexisting epistemic guardrails. CFAR seems likely to spend a good bit of time modeling these problems over the coming year, and trying to develop candidate solutions—we’re already playing with a bunch of new curriculum designed primarily for this purpose—and we’d love to get LessWrong’s thoughts before playing further!


Thanks to Adam Scholl for helping a lot with the writing. Remaining flaws are of course my own.

Edited to add:

I think I did not spell out well enough what I mean by “reality-masking puzzles.” I try again in a comment.

I think that getting this ontology right is a core and difficult task, and one I haven’t finished solving yet—it is the task of finding analogs of the “reasoning vs rationalization” distinction that are suitable for understanding group dynamics. I would love help with this task—that is maybe the main reason I wrote this post.

I think this task is closely related to what Zvi and the book “Moral Mazes” are trying for.

  1. If you don’t know some of these terms but want to, you can find them in CFAR’s handbook. ↩︎