Reality-Revealing and Reality-Masking Puzzles

Tl;dr: I’ll try here to show how CFAR’s “art of rationality” has evolved over time, and what has driven that evolution.

In the course of this, I’ll introduce the distinction between what I’ll call “reality-revealing puzzles” and “reality-masking puzzles”—a distinction that I think is almost necessary for anyone attempting to develop a psychological art in ways that will help rather than harm. (And one I wish I’d had explicitly back when the Center for Applied Rationality was founded.)

I’ll also be trying to elaborate, here, on the notion we at CFAR have recently been tossing around about CFAR being an attempt to bridge between common sense and Singularity scenarios—an attempt to figure out how people can stay grounded in common sense and ordinary decency and humane values and so on, while also taking in (and planning actions within) the kind of universe we may actually be living in.

--

Arts grow from puzzles. I like to look at mathematics, or music, or ungodly things like marketing, and ask: What puzzles were its creators tinkering with that led them to leave behind these structures? (Structures now being used by other people, for other reasons.)

I picture arts like coral reefs. Coral polyps build shell-bits for their own reasons, but over time there accumulates a reef usable by others. Math built up like this—and math is now a powerful structure for building from. [Freud and modern marketing/self-help/sales etc. built up some patterns too—and our basic way of seeing each other and ourselves is now built partly in and from all these structures, for better and for worse.]

So let’s ask: What sort of reef is CFAR living within, and adding to? From what puzzles (what patterns of tinkering) has our “rationality” accumulated?

Two kinds of puzzles: “reality-revealing” and “reality-masking”

First, some background. Some puzzles invite a kind of tinkering that lets the world in and leaves you smarter. A kid whittling with a pocket knife is entangling her mind with bits of reality. So is a driver who notices something small about how pedestrians dart into streets, and adjusts accordingly. So also is the mathematician at her daily work. And so on.

Other puzzles (or other contexts) invite a kind of tinkering that has the opposite effect. They invite a tinkering that gradually figures out how to mask parts of the world from your vision. For example, some months into my work as a math tutor I realized I’d been unconsciously learning how to cue my students into acting like my words made sense (even when they didn’t). I’d learned to mask from my own senses the clues about what my students were and were not learning.

We’ll be referring to these puzzle-types a lot, so it’ll help to have a term for them. I’ll call these puzzles “good” or “reality-revealing” puzzles, and “bad” or “reality-masking” puzzles, respectively. Both puzzle-types appear abundantly in most folks’ lives, often mixed together. The same kid with the pocket knife who is busy entangling her mind with data about bark and woodchips and fine motor patterns (from the “good” puzzle of “how can I whittle this stick”), may simultaneously be busy tinkering with the “bad” puzzle of “how can I not-notice when my creations fall short of my hopes.”

(Even “good” puzzles can cause skill loss: a person who studies Dvorak may lose some of their QWERTY skill, and someone who adapts to the unselfconscious arguing of the math department may do worse for a while in contexts requiring tact. The distinction is that “good” puzzles do this only incidentally. Good puzzles do not invite a search for configurations that mask bits of reality. Whereas with me and my math tutees, say, there was a direct reward/​conditioning response that happened specifically when the “they didn’t get it” signal was masked from my view. There was a small optimizer inside of me that was learning how to mask parts of the world from me, via feedback from the systems of mine it was learning to befuddle.)

Also, certain good puzzles (and certain bad ones!) allow unusually powerful accumulations across time. I’d list math, computer science, and the English language as examples of unusually powerful artifacts for improving vision. I’d list “sales and marketing skill” as an example of an unusually powerful artifact for impairing vision (the salesperson’s own vision, not just the customer’s).

The puzzles that helped build CFAR

Much of what I love about CFAR is linked to the puzzles we dwell near (the reality-revealing ones, I mean). And much of what gives me the shudders about CFAR comes from a reality-masking puzzle-set that’s been interlinked with these.

Eliezer created the Sequences after staring a lot at the AI alignment problem. He asked how a computer system could form a “map” that matches the territory; he asked how he himself could do the same. He asked, “Why do I believe what I believe?” and checked whether the mechanistic causal history that gave rise to his beliefs would have yielded different beliefs in a world where different things were true.

There’s a springing up into self-awareness that can come from this! A taking hold of our power as humans to see. A child’s visceral sense that of course we care and should care—freed from its learned hopelessness. And taking on the stars themselves with daring!

CFAR took these origins and worked to make at least parts of them accessible to some who bounced off the Sequences, or who wouldn’t have read the Sequences. We created feedback loops for practicing some of the core Sequences-bits in the context of folks’ ordinary lives rather than in the context of philosophy puzzles. If you take a person (even a rather good scientist) and introduce them to the questions about AI and the long-term future… often nothing much happens in their head except some random stuck nonsense intuitions (“AIs wouldn’t do that, because they’re our offspring. What’s for lunch?”). So we built a way to practice some of the core moves that alignment thinking needed. Especially, we built a way to practice having thoughts at all, in cases where standard just-do-what-the-neighbors-do strategies would tend to block them off.

For example:

  • Inner Simulator. (Your “beliefs” are what you expect to see happen—not what you “endorse” on a verbal level. You can practice tracking these anticipations in daily life! And making plans with them! And once you’ve seen that they’re useful for planning—well, you might try also having them in contexts like AI risk. Turns out you have beliefs even where you don’t have official “expertise” or credentials authorizing belief-creation! And you can dialog with them, and there’s sense there.)

  • Crux-Mapping; Double Crux. (Extends your ability to dialog with inner simulator-style beliefs. Lets you find in yourself a random opaque intuition about AI being [likely/​unlikely/​safe/​whatever], and then query it via thought experiments until it is more made out of introspectable verbal reasoning. Lets two people with different intuitions collide them in verbal conversation.)

  • Goal Factoring and Units of Exchange. (Life isn’t multiple choice; you can name the good things and the bad things, and you can invest in seeking alternatives with more of the good and less of the bad. For example, if you could save 4 months by being allowed to complete your PhD early, it may well be worth spending several hours scheming out how to somehow purchase that permission from your advisor, since 4 months is worth rather more than several hours. A toy numerical sketch of this comparison appears just after this list.)

  • Hamming Questions. (Some questions are worth a lot more than others. You want to focus at least some of your attention on the most important questions affecting your life, rather than just the random details in front of you. And you can just decide to do that on purpose, by using pen and paper and a timer!)[1]
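
To make the Units of Exchange comparison above concrete, here is a minimal sketch in Python. All of the numbers (productive hours per month, hours spent scheming, the probability the scheme works) are made-up assumptions for illustration, not figures from CFAR’s classes; the point is only the shape of the reasoning: convert both sides into a common currency and compare.

```python
# Toy "units of exchange" comparison for the PhD example above.
# All numbers are illustrative assumptions, not CFAR's.

months_saved = 4                 # months gained if the advisor agrees
work_hours_per_month = 120       # assumed ~30 productive hours per week
hours_spent_scheming = 5         # assumed cost of negotiating with the advisor
p_scheme_succeeds = 0.10         # assume even a modest chance of success

expected_hours_saved = months_saved * work_hours_per_month * p_scheme_succeeds

print(f"Expected hours saved: {expected_hours_saved:.0f}")  # 48
print(f"Hours spent scheming: {hours_spent_scheming}")      # 5
print("Worth trying." if expected_hours_saved > hours_spent_scheming else "Skip it.")
```

Even at a modest assumed success rate, the expected gain dwarfs the cost; noticing trades like this, rather than treating the situation as multiple choice, is the move the class is pointing at.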

Much good resulted from this—many loved the Sequences; many loved CFAR’s intro workshops; and a fair number who started there went into careers in AI alignment work and credited CFAR workshops as partially causal.

And still, as we did this, problems arose. AI risk is disorienting! Helping AI risk hit more people meant “helping” more people encounter something disorienting. And so we set to work on that as well. The thing I would say now about the reality-revealing puzzles that helped grow CFAR is that there were three of them, closely linked with one another:

  1. Will AI at some point radically transform our lightcone? (How /​ why /​ with what details and intervention options?)

  2. How do we get our minds to make contact with problem (1)? And how do we think groundedly about such things, rather than having accidental nonsense-intuitions and sticking there?

  3. How do we stay human, and stay reliably in contact with what’s worth caring about (valuing honesty and compassion and hard work; having reliable friendships; being good people and good thinkers and doers), while still taking in how disorientingly different the future might be? (And while neither pretending that we have no shot at changing the future, nor that “what actions should I take to impact the future?” is a multiple choice question with nothing further to do, nor that any particular silly plan is more likely to work than it is?)

CFAR grew up around all three of these puzzles—but (2) played an especially large role over most of our history, and (3) has played an especially large role over the last year and (I think) will over the coming one.

I’d like to talk now about (3), and about the disorientation patterns that make (3) needed.

Disorientation patterns

To start with an analogous event: The process of losing a deeply held childhood religion can be quite disruptive to a person’s common sense and values. Let us take as examples the two commonsensical statements:

  • (A) It is worth getting out of bed in the morning; and,

  • (B) It is okay to care about my friends.

These two commonsensical statements are held by most religious people. They are actually also held by most atheists. Nevertheless, when a person loses their religion, they fairly often become temporarily unsure about whether these two statements (and various similar such statements) are true. That’s because somehow the person’s understanding of why statements (A) and (B) are true was often tangled up in (for example) Jehovah. And figuring out how to think about these things in the absence of their childhood religion (even in cases like this one where the statements should survive!) can require actual work. (This is particularly true because some things really are different given that Jehovah is false—and it can take work to determine which is which.)

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

  • People’s belief that they should have rest, free time, some money/​time/​energy to spend on objects of their choosing, abundant sleep, etc.

    • “It used to be okay to buy myself hot cocoa from time to time, because there used to be nothing important I could do with money. But now—should I never buy hot cocoa? Should I agonize freshly each time? If I do buy a hot cocoa does that mean I don’t care?”

  • People’s in-practice ability to “hang out”—to enjoy their friends, or the beach, in a “just being in the moment” kind of way.

    • “Here I am at the beach like my to-do list told me to be, since I’m a good EA who is planning not to burn out. I’ve got my friends, beer, guitar, waves: check. But how is it that I used to be able to enter “hanging out mode”? And why do my friends keep making meaningless mouth-noises that have nothing to do with what’s eventually going to happen to everyone?”

  • People’s understanding of whether commonsense morality holds, and of whether they can expect other folks in this space to also believe that commonsense morality holds.

    • “Given the vast cosmic stakes, surely doing the thing that is expedient is more important than, say, honesty?”

  • People’s in-practice tendency to have serious hobbies and to take a deep interest in how the world works.

    • “I used to enjoy learning mathematics just for the sake of it, and trying to understand history for fun. But it’s actually jillions of times higher value to work on [decision theory, or ML, or whatever else is pre-labeled as ‘AI risk relevant’].”

  • People’s ability to link in with ordinary institutions and take them seriously (e.g. to continue learning from their day job and caring about their colleagues’ progress and problems; to continue enjoying the dance club they used to dance at; to continue to take an interest in their significant other’s life and work; to continue learning from their PhD program; etc.)

    • “Here I am at my day job, meaninglessly doing nothing to help no one, while the world is at stake—how is it that before learning about the Singularity, I used to be learning skills and finding meaning and enjoying myself in this role?”

  • People’s understanding of what’s worth caring about, or what’s worth fighting for

    • “So… ‘happiness’ is valuable, which means that I should hope we get an AI that tiles the universe with a single repeating mouse orgasm, right? … I wonder why imagining a ‘valuable’ future doesn’t feel that good/​motivating to me.”

  • People’s understanding of when to use their own judgment and when to defer to others.

    • “AI risk is really really important… which probably means I should pick some random person at MIRI or CEA or somewhere and assume they know more than I do about my own career and future, right?”

My take is that many of these disorientation-bits are analogous to the new atheist’s disorientation discussed earlier. “Getting out of bed in the morning” and “caring about one’s friends” turn out to be useful for more reasons than Jehovah—but their derivation in the mind of that person was entangled with Jehovah. Honesty is analogously valuable for more reasons than its value as a local consumption good; and many of these reasons apply extra if the stakes are high. But the derivation of honesty that many folks were raised with does not survive the change in imagined surroundings—and so it needs to be thought through freshly.

Another part of the disorientation perhaps stems from emotional reeling in contact with the possibility of death (both one’s own death, and the death of the larger culture/​tribe/​species/​values/​life one has been part of).

And yet another part seems to me to stem from a set of “bad” puzzles that were inadvertently joined with the “good” puzzles involved in thinking through Singularity scenarios—“bad” puzzles that disable the mental immune systems that normally prevent updating in huge ways from weird and out-there claims. I’ll postpone this third part for a section and then return to it.

There is value in helping people with this disorientation; and much of this helping work is tractable

It seems not-surprising that people are disrupted in cases where they seriously, viscerally wonder “Hey, is everything I know and everything humanity has ever been doing to maybe-end, and also to maybe become any number of unimaginably awesome things? Also, am I personally in a position of possibly incredibly high leverage and yet also very high ambiguity with respect to all that?”

Perhaps it is more surprising that people in fact sometimes let this into their system 1’s at all. Many do, though; including many (but certainly not all!) of those I would consider highly effective. At least, I’ve had many many conversations with people who seem viscerally affected by all this. Also, many people who tell me AI risk is “only abstract to [them]” still burst into tears or otherwise exhibit unambiguous strong emotion when asked certain questions—so I think people are sometimes more affected than they think.

An additional point is that many folks over the years have told me that they were choosing not to think much about Singularity scenarios lest such thinking destabilize them in various ways. I suspect that many who are in principle capable of doing useful technical work on AI alignment presently avoid the topic for such reasons. Also, many such folks have told me over the years that they found pieces at CFAR that allowed them to feel more confident in attempting such thinking, and that finding these pieces then caused them to go forth and attempt such thinking. (Alas, I know of at least one person who later reported that they had been inaccurate in revising this risk assessment! Caution seems recommended.)

Finally: people sometimes suggest to me that researchers could dodge this whole set of difficulties by simply reasoning about Singularity scenarios abstractly, while avoiding ever letting such scenarios get into their viscera. While I expect such attempts are in fact useful to some, I believe this method insufficient for two reasons. First, as noted, it seems to me that these topics sometimes get under people’s skin more than they intend or realize. Second, it seems to me that visceral engagement with the AI alignment problem is often helpful for the best scientific research—if a person is to work with a given “puzzle” it is easier to do so when they can concretely picture the puzzle, including in their system 1. This is why mathematicians often take pains to “understand why a given theorem is true” rather than only to follow its derivation abstractly. This is why Richard Feynman took pains to picture the physics he was working with in the “make your beliefs pay rent in anticipated experiences” sense and took pains to ensure that his students could link phrases such as “materials with an index of refraction” with examples such as “water.” I would guess that with AI alignment research, as elsewhere, it is easier to do first-rate scientific work when you have visceral models of what the terms, claims, and puzzles mean and how it all fits together.

In terms of the tractability of assisting with disorientation in such cases: it seems to me that simply providing contexts for people to talk to folks who’ve “been there before” can be pretty helpful. I believe various other concepts we have are also helpful, such as: familiarity with what bucket errors often look like for AI risk newcomers; discussion of the unilateralist’s curse; explanations of why hobbies and world-modeling and honesty still matter when the stakes are high. (Certainly participants sometimes say that these are helpful.) The assistance is partial, but there’s a decent iteration loop for tinkering away at it. We’ll also be trying some LessWrong posts on some of this in the coming year.

A cluster of “reality-masking” puzzles that also shaped CFAR

To what extent has CFAR’s art been shaped by reality-masking puzzles—tinkering loops that inadvertently disable parts of our ability to see? And how can we tell, and how can we reduce such loops? And what role have reality-masking puzzles played in the disruption that sometimes happens to folks who get into AI risk (in and out of CFAR)?

My guess is actually that a fair bit of this sort of reality-masking has occurred. (My guess is that the amount is “strategically significant” but not “utterly overwhelming.”) To name one of the more important dynamics:

Disabling pieces of the epistemic immune system

Folks arrive with piles of heuristics that help them avoid nonsense beliefs and rash actions. Unfortunately, many of these heuristics—including many of the generally useful ones—can “get in the way.” They “get in the way” of thinking about AI risk. They also “get in the way” of folks at mainline workshops thinking about changing jobs/​relationships/​life patterns etc. unrelated to AI risk. And so disabling them can sometimes help people acquire accurate beliefs about important things, and have more felt freedom to change their lives in ways they want.

Thus, the naive process of tinkering toward “really helping this person think about AI risk” (or “really helping this person consider their life options and make choices”) can lead to folks disabling parts of their epistemic immune system. (And unfortunately also thereby disabling their future ability to detect certain classes of false claims!)

For example, the Sequences make some effort to disable some of these heuristics (the absurdity heuristic, for instance, and the habit of deferring to apparent authority or to whatever one’s neighbors believe).

Similarly, CFAR workshops sometimes have the effect of disabling:

  • Taste as a fixed guide to which people/​organizations/​ideas to take in or to spit out. (People come in believing that certain things just “are” yucky. Then, we teach them how to “dialog” with their tastes… and they become more apt to sometimes-ignore previous “yuck” reactions.)

  • Antibodies that protect people from updating toward optimizing for a specific goal, rather than for a portfolio of goals. For example, entering participants will say things like “I know it’s not rational, but I also like to [activity straw vulcans undervalue].” And even though CFAR workshops explicitly warn against straw vulcanism, they also explicitly encourage people to work toward having goals that are more internally consistent. This sometimes disables the antibody that prevents people from suddenly re-conceptualizing most of their goal set as all being instrumental to / in service of some particular purportedly-paramount goal.

  • Folks’ tendency to take actions based on social roles (e.g., CFAR’s Goal-Factoring class used to explicitly teach people not to say “I’m studying for my exam because I’m a college student” or “I have to do it because it’s my job,” and to instead say “I’m studying for my exam in order to [cause outcome X]”).

Again, these particular shifts are not all bad; many of them have advantages. But I think their costs are easy to underestimate, and I’m interested in seeing whether we can get a “rationality” that causes less disablement of ordinary human patterns of functioning, while still helping people reason well in contexts where there aren’t good preexisting epistemic guardrails. CFAR seems likely to spend a good bit of time modeling these problems over the coming year, and trying to develop candidate solutions—we’re already playing with a bunch of new curriculum designed primarily for this purpose—and we’d love to get LessWrong’s thoughts before playing further!

Acknowledgement

Thanks to Adam Scholl for helping a lot with the writing. Remaining flaws are of course my own.

Edited to add:

I think I did not spell out well enough what I mean by “reality-masking puzzles.” I try again in a comment.

I think that getting this ontology right is a core and difficult task, and one I haven’t finished solving yet—it is the task of finding analogs of the “reasoning vs rationalization” distinction that are suitable for understanding group dynamics. I would love help with this task—that is maybe the main reason I wrote this post.

I think this task is closely related to what Zvi and the book “Moral Mazes” are trying for.


  1. If you don’t know some of these terms but want to, you can find them in CFAR’s handbook.