Cool. I hadn’t thought to frame those problems in predictor terms, and I agree now that “only matters in multi-agent dilemmas” is incorrect.
That said, it still seems to me like policy selection only matters in situations where, conceptually, winning requires something like multiple agents who run the same decision algorithm meeting and doing a bit of logically-prior coordination, and something kind of like this separates things like transparent Newcomb’s problem (where policy selection is not necessary) from the more coordination-shaped cases. The way the problems are classified in my head still involves me asking myself the question “well, do I need to get together and coordinate with all of the instances of me that appear in the problem logically-beforehand, or can we each individually wing it once we see our observations?”
If anyone has examples where this classification is broken, I remain curious to hear them. Or, similar question: is there any disagreement on the weakened claim, “policy selection only matters in situations that can be transformed into multi-agent problems, where a problem is said to be ‘multi-agent’ if the winning strategy requires the agents to coordinate logically-before making their observations”?
I think Eliezer’s goal was mainly to illustrate the kind of difficulty FAI is, rather than the size of the difficulty. But they aren’t totally unrelated; basic conceptual progress and coming up with new formal approaches often require a fair amount of serial time (especially where one insight is needed before you can even start working toward a second insight), and progress is often sporadic compared to more applied/well-understood technical goals.
It would usually be extremely tough to estimate how much work was left if you were actually in the “rocket alignment” hypothetical—e.g., to tell with confidence whether you were 4 years or 20 years away from solving “logical undiscreteness”. In the real world, similarly, I don’t think anyone knows how hard the AI alignment problem is. If we can change the character of the problem from “we’re confused about how to do this in principle” to “we fundamentally get how one could align an AGI in the real world, but we haven’t found code solutions for all the snags that come with implementation”, then it would be much less weird to me if you could predict how much work was still left.
Nate says: “You may have a scenario in mind that I overlooked (and I’d be interested to hear about it if so), but I’m not currently aware of a situation where the 1.1 patch is needed that doesn’t involve some sort of multi-agent coordination. I’ll note that a lot of the work that I (and various others) used to think was done by policy selection is in fact done by not-updating-on-your-observations instead. (E.g., FDT agents refuse blackmail because of the effects this has in the world where they weren’t blackmailed, despite how their observations say that that world is impossible.)”
Nate says: “The main datapoint that Rob left out: one reason we don’t call it UDT (or cite Wei Dai much) is that Wei Dai doesn’t endorse FDT’s focus on causal-graph-style counterpossible reasoning; IIRC he’s holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution. (FWIW I tried to make the formalization we chose in the paper general enough to technically include that possibility, though Wei and I disagree here and that’s definitely not where the paper put its emphasis. I don’t want to put words in Wei Dai’s mouth, but IIRC, this is also a reason Wei Dai declined to be listed as a co-author.)”
My model is that ‘FDT’ is used in the paper instead of ‘UDT’ because:
1. The name ‘UDT’ seemed less likely to catch on.
2. The term ‘UDT’ (and ‘modifier+UDT’) had come to refer to a bunch of very different things over the years. ‘UDT 1.1’ is a lot less ambiguous, since people are less likely to think that you’re talking about an umbrella category encompassing all the ‘modifier+UDT’ terms; but it’s a bit of a mouthful.
3. I’ve heard someone describe ‘UDT’ as “FDT + a theory of anthropics” -- i.e., it builds in the core idea of what we’re calling “FDT” (“choose by imagining that your (fixed) decision function takes on different logical outputs”), plus a view to the effect that decisions+probutilities are what matter, and subjective expectations don’t make sense. Having a name for the FDT part of the view seems useful for evaluating the subclaims separately.
4. The FDT paper introduces the FDT/UDT concept in more CDT-ish terms (for ease of exposition), so I think some people have also started using ‘FDT’ to mean something like ‘variants of UDT that are more CDT-ish’, which is confusing given that FDT was originally meant to refer to the superset/family of UDT-ish views. Maybe that suggests that researchers feel more of a need for new narrow terms to fill gaps, since it’s less often necessary in the trenches to crisply refer to the superset.
Your comment here makes it sound like the FDT paper said “the difference between UDT 1.1 and UDT 1.0 isn’t important, so we’ll just endorse UDT 1.0”, whereas what the paper actually says is:
In the authors’ preferred formalization of FDT, agents actually iterate over policies (mappings from observations to actions) rather than actions. This makes a difference in certain multi-agent dilemmas, but will not make a difference in this paper. [...]
As mentioned earlier, the authors’ preferred formulation of FDT actually intervenes on the node FDT(−) to choose not an action but a policy which maps inputs to actions, to which the agent then applies her inputs in order to select an action. The difference only matters in multi-agent dilemmas so far as we can tell, so we have set that distinction aside in this paper for ease of exposition.
I don’t know why it claims the difference only crops up in multi-agent dilemmas, if that’s wrong.
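To make the action-vs.-policy distinction concrete, here is a minimal sketch in Python. The toy payoff table and the copy/observation setup are my own invention for illustration, not an example from the paper:

```python
from itertools import product

# Toy problem (my own construction, not an example from the FDT paper):
# two copies of the same agent are created; one is shown "A", the other "B",
# and each must pick action 1 or 2. The payoffs reward an asymmetric pair of
# choices, so winning requires settling on a joint policy before either copy
# looks at its own observation.
OBSERVATIONS = ["A", "B"]
ACTIONS = [1, 2]

def joint_payoff(action_if_A, action_if_B):
    # Assumed payoff table for the toy problem.
    return {(1, 2): 10, (1, 1): 5, (2, 2): 5, (2, 1): 0}[(action_if_A, action_if_B)]

# Policy selection (the "1.1" move): iterate over whole mappings from
# observations to actions, score each mapping by what happens when every
# copy runs it, and commit to the best mapping.
best_policy = max(
    (dict(zip(OBSERVATIONS, acts)) for acts in product(ACTIONS, repeat=len(OBSERVATIONS))),
    key=lambda policy: joint_payoff(policy["A"], policy["B"]),
)
print(best_policy)  # {'A': 1, 'B': 2} -- the $10 outcome

# Action selection (the "1.0" move), by contrast, has each copy separately ask
# "what should I output, given my observation?", and the two separate
# optimizations aren't guaranteed to land on the same global mapping.
```

The point of the sketch is only that the winning move here is a property of the whole mapping from observations to actions, which is the coordination-shaped structure discussed above.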
The opening’s updated now to try to better hint at this, with: “Somewhere in a not-very-near neighboring world, where science took a very different course…”
Yeah, that article was originally an attempt to “essay-ify” an earlier draft of this very dialogue. But I don’t think the essay version succeeded at communicating the idea very well.
The dialogue is at least better, I think, if you have the relevant context (“MIRI is a math research group that works on AI safety and likes silly analogies”) and know what the dialogue is trying to do (“better pinpoint the way MIRI thinks of our current understanding of AGI alignment, and the way MIRI thinks of its research as relevant to improving our understanding, without trying to argue for those models”).
I hate to say this but I’m taking the side of the Spaceplane designers. Perhaps it’s because it’s what I know.
Three things I think it’s important to note explicitly here:
1. Eliezer’s essay above is just trying to state where he thinks humanity’s understanding of AI alignment is, and where he thinks it ultimately needs to be. The point of the fictional example is to make this view more concrete by explaining it in terms of concepts that we already understand well (rockets, calculus, etc.). None of this is an argument for Eliezer’s view “our understanding of AI alignment is relevantly analogous to the fictional rocket example”, just an attempt to be clearer about what the view even is.
2. “Don’t worry about developing calculus, questioning the geocentric model of the solar system, etc.” is the wrong decision in the fictional example Eliezer provided. You suggest, “once you start getting spaceplanes into orbit and notice that heading right for the moon isn’t making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch”. I don’t think this is a realistic model of how basic research works. Possibly this is a crux between our models?
3. The value of the rocket analogy is that it describes a concrete “way the world could be” with respect to AI. Once this is added to the set of hypotheses under consideration, the important thing is to try to assess the evidence for which possible world we’re in. “I choose to act as though this other hypothesis is true because it’s what I know” should set off alarm bells in that context, as should any impulse to take the side of Team Don’t-Try-To-Understand-Calculus in the contrived fictional example, because this suggests that your models and choices might be insensitive to whether you’re actually in the kind of world where you’re missing an important tool like calculus.
It’s 100% fine to disagree about whether we are in fact in that world, but any indication that we should unconditionally act as though we’re not in that world—e.g., for reasons other than Bayesian evidence about our environment, or for reasons so strong they’re insensitive even to things as important as “we’re trying to get to the Moon and we haven’t figured out calculus yet” -- should set off major alarms.
And making a spaceplane so powerful it wrecks the planet if it crashes into it, when you don’t know what you are doing...seems implausible to me.
Eliezer means the rocket analogy to illustrate his views on ‘how well do we understand AI alignment, and what kind of understanding is missing?‘, not ‘how big a deal is it if we mess up?’ AI systems aren’t rockets, so there’s no reason to extend the analogy further. (If we do want to compare flying machines and scientific-reasoning machines on this dimension, I’d call it relevant that flying organs have evolved many times in Nature, and never become globally dominant; whereas scientific-reasoning organs evolved just once, and took over the world very quickly.)
A relevant argument that’s nearby in conceptspace is ‘technologies are rarely that impactful, full stop; so we should have a strong prior that AGI won’t be that impactful either’.
I agree we can make an AI that powerful but I think we would need to know what we are doing. Nobody made fission bombs work by slamming radioactive rocks together, it took a set of millions of deliberate actions in a row, by an army of people, to get to the first nuclear weapon.
Eliezer doesn’t mean to argue that we’ll get to AGI by pure brute force, just more brute force than is needed for safety / robustness / precise targeting. “Build a system that’s really good at scientific reasoning, and only solves the kinds of problems we want it to” is a much more constrained problem than “Build a system that’s really good at scientific reasoning”, and it’s generally hard to achieve much robustness / predictability / deep understanding of very novel software, even when that software isn’t as complex or opaque as a deep net.
It sounds to me like key disagreements might include “how much better at science are the first AGI systems built for science likely to be, compared to humans (who weren’t evolved to do science at all, but stumbled into being capable of it)?” and “how many developers are likely to have the insights and other resources needed to design/train/deploy AGI in the first few years?” Your view makes more sense in my head when I imagine a world where AGI yields smaller capability gains, and where there aren’t a bunch of major players who can all deploy AGI within a few years of each other.
Uncontrolled argues along similar lines—that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.
I’ll note that (non-extreme) versions of this position are consistent with ideas like “it’s possible to build non-opaque AGI systems.” The full answer to “how do birds work?” is incredibly complex, hard to formalize, and dependent on surprisingly detailed local conditions that need to be discovered empirically. But you don’t need to understand much of that complexity at all to build flying machines with superavian speed or carrying capacity, or to come up with useful theory and metrics for evaluating “goodness of flying” for various practical purposes; and the resultant machines can be a lot simpler and more reliable than a bird, rather than being “different from birds but equally opaque in their own alien way”.
This isn’t meant to be a response to the entire “rationality non-realism” suite of ideas, or a strong argument that AGI developers can steer toward less opaque systems than AlphaZero; it’s just me noting a particular distinction that I particularly care about.
The relevant realism-v.-antirealism disagreement won’t be about “can machines serve particular functions more transparently than biological organs that happen to serve a similar function (alongside many other functions)?”. In terms of the airplane analogy, I expect disagreements like “how much can marginal effort today increase transparency once we learn how to build airplanes?”, “how much useful understanding are we currently missing about how airplanes work?”, and “how much of that understanding will we develop by default on the path toward building airplanes?”.
It may be worth emphasizing that “plausible ranges of moral weight” are likely to get a lot wider when we move from classical utilitarianism to other reasonably-plausible moral theories (even before we try to take moral uncertainty into account).
This is an interesting point I plausibly haven’t noticed / thought about enough!
I agree with this, and I agree with Luke that non-human animals could plausibly have much higher (or much lower) moral weight than humans, if they turned out to be moral patients at all.
People have been using CEV to refer to both “Personal CEV” and “Global CEV” for a long time—e.g., in the 2013 MIRI paper “Ideal Advisor Theories and Personal CEV.”
I don’t know of any cases of Eliezer using “CEV” in a way that’s clearly inclusive of “Personal” CEV; he generally seems to be building into the notion of “coherence” the idea of coherence between different people. On the other hand, it seems a bit arbitrary to say that something should count as CEV if two human beings are involved, but shouldn’t count as CEV if one human being is involved, given that human individuals aren’t perfectly rational, integrated, unitary agents. (And if two humans is too few, it’s hard to say how many humans should be required before it’s “really” CEV.)
Eliezer’s original CEV paper did on one occasion use “coherence” to refer to intra-agent conflicts:
When people know enough, are smart enough, experienced enough, wise enough, that their volitions are not so incoherent with their decisions, their direct vote could determine their volition. If you look closely at the reason why direct voting is a bad idea, it’s that people’s decisions are incoherent with their volitions.
See also Eliezer’s CEV Arbital article:
Helping people with incoherent preferences
What if somebody believes themselves to prefer onions to pineapple on their pizza, prefer pineapple to mushrooms, and prefer mushrooms to onions? In the sense that, offered any two slices from this set, they would pick according to the given ordering?
(This isn’t an unrealistic example. Numerous experiments in behavioral economics demonstrate exactly this sort of circular preference. For instance, you can arrange 3 items such that each pair of them brings a different salient quality into focus for comparison.)
One may worry that we couldn’t ‘coherently extrapolate the volition’ of somebody with these pizza preferences, since these local choices obviously aren’t consistent with any coherent utility function. But how could we help somebody with a pizza preference like this?
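As a small aside, the “aren’t consistent with any coherent utility function” point in the quoted passage is easy to verify mechanically. Here is a tiny sketch (my own illustration, not from the Arbital article) that checks every strict ranking of the three toppings against the circular choices:

```python
from itertools import permutations

# The assumed circular preference from the quoted example:
# onion beats pineapple, pineapple beats mushroom, mushroom beats onion.
pairwise_choices = [("onion", "pineapple"),
                    ("pineapple", "mushroom"),
                    ("mushroom", "onion")]  # (preferred, dispreferred)

toppings = ["onion", "pineapple", "mushroom"]

# Check every strict ranking of the toppings against the pairwise choices.
consistent = [order for order in permutations(toppings)
              if all(order.index(a) < order.index(b) for a, b in pairwise_choices)]

print(consistent)  # [] -- no ranking (hence no utility function) fits the cycle
```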
I think that absent more arguing about why this is a bad idea, I’ll probably go on using “CEV” to refer to several different things, mostly relying on context to make it clear which version of “CEV” I’m talking about, and using “Personal CEV” or “Global CEV” when it’s really essential to disambiguate.
“Evolution wasn’t trying to solve the robustness problem at all.”—Agreed that this makes the analogy weaker. And, to state the obvious, everyone doing safety work at MIRI and OpenAI agrees that there’s some way to do neglected-by-evolution engineering work that gets you safe+useful AGI, though they disagree about the kind and amount of work.
The docility analogy seems to be closely connected to important underlying disagreements.
Conversation also continues here.
I think I agree with this post? Certainly for a superintelligence that is vastly smarter than humans, I buy this argument (and in general am not optimistic about solving alignment). However, humans seem to be fairly good at keeping each other in check, without a deep understanding of what makes humans tick, even though humans often do optimize against each other. Perhaps we can maintain this situation inductively as our AI systems get more powerful, without requiring a deep understanding of what’s going on? Overall I’m pretty confused on this point.
I read Optimization Amplifies as Scott’s attempt to more explicitly articulate the core claim of Eliezer’s Security Mindset dialogues (1, 2). On this view, making software robust/secure to ordinary human optimization does demand the same kind of approach as making it robust/secure to superhuman optimization. The central disanalogy isn’t “robustness-to-humans requires X while robustness-to-superintelligence requires Y”, but rather “the costs of robustness/security failures tend to be much smaller in the human case than the superintelligence case”.
What Dagon said. Your advice makes sense if the main signal people receive is “this received one −5 vote, two −4 votes, one −1 vote, three +1 votes, and five +2 votes”, but not if people are just receiving a “net upvotes” summary number. By default, the aggregate effect of everyone trying to “vote according to what’s really in their heart” and disregard current vote totals is that either (a) lots of content gets absurdly, unwarrantedly high/low karma totals because people’s opinions are correlated, or (b) lots of content gets no upvotes or downvotes at all because people are trying to correct for the possibility that things will be over-voted (even though they can see with their own eyes whether a vote total is currently too high or too low).
Perhaps this is a reason to replace the “net upvotes” system with one that lists the number of votes (at different levels).
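For illustration only (using the made-up vote tally from the comment above, and not describing any existing LessWrong feature), the difference between the two displays is roughly:

```python
from collections import Counter

# The example votes from the comment above.
votes = [-5, -4, -4, -1, +1, +1, +1, +2, +2, +2, +2, +2]

net_total = sum(votes)        # the single "net upvotes" number readers see today
per_level = Counter(votes)    # the itemized, per-level display proposed here

print(net_total)              # -1
print(sorted(per_level.items()))
```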
If there’s nothing particularly bizarre or inconsistent-seeming about a situation, then I don’t think we should call that situation a “paradox”. E.g., “How did human language evolve?” is an interesting scientific question, but I wouldn’t label it “the language paradox” just because there’s lots of uncertainty spread over many different hypotheses.
I think it’s fine to say that the “Fermi paradox,” in the sense SDO mean, is a less interesting question than “why is the Fermi observation true in our world?”. Maybe some other term should be reserved for the latter problem, like “Great Filter problem”, “Fermi’s question” or “Great Silence problem”. (“Great Filter problem” seems like maybe the best candidate, except it might be too linked to the subquestion of how much the Filter lies in our past vs. our future.)
My instinct is often to upvote or downvote comments/posts based on how much karma I think they should display. E.g., maybe I think two comments by new users both deserve about 10 karma, but one is currently at 10 while the other is currently at 18. I might then strong-downvote the latter comment to bring it to 10, while ignoring the former comment. This is all well and good, except that under your system it would lead to two equally good comments conferring +9 karma on one new user and somewhere between −7 and −15 karma on another.
The ideal solution to this might be for me to try to retrain my voting habits rather than modify the system to accommodate them. This is harder if my voting habits are shared by others, though.
One option might be to weight downvotes more heavily the lower the post/comment’s karma was when the downvote occurred? I’m a lot more willing to downvote (and strong-downvote) something that currently has +70 karma than something that currently has +10 karma, because I’m likelier to think that the +70 is an overestimate and that lowering that total a bit is harmless. But that greater willingness means that my average downvote of a +70 post means a lot less than my average downvote of a +10 post.
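Here is a rough sketch of what such a weighting could look like. The discount curve, the baseline, and the floor are arbitrary choices of mine for illustration; nothing below describes how LessWrong karma actually works.

```python
def downvote_weight(current_karma: float, baseline: float = 10.0,
                    floor: float = 0.25) -> float:
    """Hypothetical rule: a downvote cast while the comment sits at or below
    `baseline` karma counts at full strength; downvotes cast at higher karma
    are discounted, since voters are more willing to shave points off a
    comment that already looks over-rewarded."""
    if current_karma <= baseline:
        return 1.0
    # Discount smoothly as the displayed karma climbs past the baseline,
    # never dropping below `floor`.
    return max(floor, baseline / current_karma)

# The same nominal -8 strong-downvote cast at +70 vs. at +10:
for karma in (70, 10):
    print(f"downvote cast at {karma:+d} karma counts as {-8 * downvote_weight(karma):.1f}")
```

Whether the discounted value should apply to the comment’s displayed total, to the author’s karma, or to both is a separate design question.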