Thanks for drawing distinctions—I mean #1 only.
Is there already a concept handle for the notion of a Problem Where The Intuitive Solution Actually Makes It Worse But Makes You Want To Use Even More Dakka On It?
My most salient example is the way that political progressives in the Bay Area tried using restrictive zoning and rent control in order to prevent displacement… but this made for a housing shortage and made the existing housing stock skyrocket in value… which led to displacement happening by other (often cruel and/or backhanded) methods… which led to progressives concluding that their rules weren’t restrictive enough.
Another example is that treating a chunk of the population with contempt makes a good number of people in that chunk become even more opposed to you, which makes you want to show even more contempt for them, etc. (Which is not to say their ideas are correct or even worthy of serious consideration—but the people are always worthy of respect.)
That sort of dynamic is how you can get an absolutely fucked-up self-reinforcing situation, an inadequate quasi-equilibrium that’s not even a Nash equilibrium, but exists because at least one party is completely wrong about its incentives.
(And before you get cynical, of course there are disingenuous people whose preferences are perfectly well served in that quasi-equilibrium. But most activists do care about the outcomes, and would change their actions if they were genuinely convinced the outcomes would be different.)
You can see my other reviews from this and past years, and check that I don’t generally say this sort of thing:
This was the best post I’ve written in years. I think it distilled an idea that’s perennially sorely needed in the EA community, and presented it well. I fully endorse it word-for-word today.
The only edit I’d consider making is to have the “Denial” reaction explicitly say “that pit over there doesn’t really exist”.
(Yeah, I know, not an especially informative review—just that the upvote to my past self is an exceptionally strong one.)
Re: your second paragraph, I was (and am) of the opinion that, given the first sentence, readers were in danger of being sucked down into their thoughts on the object-level topic before they would even reach the meta-level point. So I gave a hard disclaimer then and there.
Your mileage varied, of course, but I model more people as having been saved by the warning lights than blinded by them.
There are some posts with perennial value, and some which depend heavily on their surrounding context. This post is of the latter type. I think it was pretty worthwhile in its day (and in particular, the analogy between GPT upgrades and developmental stages is one I still find interesting), but I leave it to you whether the book should include time capsules like this.
It’s also worth noting that, in the recent discussions, Eliezer has pointed to the GPT architecture as an example of scaling up working better than expected, but he diverges from the thesis of this post on a practical level:
I suspect that you cannot get this out of large amounts of gradient descent on large layered transformers, and therefore I suspect that GPT-N does not approach superintelligence before the world is ended by systems that look differently, but I could be wrong about that.
I unpack this as the claim that someone will always be working on directly goal-oriented AI development, and that inner optimizers in an only-indirectly-goal-oriented architecture like GPT-N will take enough hardware that someone else will have already built an outer optimizer by the time it happens.
That sounds reasonable; it’s a consideration I’d missed at the time; and I’m sure that OpenAI-sized amounts of money will be poured into more goal-oriented natural language projects adapted to whatever paradigm is prominent at the time. But I still agree with Eliezer’s “but I could be wrong” here.
Fighting is different from trying. To fight harder for X is more externally verifiable than to try harder for X.
It’s one thing to acknowledge that the game appears to be unwinnable. It’s another thing to fight any less hard on that account.
One tiny note: I was among the people on AAMLS; I did leave MIRI the next year; and my reasons for so doing are not in any way an indictment of MIRI. (I was having some me-problems.)
I still endorse MIRI as, in some sense, being the adults in the AI Safety room, which has… disconcerting effects on my own level of optimism.
Ditto—the first half makes it clear that any strategy more than about two years slower than an unaligned approach will be useless, and that prosaic AI safety falls into that bucket.
Thanks for asking about the ITT.
I think that if I put a more measured version of myself back into that comment, it would have one key difference from your version.
“Pay attention to me and people like me” is a status claim rather than a useful model.
I’d have said “pay attention to a person who incurred social costs by loudly predicting one later-confirmed bad actor, when they incur social costs by loudly predicting another”.
(My denouncing of Geoff drove a wedge between me and several friends, including my then-best friend; my denouncing of the other one drove a wedge between me and my then-wife. Obviously those rifts had much to do with how I handled those relationships, but clearly it wasn’t idle talk from me.)
Otherwise, I think the content of your ITT is about right.
(The emotional tone is off, even after translating from Duncan-speak to me-speak, but that may not be worth going into.)
For the record, I personally count myself 2 for 2.5 on precision. (I got a bad vibe from a third person, but didn’t go around loudly making it known; and they’ve proven to be untrustworthy, though not nearly as dangerous as I consider the other two. I’ll accordingly not name them.)
Thanks, supposedlyfun, for pointing me to this thread.
I think it’s important to distinguish my behavior in writing the comment (which was emotive rather than optimized—it would even have strengthened my own case to point out that the 2012 workshop was a weeklong experiment with lots of unstructured time, rather than the weekend format CFAR later settled on, or to explain that his CoZE idea was to recruit teens to meddle with the other participants’ CoZE) from the behavior of people upvoting the comment.
I expect that many of the upvotes were not of the form “this is a good comment on the meta level” so much as “SOMEBODY ELSE SAW THE THING ALL ALONG, I WORRIED IT WAS JUST ME”.
Is this meant to be a linkpost? I don’t see any content except for the comment above.
The subconscious mind knows exactly what it’s flinching away from considering. :-)
Non-agenda’d question: about when did you notice changes in him?
That’s a secondary concern, in that it’s better to have one org with some people in different locations but everyone communicating heavily than to have two separate organizations.
Sure—and MIRI/FHI are a decent complement to each other, the latter providing a respectable academic face to weird ideas.
Generally, though, it’s far more productive to have ten top researchers in the same org than to have five orgs, each with two top researchers and a couple of others to round them out. Geography is a secondary concern to that.
Additionally, as a canary statement: I was also never asked to sign an NDA.
Thank you for writing this, Jessica. First, you’ve had some miserable experiences in the last several years, and regardless of everything else, those times sound terrifying and awful. You have my deep sympathy.
Although I see a large distinction between the Leverage situation and MIRI/CFAR, I agree with Jessica that this is a good time to revisit the safety of various orgs in the rationality/EA space.
I almost perfectly overlapped with Jessica at MIRI from March 2015 to June 2017. (Yes, this uniquely identifies me. Don’t use my actual name here anyway, please.) So I think I can speak to a great deal of this.
I’ll run down a summary of the specifics first (or at least, the specifics I know enough about to speak meaningfully), and then at the end discuss what I see overall.
Claim: People in and adjacent to MIRI/CFAR manifest major mental health problems, significantly more often than the background rate.
I think this is true; I believe I know two of the first cases to which Jessica refers; and I’m probably not plugged-in enough socially to know the others. And then there’s the Ziz catastrophe.
Claim: Eliezer and Nate updated sharply toward shorter timelines, other MIRI researchers became similarly convinced, and they repeatedly tried to persuade Jessica and others.
This is true, but in my honest opinion non-nefarious: the shorter timelines were a genuinely held belief, and given that belief, you’ll have better odds of success if the whole team at least takes the hypothesis quite seriously.
(As for me, I’ve stably been at a point where near-term AGI wouldn’t surprise me much, but the lack of it also wouldn’t surprise me much. That’s all it takes, really, to be worried about near-term AGI.)
Claim: MIRI started getting secretive about their research.
This is true, to some extent. Nate and Eliezer discussed with the team that some things might have to be kept secret, and applied a basic level of secrecy to things we thought at the time might be AGI-relevant rather than only FAI-relevant. I think the concern here was less about AGI timelines and more about the multipolar race between DeepMind and OpenAI; in our current world, basically any new advance gets deployed immediately.
However, I don’t recall ever being told I’m not allowed to know what someone else is working on, at least in broad strokes. Maybe my memory is faulty here, but it diverges from Jessica’s.
(I was sometimes coy about whether I knew anything secret or not, in true glomarization fashion; I hope this didn’t contribute to that feeling.)
There are surely things that Eliezer and Nate only wanted to discuss with each other, or with a specific researcher or two.
Claim: MIRI had rarity narratives around itself and around Eliezer in particular.
This is true. It would be weird if, given MIRI’s reason for being, it didn’t at least have the institutional rarity narrative—if one believed somebody else were just as capable of causing AI to be Friendly, clearly one should join their project instead of starting one’s own.
About Eliezer, there was a large but not infinite rarity narrative. We sometimes joked about the “bus factor”: if researcher X were hit by a bus, how much would the chance of success drop? Setting aside that this is a ridiculous and somewhat mean thing to joke about, the usual consensus was that Eliezer’s bus quotient was the highest one but that a couple of MIRI’s researchers put together exceeded it. (Nate’s was also quite high.)
(My expectation is that the same would not have been said about Geoff within Leverage.)
Claim: Working at MIRI/CFAR made it harder to connect with people outside the community.
There’s an extent to which this is true of any community that includes an idealistic job (e.g. a paid political activist probably has likeminded friends and finds it a bit more difficult to connect outside that circle). Is it true beyond that?
Not for me, at least. I maintained my ties with the other community I’d been plugged into (social dancing) and kept in good touch with my family (it helps that I have a really good family). As with the above example, the social path of least resistance would have been to just be friends with the same network of people in one’s work orbit, but there wasn’t anything beyond that level of gravity in effect for me.
Claim: CFAR got way too far into Shiny-Woo-Adjacent-Flavor-Of-The-Week.
This is an unfair framing… because I agree with Jessica’s claim 100%. Besides Kegan Levels and the MAPLE dalliance, there was the Circling phase and probably much else I wasn’t around for.
As for causes, I’ve been of the opinion that Anna Salamon has a lot of strengths around communicating ideas, but that her hiring has had as many hits as misses. There’s massive churn, people come in with their Big Ideas and nobody to stop them, and also people come in who aren’t in a good emotional place for their responsibilities. I think CFAR would be better off if Anna delegated hiring to someone else. [EDIT: Vaniver corrects me to say that Pete Michaud has been mostly in charge of hiring for the past several years, in which case I’m criticizing him rather than Anna for any bad hiring decisions during that time.]
Essentially, I think there’s one big difference between issues with MIRI/CFAR and issues at Leverage:
The actions of CFAR/MIRI harmed people unintentionally, as evidenced by the fact that people burned out and left quickly and frequently. The churn, especially at CFAR, hurt the mission, so it was definitely not the successful result of any strategic process.
Geoff Anders and others at Leverage harmed people intentionally, in ways that were intended to maintain control over those people. And to a large extent, that seems to have succeeded until Leverage fell apart.
Specifically, [accidentally triggering psychotic mental states by conveying a strange but honestly held worldview without adding adequate safeties] is different from [intentionally triggering psychotic mental states in order to pull people closer and prevent them from leaving], which is Zoe’s accusation. Even if it’s possible for a mental breakdown to be benign under the right circumstances, and even if an unplanned one is more likely to result in very very wrong circumstances, I’m far more terrified of a group that strategically plans for its members to have psychosis with the intent of molding those members further toward the group’s mission.
Unintentional harm is still harm, of course! It might have even been greater harm in total! But it makes a big difference when it comes to assessing how realistic a project of reform might be.
There are surely some deep reforms along these lines that CFAR/MIRI must consider. For one thing: scrupulosity, in the context of AI safety, seems to be a common thread in several of these breakdowns. I’ve taken this seriously enough in the past to post extensively on it here. I’d like CFAR/MIRI leadership to carefully update on how scrupulosity hurts both their people and their mission, and think about changes beyond surface-level things like adding a curriculum on scrupulosity. The actual incentives ought to change.
Finally, a good amount of Jessica’s post (like Zoe’s) concerns her inner experiences, on which she is the undisputed expert. I’m not ignoring those parts above; I just can’t speak to them, since as a third-person observer it’s much easier to discuss the external realities than the internal ones. (Likewise with Zoe and Leverage.)
My own strong agreement with the content makes it hard to debias my approval here, but in general I want to massively praise edits like this, where the author explicitly crosses out the existing comment, states that they’ve changed their mind, and explains why.
(There are totally good reasons to retract without comment, of course, and I’m glad that LW now offers this option. I’m just giving Davis credit for putting his update out there like this.)
There’s a lot going on in this comment, but I note with interest that this is the first time I’ve seen someone weigh in on questions of cultish behavior from the perspective of a former cult leader.
I’m fascinated with the claim that if you take on the outer facade of a cult, you now have a strong incentive gradient to turn up the cultishness (maybe because you’re now drawing in people who are looking for more of that, and driving away anyone who’s put off by it). Obviously the claim needs more than one person’s testimony, but it makes sense.
I wonder if some early red flags with Leverage (living together with your superiors who also did belief reporting sessions with you, believing Geoff’s theories were the word of god, etc) were explicitly laughed off as “oh, haha, we know we’re not a cult, so we can chuckle about our resemblances to cults”.
On the other hand, sometimes people end up walking right through what the established experts thought to be a wall. The rise of deep learning from a stagnant backwater in 2010 to a dominant paradigm today (crushing the old benchmarks in basically all of the most well-studied fields) is one such case.
In any particular case, it’s best to expect progress to take much, much longer than the Inside View indicates. But at the same time, there’s some part of the research world where a major rapid shift is about to happen.