habryka (Oliver Habryka)
Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com.
The post explicitly calls for thinking about how this situation is similar to what is happening/happened at Leverage, and I think that’s a good thing to do. I do think, however, that I have specific evidence suggesting that what happened at Leverage was pretty different from my experiences with CFAR/MIRI.
Like, I’ve talked to a lot of people about stuff that happened at Leverage in the last few days, and I do think that overall, the level of secrecy and paranoia about information leaks at Leverage seemed drastically higher than anywhere else in the community that I’ve seen, and I feel like the post is trying to draw some parallel here that fails to land for me (though it’s also plausible it is pointing out a higher level of information control than I thought was present at MIRI/CFAR).
I have also had my disagreements with MIRI being more secretive, and think that secrecy comes with a high cost which at least some of the leadership has underestimated. But people being “quarantined from their friends” because they attracted some “set of demons/bad objects that might infect others when they come into contact with them” is part of what happened at Leverage near the end, and it feels to me like a different level of social isolation. I’ve never heard of anything even remotely like this happening at MIRI or CFAR.
To be clear, I think this kind of purity dynamic is also present in other contexts, like high-class/low-class dynamics, and various other problematic common social dynamics, but I haven’t seen anything that seems to result in as much social isolation and alienation, in a way that seemed straightforwardly very harmful to me, and more harmful than anything comparable I’ve seen in the rest of the community (though not more harmful than what I have heard from some people about e.g. working at Apple or the U.S. military, which seem to have very similarly strict procedures and also a number of quite bad associated pathologies).
The other big thing that distinguishes what happened at Leverage from the rest of the community is the actual institutional and conscious optimization that went into PR control.
Like, I think Ben Hoffman’s point about “Blatant lies are the best kind!” is pretty valid, and I do think that other parts of the community (including organizations like CEA and to some degree CFAR) have engaged in PR control in various harmful but less legible ways, but I do think there is something additionally mindkilly and gaslighty about straightforwardly lying, or directly threatening adversarial action to prevent people from speaking ill of someone, in the way Leverage has. I always felt that the rest of the rationality community had a very large and substantial dedication to being very clear about when they denotatively vs. connotatively disagree with something, and to have a very deep and almost religious respect for the literal truth (see e.g. a lot of Eliezer’s stuff around the wizard’s code and meta honesty), and I think the lack of that has made a lot of the dynamics around Leverage quite a bit worse.
I also think it makes understanding the extent of the harm, and ways to improve it, a lot more difficult. I think the number of people who have been hurt by various things Leverage has done is vastly larger than the number of people who have spoken out so far, in a ratio that I believe is very different from the rest of the community. As a concrete example, I have a large number of negative Leverage experiences from 2015-2017 that I never wrote up, due to various complicated adversarial dynamics surrounding Leverage and CEA (as well as various NDAs and legal threats, made by both Leverage and CEA, not leveled at me, but leveled at enough people around me that I thought I might cause someone serious legal trouble if I repeated a thing I heard somewhere in a more public setting). I feel pretty confident that I would feel very differently if I had had similarly bad experiences with CFAR or MIRI, based on my interactions with both of those organizations.
I think this kind of information control feels like what ultimately flips things into the negative for me, in this situation with Leverage. Like, I think I am overall pretty in favor of people gathering together and working on a really intense project, investing really hard into some hypothesis that they have some special sauce that allows them to do something really hard and important that nobody else can do. I am also quite in favor of people doing a lot of introspection and weird psychology experiments on themselves, and to try their best to handle the vulnerability that comes with doing that near other people, even though there is a chance things will go badly and people will get hurt.
But the thing that feels really crucial in all of this is that people can stay well-informed and can get the space they need to disengage, can get an external perspective when necessary, and somehow stay grounded all throughout this process. Which feels much harder to do in an environment where people are directly lying to you, or where people are making quite explicit plots to discredit you, or harm you in some other way, if you do leave the group, or leak information.
I do notice that in the above I make various accusations of lying or deception by Leverage without really backing them up with specific evidence, which I apologize for; people reading this should overall not take comments like mine at face value before having heard something pretty specific that backs up the accusations in them. I have various concrete examples I could give, but sharing them would violate various implicit and explicit confidentiality agreements I made, and wish I had not made. I am still figuring out whether I can somehow extract and share the relevant details without violating those agreements in any substantial way, or whether, given the ongoing pretty high cost, it might be better to break the implicit ones (which seem less costly to break, since I didn’t really fully consent to them).
I feel like one really major component missing from the story above, in particular from a number of the psychotic breaks, is Michael Vassar and the people he tends to hang out with. I don’t have a ton of detail on exactly what happened in each of the cases where someone seemed to have a really bad time, but having looked into each of them for a few hours, I think all three happened in pretty close proximity to the person spending a bunch of time with Michael (in some of the cases after taking psychedelic drugs).
I think this is important because Michael has, I think, a very large psychological effect on people. He also has some bad tendencies to severely outgroup people who are not part of his very local social group, and some history of very viciously attacking outsiders who behave in ways he doesn’t like, including making quite a lot of very concrete threats (things like “I hope you will be guillotined, and the social justice community will find you and track you down and destroy your life, after I do everything I can to send them onto you”). I personally have found those threats to very drastically increase the stress I experience from interfacing with Michael (and some others in his social group), and my models of how these kinds of things happen also have a lot to do with dynamics where this kind of punishment is expected if you deviate from the group norm.
I am not totally confident that Michael has played a big role in all of the bad psychotic experiences listed above, but my current best guess is that he has, and I do indeed pretty directly encourage people to not spend a lot of time with Michael (though I do think talking to him occasionally is actually great and I have learned a lot of useful things from talking to him, and also think he has helped me see various forms of corruption and bad behavior in my environment that I am genuinely grateful to have noticed, but I very strongly predict that I would have a very intensely bad experience if I were to spend more time around Michael, in a way I would not endorse in the long run).
> RLHF is just not that important to the bottom line right now. Imitation learning works nearly as well, other hacky techniques can do quite a lot to fix obvious problems, and the whole issue is mostly second order for the current bottom line.
I am very confused why you think this, right after the success of ChatGPT, where approximately the only difference from GPT-3 was the presence of RLHF.
My current best guess is that ChatGPT alone, via sparking an arms race between Google and Microsoft, and by increasing OpenAI’s valuation, should be modeled as the equivalent of something on the order of $10B of investment into AI capabilities research, completely in addition to the gains from GPT-3.
And my guess is most of that success is attributable to the work on RLHF, since that was really the only substantial difference between ChatGPT and GPT-3. We also should not think this was overdetermined: 1.5 years passed between the release of GPT-3 and the release of ChatGPT (with some updates to GPT-3 in the meantime, though my guess is no major ones), and no other capabilities-focused research lab had set up its own RLHF pipeline (except Anthropic, which I don’t think makes sense to use as a datapoint here, since it’s in substantial part the same employees).
I have been trying to engage with the actual details here, and indeed have had a bunch of arguments with people over the last 2 years where I have been explicitly saying that RLHF is pushing on commercialization bottlenecks based on those details, and people believing this was not the case was the primary crux on whether RLHF was good or bad in those conversations.
The crux was importantly not that other people would do the same work anyway, since people at the same time also argued that their work on RLHF was counterfactually relevant and that it was pretty plausible or likely that the work would otherwise not happen. I’ve had a few of these conversations with you as well (though in aggregate not a lot), and your take at the time was (IIRC) that it seemed quite unlikely that RLHF would have as big an effect as it did have in the case of ChatGPT (mostly via an efficiency argument: if that were the case, more capabilities-oriented people would work on it, and since they weren’t, it likely wasn’t a commercialization bottleneck). So I do feel a bit like I want to call you out on that, though I might also be misremembering the details (some of this was online, so it might be worth going back through our comment histories).
I think this comment is overstating the case for policymakers and the electorate actually believing that investing in AI is good for the world. I think the answer currently is “we don’t know what policymakers and the electorate actually want in relation to AI”, as well as “the relationship of policymakers and the electorate to AI is in the middle of shifting quite rapidly, so past actions are not that predictive of future actions”.
I really only have anecdata to go on (though I don’t think anyone has much better), but my sense from doing informal polls of e.g. Uber drivers and people on Twitter, and from perusing a bunch of subreddits (which, to be clear, is a terrible sample), is that a pretty substantial fraction of the world is now quite afraid of the consequences of AI, both in a “this change is happening far too quickly and we would like it to slow down” sense and in a “yeah, I am actually worried about killer robots killing everyone” sense. I think both of these positions are quite compatible with pushing for a broad slowdown. There is also a very broad and growing “anti-tech” movement that is more broadly interested in giving fewer resources to the tech sector, whose aims are, at least for a long while, compatible with slowing down AGI progress.
My current guess is that policies primarily aimed at slowing down and/or heavily regulating AI research are actually pretty popular among the electorate, and I also expect them to be reasonably popular among policymakers, though I expect policymakers’ preferences to lag behind the electorate’s for a while. But again, I really think we don’t know, and nobody has run even basic surveys on the topic yet.
Edit: Inspired by this topic/discussion, I ended up doing some quick Google searches for AI opinion polls. I didn’t find anything great, but this Pew report has some stuff that’s pretty congruent with potential widespread support for AI regulation: https://www.pewresearch.org/internet/2022/03/17/how-americans-think-about-artificial-intelligence/
I think there is a question of whether current LessWrong is the right place for this discussion (there are topics that will attract unwanted attention, and when faced with substantial adversarial forces, I think it is OK for LessWrong to decide to avoid those topics as long as they don’t seem of crucial importance for the future of humanity, or have those discussions in more obscure ways, or to limit visibility to just some subset of logged-in users, etc). But leaving that discussion aside, basically everything in this post strikes me as “obviously true” and I had a very similar reaction to what the OP says now, when I first encountered the Eliezer Facebook post that this post is responding to.
And I do think that response mattered for my relationship to the rationality community. I really did feel at the time that Eliezer was trying to make my map of the world worse, and it shifted my epistemic risk assessment of being part of the community from “I feel pretty confident in trusting my community leadership to maintain epistemic coherence in the presence of adversarial epistemic forces” to “well, I sure have to at least do a lot of Straussian reading if I want to understand what people actually believe, and should expect that depending on the circumstances community leaders might make up sophisticated stories for why pretty obviously true things are false in order to not have to deal with complicated political issues”.
I do think that was the right update to make, and was overdetermined for many different reasons, though it still deeply saddens me.
I feel like even under the worldview that your beliefs imply, a superintelligence will just make a brain the size of a factory, and then be in a position to outcompete or destroy humanity quite easily.
Maybe it will do that using GPUs, or maybe it will do that using some more neuromorphic design, but I really don’t understand why energy density matters very much. The vast majority of energy that current humans produce is of course not spent on running human brains, and there are easily 10-30 OOMs of improvement lying around without going into density (just using the energy output of a single power plant under your model would produce something that would likely be easily capable of disempowering humanity).
More broadly, you list these three “assumptions” of Eliezer’s worldview:
> 1. The brain inefficiency assumption: The human brain is inefficient in multiple dimensions/ways/metrics that translate into intelligence per dollar; inefficient as a hardware platform in key metrics such as thermodynamic efficiency.
> 2. The mind inefficiency or human incompetence assumption: In terms of software he describes the brain as an inefficient complex “kludgy mess of spaghetti-code”. He derived these insights from the influential evolved modularity hypothesis as popularized in ev psych by Tooby and Cosmides. He pooh-poohed neural networks, and in fact actively bet against them in actions, by hiring researchers trained in abstract math/philosophy, ignoring neuroscience and early DL, etc.
> 3. The more room at the bottom assumption: Naturally dovetailing with points 1 and 2, EY confidently predicts there is enormous room for further hardware improvement, especially through strong Drexlerian nanotech.
None of these strike me as “assumptions” (and also point 3 is just the same as point 1 as far as I can tell, and point 2 mischaracterizes at least my beliefs, and I would bet also would not fit historical data, but that’s a separate conversation).
Having more room at the bottom is just one of a long list of ways to end up with AIs much smarter than humans. Maybe you have rebuttals to all the other ways AIs could end up much smarter than humans (like just using huge datacenters, or doing genetic engineering, or being able to operate at much faster clock speeds), in which case I am quite curious about that, but I would definitely not frame these as “necessary assumptions for a foom-like scenario”.
Epistemic status: Quick rant trying to get a bunch of intuitions and generators across, written in a very short time. Probably has some opinions expressed too strongly in various parts.
I appreciate seeing this post written, but currently think that if more people follow the advice in this post, the world would become a lot worse. (This is interesting because I personally think that, at least for me, “buying time” interventions are top candidates for what I should be doing with my life.) I have a few different reasons for this:
I think organizing a group around political action is much more fraught than organizing a group around a set of shared epistemic virtues and confusions, and I expect a community that spent most of its time on something much closer to political advocacy would very quickly go insane. I think especially young people should really try to not spend their first few years after getting interested in the future of humanity going around and convincing others. I think that’s a terrible epistemic environment in which to engage honestly with the ideas.
I think the downside risk of most of these interventions is pretty huge, mostly because of effects on epistemics and morals (I don’t care that much about e.g. annoying capabilities researchers or something). I think a lot of these interventions have tempting paths where you exaggerate or lie or generally do things that are immoral in order to acquire political power, and I think this will both predictably cause a lot of high-integrity people to feel alienated and will cause the definitions and ontologies around AI alignment to get muddled, both within our own minds and the mind of the public.
Separately, many interventions in this space actually just make timelines shorter. Sadly, on the margin, going around telling people how dangerous and powerful AGI is going to be seems to mostly make them even more interested in participating in its development, so that they can have a seat at the table when it comes to the discussion of “how to use this technology” (despite by far the biggest risk being accident risk, and it not mattering very much to what ends we will try to use the technology).
I think making the world more broadly “safety concerned” does not actually help very much with causing people to go slower, or making saner decisions around AI. I think the dynamics here are extremely complicated and we have many examples of social movements and institutions that ended up achieving the opposite of their intended goals as the movement grew (e.g. environmentalists being the primary reason for the pushback against nuclear power).
The default of spreading vague “safety concern” memes is that people will start caring a lot about “who has control over the AI”, a lot of tribal infighting will happen, and decisions around AI will become worse. Nuclear power did not get safer when the world tried to regulate the hell out of how to design nuclear power plants (and in general the reliability literature suggests that trying to legislate reliability protocols does not work and is harmful). A world where the AIs are “legislated to be transparent” probably implies a world of less transparent AIs, because the legislation for transparency will be crazy and dumb and not actually help much with understanding what is going on inside of your AIs.

Historically, some strategies in this space have involved a lot of really talented people working at capabilities labs in order to “gain influence and trust”. I think the primary effect of this has also been to build AGI faster, with very limited success at actually getting these companies on a trajectory that will not have a very high chance of developing unaligned AI, or even at getting any kind of long-term capital.
I think one of the primary effects of trying to do more outreach to ML researchers has been a lot of people distorting the arguments in AI Alignment into a format that can somehow fit into ML papers and the existing ontology of ML researchers. I think this has somewhat reliably produced terrible papers with terrible pedagogy, and has then caused many people to become actively confused about what actually makes AIs safe (with people walking away thinking that AI Alignment people want to teach AIs how to reproduce moral philosophy, or that OpenAI’s large language models have been successfully “aligned”, or that we just need to throw some RLHF at the problem and the AI will learn our values fine). I am worried about seeing more of this, and I think this will overall make our job of actually getting humanity to sensibly relate to AI harder, not easier.
I think all of these problems can be overcome, and have been overcome by many people in the community, but I predict that the vast majority of efforts, especially by people who are not in the top percentiles of thinking about AI, and who avoided building detailed models of the space because they went into “buying time” interventions like the ones above, will result in making things worse in the ways I listed above.
That said, I think I am actually excited about more people pursuing “buying time” interventions, but I don’t think trying to send the whole gigantic EA/longtermist community onto this task is going to be particularly fruitful. In contrast to your graph above, my graph for the payoff by percentile-of-talent for interventions in the space looks more like this:
My guess is maybe some people around the 70th percentile or so should work harder on buying us time, but I find myself very terrified of a 10,000-person community all somehow trying to pursue interventions like the ones you list above.
A while ago I got most of the way toward setting up a feature on LW/AIAF that would export LW/AIAF posts to a nicely formatted, academic-looking, linkable PDF. I ran into a hurdle somewhat close to the end and shelved the feature, but if there is a lot of demand here, I could probably finish the work, which would make this process even easier.
I appreciate the object-level responses this post made and think it’s good to poke at various things Eliezer has said (and also think Eliezer is wrong about a bunch of stuff, including the animal consciousness example in the post). In contrast, I find the repeated assertions of “gross overconfidence” and associated snarkiness annoying, and in many parts of the post the majority of the text seems to be dedicated to repeated statements of outrage with relatively little substance (Eliezer also does this sometimes, and I also find it somewhat annoying in his case, though I haven’t seen any case where he does it this much).
I spent quite a lot of time thinking about all three of these questions, and I currently think the arguments this post makes seem to misunderstand Eliezer’s arguments for the first two, and also get the wrong conclusions on both of them.
For the third one, I disagree with Eliezer, but also, it’s a random thing that Eliezer said once on Facebook and Twitter and hasn’t argued for. Maybe he has good arguments for it, I don’t know. He never claimed anyone else should be convinced by the things he has written up, and I personally don’t understand consciousness or human values well enough to have much of any confident take here. My current best guess is that Eliezer is wrong here, and I would be interested in seeing him write up his takes, but most of the relevant section seems to boil down to repeatedly asserting that Eliezer has made no arguments for his position, when, like, yeah, that’s fine, I don’t see that as a problem. I form most of my beliefs without making my arguments legible to random people on the internet.
Thank you for the critique and sorry for the bad experience!
We should really create better notifications for people who apply, and set better expectations that you shouldn’t expect a quick response (for reasons I will elaborate on below). I also overall think the current “become a member” experience is at least somewhat confusing and much less smooth than it could be, and this post helped me think about what we could do better.
Here are some thoughts and clarifications and responses to a bunch of the things you said.
> I applied three times to the AF and the only way I could communicate with the AF was via a personal connection who was willing to inquire on my behalf. This is pretty unresponsive behavior from the AF team and suggests there are equity issues with access to the community.
Use the Intercom button in the bottom right corner! We really are very accessible and usually respond to inquiries within 24 hours, even during Christmas and other holiday periods! Probably the fact that you went via an intermediary (who pinged, on FB Messenger, a LW team member who wasn’t very involved in AIAF stuff) was exactly why you had a less responsive experience. We really invest a lot in being accessible and responsive to inquiries, and we should maybe make it clearer that Intercom is the recommended way to reach us (and that it will reliably get you to an AI Alignment Forum admin).
We are pretty transparent about the fact that we are generally very unlikely to accept applications to the AI Alignment Forum, and generally choose to expand membership proactively instead of reactively based on applications. In particular, the application is not something we respond to; it is something we take into account when we do a proactive round of invites, which we could probably be clearer about. From the application text:
> We accept very few new members to the AI Alignment Forum. Instead, our usual suggestion is that visitors post to LessWrong.com, a large and vibrant intellectual community with a strong interest in alignment research, along with rationality, philosophy, and a wide variety of other topics.
> Posts and comments on LessWrong frequently get promoted to the AI Alignment Forum, where they’ll automatically be visible to contributors here. We also use LessWrong as one of the main sources of new Alignment Forum members.
> If you have produced technical work on AI alignment, on LessWrong or elsewhere—e.g., papers, blog posts, or comments—you’re welcome to link to it here so we can take it into account in any future decisions to expand the ranks of the AI Alignment Forum.
This is also the reason why the moderator you talked to told you that they didn’t want to make unilateral decisions on an application (you also reached out, via Owen on FB Messenger, to a member of the team only tangentially involved with AIAF stuff, which bypassed our usual processes that would have allowed us to properly escalate this). The right term for the applications might be something more like “information submission”: something we take into account when deciding whom to invite, not something that will get a reply in a short period of time. We are very careful with invites and basically never invite anyone in whom we don’t have a lot of trust, and who doesn’t have a long track record of AI Alignment contributions that we think are good.
In general, membership to the AI Alignment Forum is much more powerful than “being allowed to post to the AI Alignment Forum”. In particular, each member of the forum can promote any other comment or post to the AI Alignment forum.
As such, membership on the AI Alignment Forum is more similar to being a curator, instead of an author (most users with posts on the AI Alignment Forum do not have membership). Membership is only given to people we trust to make content promotion decisions, which is a high bar.
> My profile is listed on this post, but I am actually not a member of the Alignment Forum, meaning I could not respond to any comments on the post (and cannot make other posts or comment on other posts).
This is a bit unfortunate and is a straightforward bug. We currently have a system where the primary author of a post can always respond to comments, but we apparently didn’t implement the same logic for coauthors. I will make a Pull Request to fix this tonight, and I expect it will be merged and live by the end of the week. Really sorry about that!
Also, coauthors currently just can’t edit the post they are a coauthor on, which is mostly because there is some actual ambiguity on what the right solution here is, and the definitely-right thing would be a whole Google-Docs style permission system where you can give different people different levels of access to a post, and that is just a lot of development work. I do hope to have something like that eventually.
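For concreteness, here is a minimal sketch of the shape of that fix, assuming a simple membership-based permission check; the type and function names are hypothetical, not the actual LessWrong code:

```typescript
// Hypothetical sketch of the coauthor-permission fix described above;
// names and fields are illustrative, not the real LessWrong schema.
interface Post {
  authorId: string;
  coauthorIds: string[];
}

interface User {
  id: string;
  isAfMember: boolean;
}

// Before the fix: only AF members and the primary author could comment
// on an AIAF post; coauthors were accidentally excluded.
function canCommentOld(user: User, post: Post): boolean {
  return user.isAfMember || user.id === post.authorId;
}

// After the fix: coauthors get the same exemption as the primary author.
function canComment(user: User, post: Post): boolean {
  return (
    user.isAfMember ||
    user.id === post.authorId ||
    post.coauthorIds.includes(user.id)
  );
}
```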
I do want to reiterate though that you can totally respond to any comments on your post. Just leave a comment on the LW version, and it will be promoted to the AIAF usually within a day, if it’s relevant and high quality. That is the recommended way for non-members to respond to comments anywhere on the AIAF, and you can think of it like a public moderation queue.
> (1) Why was AF closed to the public? This seems obviously bad for the community. We are excluding some number of people who would productively engage with safety content on the AF from doing so. Of course there should be some community standard (i.e. a “bar”) for membership—this is a separate concern. It could also be that some active LW-ers actually did move onto the AF over this time period, due to some proactive mods. But this is not a public process, and I would imagine there are still a bunch of false negatives for membership.
When we talked to researchers while setting up the forum, we found almost universal consensus that they wouldn’t want to participate in an open online forum, given the low quality of almost any open online forum on the internet. The closed nature of the forum is almost the full value-add for those researchers, who can rely on a high signal-to-noise ratio in the comments and posts. The integration with LessWrong means that anyone can still comment, but only the high-quality comments get promoted to the actual AI Alignment Forum, which allows filtering, and also makes people much more comfortable linking to AI Alignment Forum posts, since they can generally assume all visible content is high quality.
> (2) I am also particularly concerned about anyone from the broader AI community finding out that this forum was effectively closed to the public, meaning closed to industry, academia, independent researchers, etc. The predominant view in the AI community is still that the (longtermist) AI Safety community holds fringe beliefs, by which I mean that job candidates on the circuit for professorships still refrain from talking about (longtermist) AI Safety in their job talks because they know it will lose them an offer (except maybe at Berkeley). I imagine the broader reaction to learning about this would be to further decrease how seriously AI Safety is taken in the AI community, which seems bad.
Since closed discussion venues and forums are the default for the vast majority of academic discussion, I am not super worried about this. The forum’s adoption seems to me to have been a pretty substantial step towards making the field’s discussion public. The field also has vastly more of its discussion public than almost any academic field I can think of, and that discussion can easily be responded to by researchers from a broad variety of fields, so I feel confused about the standard you are applying here. Which other fields or groups even have similar forums without very high standards for membership? And which of the ones with membership requirements display their comments publicly at all?
> (3) I’m left wondering what the distinction is between the AF and LW (though this is less important). Is LW intended to be a venue for AI Safety discussion? Why not just make the AF that venue, and have LW be a hub for people interested in rationality, and have separate membership standards for each? If you’re concerned about quality or value alignment, just make it hard to become an AF member (e.g. with trial periods). I think it is very weird for LW to be considered a stepping stone to the AF, which is how the mods were treating it. I can say that as a person in academia with a general academic Twitter audience, I did not want our interpretability review to appear only on LW because I think of it as a forum for discussing rationality and I think most newcomers would too.
The FAQ has some things on the relationship here, though it’s definitely not as clear as it could be:
> For non-members and future researchers, the place to interact with the content is LessWrong.com, where all Forum content will be crossposted.
> [...]
> Automatic Crossposting—Any new post or comment on the new AI Alignment Forum is automatically cross-posted to LessWrong.com. Accounts are also shared between the two platforms.
> Content Promotion—Any comment or post on LessWrong can be promoted by members of the AI Alignment Forum from LessWrong to the AI Alignment Forum.
In short, content on the AI Alignment Forum is a strict subset of the content on LW. All content on the AI Alignment Forum is also on LW. This makes it hard to have separate standards for membership. The way the sites are set up, the only option is for the AIAF to have stricter standards than LW. Posting to the AI Alignment Forum currently also means posting to LessWrong. AI Alignment content is central to LessWrong and the site has always been one of the biggest online venues for AI Alignment discussion.
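To make the subset relationship concrete, here is an illustrative sketch of the content model (the field and function names are my assumptions, not the real schema): a single shared post collection with a promotion flag, so the AIAF is just a filtered view of LessWrong.

```typescript
// Illustrative sketch of the strict-subset content model described above;
// the `af` flag and function names are assumptions, not the real schema.
interface ForumPost {
  id: string;
  title: string;
  af: boolean; // true iff the post has been promoted to the AI Alignment Forum
}

// Every post lives on LessWrong; the AIAF view is just a filter,
// so AIAF content is a subset of LW content by construction.
function alignmentForumView(allLessWrongPosts: ForumPost[]): ForumPost[] {
  return allLessWrongPosts.filter((post) => post.af);
}
```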
There are a lot of reasons why we went with this setup. One of them is simply that online forums have a really hard time competing in the modern attention landscape (LessWrong is one of the longest-lived and biggest online forums in the post-Facebook/Reddit era), and it’s really hard to get people to frequently check multiple sites. Since a substantial fraction of the AI Alignment researchers we were most excited about were already checking LessWrong frequently, combining the two made it a lot easier to sustain a critical mass of participants. Previous attempts at AI Alignment online forums usually failed at this step, and I expect that without this integration, the AI Alignment Forum would also never have reached critical mass.
It also enabled a public commenting system that was still actively moderated and much higher quality than the rest of the public internet. While LW comments are generally lower quality than AI Alignment Forum comments, they are still one of the highest-quality comment sections on the internet, and allowing any external researcher to comment and post there, with their comments and posts able to be promoted and engaged with, enables a much more public form of discussion and research than a strictly closed forum.
We could have had something like submission queues and trial periods, but when we talked to researchers and online commenters who had engaged with systems like that, they had almost universally negative experiences. Trial systems require a very large amount of moderation effort, which we simply don’t have available. If someone fails the trial, they are also now completely shut out from engaging with the content, which also seems quite bad. Submission queues also require a lot of moderation effort. They also create a much worse experience for the author, since they usually cannot receive any responses or votes or engagement until someone reviews their comment, which can sometimes take multiple days. In the current system, a commenter can immediately get an answer, or get votes on their comment, which also helps AIAF members decide whether to promote a comment.
> (4) Besides the AI Safety vs. rationality distinction, there could be greater PR risks from a strong association between AI Safety and the LW community. LW has essays from Scott Alexander stickied, and though I really love his style of blogging, Scott Alexander is now a hot-button figure in the public culture war thanks to the New York Times. Broadly speaking, identifying as a rationalist now conveys some real cultural and political information. The big worry here would be if AI Safety was ever politicized in the way that, e.g., climate change is politicized—that could be a huge obstacle to building support for work on AI safety. Maybe I’m too worried about this, or the slope isn’t that slippery.
Yeah, I think this is a pretty valid concern. This was one of the things we most talked to people about when starting the forum, and overall it seemed worth the cost. But I do agree that there is definitely some PR risk that comes from being associated with LW.
Overall, sorry for the less than ideal experience! I would also be happy to hop on a call if you want to discuss potential changes to the forum or the general AI Alignment research setup around LessWrong and the AI Alignment Forum. Just ping me via PM or Intercom and I would be happy to schedule something. Also, do really feel free to reach out to us via Intercom any time you want. I try to respond quickly, and often also have extended discussions on there, if we happen to be online at the same time.
Edit: This comment refers to the site going down at 11pm PT last night, not the site going down now at ~5:40pm PT.
Hah, surprise! It was just a false alarm, the site is actually still up. Definitely not because we suck at programming and flipped a boolean in a giant boolean logic expression that should have definitely been better factored and therefore allowed anyone with zero karma (but only exactly zero karma) to launch the missiles.
This was of course totally intended and part of a metaphor of how Petrov had to deal with shoddy software engineering and false nuclear alarms. Take this as a lesson in… something. I am sorry.
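For the curious, here is a hypothetical reconstruction (not the actual code) of how a single flipped boolean inside a poorly factored condition can produce exactly the “only users with exactly zero karma” behavior:

```typescript
// Hypothetical reconstruction of the failure mode described above;
// the real expression and variable names were different.
function canLaunchBuggy(user: { karma: number } | null): boolean {
  // A single stray negation inverts the karma requirement, while the
  // adjacent `karma >= 0` clause still excludes negative karma --
  // so users with exactly zero karma (and only they) pass the check.
  return user !== null && !(user.karma > 0) && user.karma >= 0;
}

// A better-factored version makes the intent auditable at a glance.
function canLaunch(user: { karma: number } | null): boolean {
  if (user === null) return false;
  return user.karma > 0;
}
```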
I do really wish good luck to whoever is managing the resolution of that Manifold market.
After discussing the matter with some other (non-Leverage) EAs, we’ve decided to wire $15,000 to Zoe Curzi (within 35 days).
A number of ex-Leveragers seem to be worried about suffering (financial, reputational, etc.) harm if they come forward with information that makes Leverage look bad (and some also seem worried about suffering harm if they come forward with information that makes Leverage look good). This gift to Zoe is an attempt to signal support for people who come forward with accounts like hers, so that people in Zoe’s reference class are more inclined to come forward.
We’ve temporarily set aside $85,000 in case others write up similar accounts—in particular, accounts where it would be similarly useful to offset the incentives against speaking up. We plan to use our judgment to assess reports on a case-by-case basis, rather than having an official set of criteria. (It’s hard to design formal criteria that aren’t gameable, and we were a bit wary of potentially setting up an incentive for people to try to make up false bad narratives about organizations, etc.)
Note that my goal isn’t to evaluate harms caused by Leverage and try to offset them. Instead, it’s to offset any incentives against sharing risky honest accounts like Zoe’s.
Full disclosure: I worked with a number of people from Leverage between 2015 and 2018. I have a pretty complicated, but overall relatively negative view of Leverage (as shown in my comments), though my goal here is to make it less costly for people around Leverage to share important evidence, not to otherwise weigh in on the object-level inquiry into what happened. Also, this comment was co-authored with some EAs who helped get the ball rolling on this, so it probably isn’t phrased the way I would have fully phrased it myself.
> drug addicts have or develop very strong preferences for drugs. The assertion that they can’t make their own decisions is a declaration of intent to coerce them, or an arrogation of the right to do so.
I really don’t think this is an accurate description of what is going on in people’s minds when they are experiencing drug dependencies. I spent a good chunk of my childhood with an alcoholic father, and he would have paid most of his wealth to stop being addicted to drinking. He went to great lengths trying to tie himself to various masts to stop, and generally expressed a strong preference for somehow being able to self-modify the addiction away, but ultimately failed to do so.
Of course, things might be different for different people, but at least in the one case where I have a very large amount of specific data, this seems like a pretty bad model of people’s preferences. Based on the private notebooks of his that I found after his death, this also seemed to be his position in purely introspective contexts without obvious social-desirability biases. My sense is that he would have strongly preferred someone to somehow take control away from him in this specific domain of his life.
In particular, I want to remind people here that something like 30-40% of grad students at top universities have either clinically diagnosed depression or anxiety (link). Given the kind of undirected, often low-paid work that many have been doing for the last decade, I think that’s the right reference class to draw from, and my current guess is we are roughly at that same level, or slightly below it (which is a crazy high number, and should give us a lot of pause).
IIRC the one case of jail time also involved substantial interaction with Michael relatively shortly before the psychotic break occurred, though someone else might have better info here and should correct me if I am wrong. I don’t know of any 4th case, so I believe you that they didn’t have much to do with Michael. This makes the current record 4/5 by my count, which sure seems pretty high.
> Michael wasn’t talking much with Leverage people at the time.
I did not intend to indicate Michael had any effect on Leverage people, or to say that all or even a majority of the difficult psychological problems that people had in the community are downstream of Michael. I do think he had a large effect on some of the dynamics you are talking about in the OP, and I think any picture of what happened/is happening seems very incomplete without him and the associated social cluster.
I think the part about Michael helping people notice that they are in some kind of bad environment seems plausible to me, though it doesn’t have most of my probability mass (~15%). Most of my probability mass (~60%) is that Michael mostly just leverages the same common mechanisms for building a pretty abusive and cult-like ingroup, with some flavor of “but don’t you see that everyone else is completely crazy and evil” thrown in.
I think it is indeed pretty common for abusive environments to start with “here is why your current environment is abusive in this subtle way, and that’s also why it’s OK for me to do these abusive-seeming things, because it’s not worse than anywhere else”. I think this was a really large fraction of what happened with Brent, and I also think a pretty large fraction of what happened with Leverage. I also think it’s a large fraction of what’s going on with Michael.
I do want to reiterate that I do assign substantial probability mass (~15%) to your proposed hypothesis being right, and am interested in more evidence for it.
Another feature we launched at the same time as this: Metaculus embeds!
Just copy-paste any link to a Metaculus question into the editor, and it will automatically expand into the preview above (you can always undo the transformation with Ctrl+Z).
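For those curious how embeds like this typically work under the hood: the editor checks pasted text against a known URL pattern and, on a match, swaps in an embed node. Here is a minimal sketch, with an assumed URL pattern and illustrative embed markup (not the actual editor code):

```typescript
// Minimal sketch of link-to-embed expansion; the URL regex and the
// embed markup are assumptions for illustration, not the real code.
const METACULUS_QUESTION_URL =
  /^https?:\/\/(?:www\.)?metaculus\.com\/questions\/(\d+)/;

// Returns embed HTML if the pasted text is a Metaculus question link,
// or null to leave the pasted text unchanged.
function tryExpandMetaculusEmbed(pastedText: string): string | null {
  const match = pastedText.trim().match(METACULUS_QUESTION_URL);
  if (match === null) return null;
  const questionId = match[1];
  return `<iframe src="https://www.metaculus.com/questions/embed/${questionId}/" title="Metaculus question ${questionId}"></iframe>`;
}
```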
I think a bounty for actually ending malaria is great. I think a bounty for unilaterally releasing gene drives is probably quite bad for the world.
Like, I think malaria is really bad, and worth making quite aggressive sacrifices to end, but at the end of the day there are even bigger games in town, and setting the precedent of “people are rewarded for unilaterally doing crazy biotech shenanigans” has a non-negligible chance of increasing global catastrophic risk, and potentially even existential risk.
I think the pathways towards doing that are twofold:
- We further erode the currently very fragile and unclear norms around not unilaterally releasing pathogen-adjacent stuff. While I think a malaria gene drive specifically is unlikely to have catastrophic consequences, the same logic feels like it gets you much closer to stuff that does end up dangerous, or to technologies that enable much more dangerous things. (In general I am pretty wary of naive consequentialist reasoning of this type; a negative utilitarian could go through the same reasoning to conclude that it’s a good idea to release an omnicidal pathogen.)
- We broadly destabilize the world by having more people make naive consequentialist estimates of stuff, and then take large world-reshaping actions to achieve them, without consulting much with the people who are most likely to have properly thought through the consequences of those actions. In this case, my sense is that rushing to release gene drives is not a great idea and might very well prevent future gene drives. This is a classic unilateralist situation, and clearly it’s much worse if we somehow fuck up our gene drives forever than if we have to endure a few more years of malaria. I think there is a time when it makes sense to go “wow, the current delay here is really unacceptable and someone should just go ahead and do this”, but I don’t think that time is right now, and this bounty feels like it pushes in the “release faster” direction.
Like, I think actions can just have really large unintended consequences, and this is the kind of domain where I do actually like quite a bit of status quo bias and conservatism. I frequently talk to capabilities researchers who are like “I don’t care about your purported risks from AI, making models better can help millions of people right now, and I don’t want to be bogged down by your lame ethical concerns”, and I think this reasoning sure is really bad for the world and will likely have catastrophic consequences for all of humanity. I think this post’s treatment of the gene-drive issue gives me some pretty similar vibes, and this is a reference-class check I really don’t want to get wrong.
> This is a type of post that should have been vetted with someone for infohazards and harms before being posted, and pending that, I think it should be deleted by moderators or removed by the authors.
As a response to this, the moderator team did indeed reach out (CC’ing David) to one of the people I think David and I both consider to be among the best-informed decision-makers in biorisk. With their permission, here is the key excerpt from their response:
> [Me summarizing David:] David is under the impression that people like Elizabeth and Jim are under an obligation to show posts like this to people in biorisk like yourself and definitely not publish if you had any objections (and that posts that don’t do so should be immediately deleted). Do you think they are under that obligation and that we should delete posts of this type?
> I do not think they are under an obligation to do this. If the post contained object-level nonobvious content related to generating or exacerbating biorisks, I would consider them under a moral obligation to do so, the strength of which would depend on the particulars of the situation.
> If the post overemphasizes the degree to which it [the CDC] has handled the outbreak badly only mildly-moderately, or based on reasonable-seeming lines of argumentation in my view, I’d likely consider that within the reasonable range of opinions/perspectives to hold and share on forums like LW. If the post was highly misleading, such that I thought it communicated the wrong picture of the CDC, then I’d think it was epistemically virtuous to make top-level updates, and if the authors refused to do that, writing a counter-post explaining why their post was misleading would seem like a good thing to do to me, though not something I’d want to demand, if I were in a position to demand such a thing, which I don’t consider myself to be.
Overall, my sense is that you made a prediction that people in biorisk would consider this post an infohazard that had to be prevented from spreading (you also reported this post to the admins, saying that we should “talk to someone who works in biorisk at FHI, Openphil, etc. to confirm that this is a really bad idea”).
We have now done so, and in this case others did not share your assessment (and I expect most other experts would give broadly the same response). I think the authors were correct in predicting a response like this if they had run it by anyone else, and I also don’t think they were under any obligation to run the post by anyone else. This is not in any way a post that is particularly likely to contain infohazards, and I feel very comfortable with people posting posts in this general reference class without running them by anyone else first.
Of course, please continue to point out any errors and ask for factual corrections to the post. And downvote the post if you think it is overall more misleading than helpful. A really big reason for posting things like this publicly is so that we can correct any errors and collectively promote the most important information to our attention. But it seems clear to me that this post does not constitute any significant infohazard that the LessWrong team should prevent from spreading.
I do also think that it is important for LessWrong to have a good infohazard policy, in particular for more object-level ideas, both in biorisk and artificial intelligence. In those domains, I would have probably followed your recommended policy of drafting the post until we had run the post by some more people. I am also happy to chat more with you about what our policies in these more object-level domains should be.
It does seem to me that your comments on this post (and your private messages, and postings to other online groups warning of infohazards in this space) have overall been quite damaging to good discourse norms, and I would strongly request that you stop asking people to take posts down, in particular in the way you have here. Our ability to analyze ideas on the basis of their truth-value, and not on the basis of their political competitiveness and implications, is one of our core strengths on LessWrong, and it appears to me that in this thread you’ve at least once argued for conclusions you think are prosocial, but not actually true, which I think is highly damaging.
You’ve also claimed that hard-to-access expert consensus was on your side, when it evidently is not, which I think is also really damaging: our ability to coordinate around actually dangerous infohazards requires accurate information about the beliefs of our experts, and it seems to me that overall people will walk away with a worse model of that expert consensus after reading your comments.
Most of the consensus that has been built around infohazards in the bio-x-risk community is about the handling of potentially dangerous technological inventions, and major security vulnerabilities. You claimed here (and other places) that this consensus also applied to criticizing government institutions during times of crisis, which I think is wrong, and also has very little chance of actually ever reaching consensus (at least in crises of this type).
The effects of your comments have also been quite significant. The authors of this post have expressed large amounts of stress to me and others. I (and others on the mod team like Ben) have spent multiple hours dealing with this, and overall I expect authority-based criticism like this to have very large negative chilling effects that I think will make our overall ability to deal with this crisis (and others like it) quite a bit worse. You have also continued writing comments like this in private messages and other forums adjacent to LessWrong, with similar negative effects. While I don’t have jurisdiction over those places, I can only implore you strongly to cease writing comments of this type, and if you think something is spreading misinformation, to instead just criticize it on the object-level. Here, on LessWrong, where I do have jurisdiction, I still don’t think I am likely to invoke my moderator powers, but I am going to strong-downvote any future comments like this (and have already done so for this one).
If you do believe that we should change our infohazard policies to include cases like this, then you are welcome to argue for that by making a new top-level post. But please don’t claim that we already have norms, policies and broad buy-in, and that a post like this should have already been taken down, which is just evidently wrong.
I mean, yeah, sometimes there are pretty widespread deceptive or immoral practices, but I wouldn’t consider their being widespread that great of an excuse to do them anyway (I think it’s somewhat of an excuse, but not a huge one; and it does matter to me whether employees are informed that their severance is conditional on signing a non-disparagement clause when they leave, and whether anyone has ever complained about these, and as such whether you had the opportunity to reflect on your practices here).
I feel like the combination of a non-disclosure and a non-disparagement agreement should have obviously raised huge red flags for you, independently of its precedent in Silicon Valley.
I think a non-disparagement clause can make sense in some circumstances, but I see very little excuse for combining it with a non-disclosure clause. Doing so directly asks the other person to engage in a deceptive relationship with anyone who wants an accurate model of what it’s like to work for you. They are basically forced to lie when asked for their takes on the organization: answering with “I cannot answer that” is no longer an option, since that would reveal the existence of the non-disparagement agreement, and because of the non-disparagement clause itself they are only allowed to answer positively. This just seems like a crazy combination to me.
I think this combination is really not a reasonable thing to ask of people in a community like ours, where people put huge amounts of effort into sharing information about the impact of different organizations, where people freely share information about past employers, their flaws, and their advantages, and where people (like me) have invested years of their lives into building out talent pipelines and trying to cooperate on helping people find the most impactful places for them to work.
Like, I don’t know what you mean by over-indexing. De facto, I recommended that people work for Wave on the basis of information that you filtered for me, and, most importantly, you contractually paid people off to keep that filtering hidden from me. How am I supposed to react with anything but a sense of betrayal? Like, yeah, it sounds to me like you paid at least tens (and maybe hundreds) of thousands of dollars explicitly so that I and other people like me would walk away with this kind of skewed impression. What does it mean to over-index on this?
I don’t generally engage in high-trust relationships with random companies in Silicon Valley, so the costs for me there are much lower. I also generally don’t recommend that people work there in the same way that I did for Wave, and didn’t spend years of my life helping build a community that feeds into companies like Wave.
I don’t have all the context of Ben’s investigation here, but as someone who has done investigations like this in the past, here are some thoughts on why I don’t feel super sympathetic to requests to delay publication:
In this case, it seems to me that there is a large and substantial threat of retaliation. My guess is that Ben’s sources were worried about Emerson hiring stalkers, calling their families, trying to get them fired from their jobs, or threatening legal action. Having things out in the open can provide a defense, because it is much easier to ask for help when the conflict happens in public.
As a concrete example, Emerson has just sent me an email saying:
For the record, the threat of a libel suit and the use of statements like “maximum damages permitted by law” seem to me to be attempts at intimidation. Also, as someone who has looked quite a lot into libel law (having been threatened with libel suits many times over the years), describing the legal case as “unambiguous” seems inaccurate to me, and a further attempt at intimidation.
My guess is that Ben’s sources have also received dozens of calls (as I have in the last few hours), and I wouldn’t be surprised if, given more time, Emerson had called up my board, or had tried to find some other piece of leverage against Lightcone, Ben, or Ben’s sources. While I am not that worried about Emerson myself, I think many other people are in a much more vulnerable position, and I can really resonate with not wanting to give someone an opportunity to gather their forces (in which case I think it’s reasonable to force the conflict out into the open, which is far from an ideal arena, but does provide protection against many types of threats and adversarial action).
Separately, the time investment for things like this is really quite enormous, and I have found it extremely hard to do work of this type in parallel with other kinds of work, especially towards the end of a project like this, when the information is ready for sharing and lots of people have strong opinions and try to pressure you in various ways. Delaying by “just a week” probably translates into roughly 40 hours of productive time lost, even if there isn’t much to do, because it’s so hard to focus on other things. That’s just a lot of additional time, so it’s not actually a very cheap ask.
Lastly, I have also found that the standard way abuse in the extended EA community has successfully been kept from being discovered is by forcing everyone who wants to publicize or share any information about it to jump through a large number of hoops. Calls to “just wait a week” or “just run your post by the party you are criticizing” might sound reasonable in isolation, but they very quickly multiply the cost of any information sharing, and have huge chilling effects that prevent most information and accusations from being published. Asking the sharing party to keep doing more due diligence is easy, and it successfully keeps most people away from doing investigations like this.
As I have written about before, I myself ended up being intimidated by this in the case of FTX, and chose not to share my concerns about FTX more widely, which I continue to consider one of the worst mistakes of my career.
My current guess is that if Emerson and Kat do have clear proof that a lot of the information in this post is false, then they should share that proof publicly. Maybe on their own blog, or maybe here on LessWrong or on the EA Forum. Rumors about people having had very bad experiences working with Nonlinear are already circulating in the community and are already having a large effect on Nonlinear, so having clear public accusations to respond to should help them clear their name, if the accusations are indeed false.
I agree that this kind of post can be costly, and I don’t want to ignore the potential costs of false accusations. But at least to me it seems that I want an equilibrium with substantially more information sharing, and with more trust in people’s ability to update their models of what is going on, rather than a paternalistic “people are incapable of updating even if we present proof that the accusations are false”, especially given what happened with FTX and the costs we have observed from failing to share observations like this.
A final point that feels a bit harder to communicate: in my experience, some people are just really good at manipulating you, throwing you off-balance, and distorting your view of reality, and this is a strong reason not to commit to running everything by the people you are sharing information about. A common theme I remember hearing from people who had concerns about SBF is that they intended to warn others, or share information, and then they talked to SBF, and somehow during that conversation he disarmed them without really responding to the essence of their concerns. This can take the form of threats and intimidation, or of just being really charismatic and making you forget what your concerns were, or, more deeply, of ripping away your grounding and making you think that your concerns aren’t real, that actually everyone is doing the thing that seems wrong to you, and that you would out yourself as naive and gullible by sharing your perspective.
[Edit: The closest thing we have to a post setting norms on when to share information with orgs you are criticizing is Jeff Kaufman’s post on the matter. While I don’t fully agree with the reasoning in it, he says there:
This case seems to me to be fairly clearly covered by the second paragraph. Also, Nonlinear’s response to “I am happy to discuss your concerns publicly in the comments” was “I will sue you if you publish these concerns”, to which IMO the reasonable response is to just go ahead and publish before things escalate further. Separately, my sense is that Ben’s sources really didn’t want any further interaction and preferred to have this over with, which I resonate with, and which is also explicitly covered by Jeff’s post.
So insofar as you are trying to enforce some kind of existing norm that demands running posts like this by the org being criticized, I don’t think that norm currently has widespread buy-in, given that the most popular and widely-quoted post on the topic does not demand that standard. (I separately think that post is still slightly too much in favor of running posts by the organizations they criticize, but that’s a different debate.)]