The Alignment Forum should have more transparent membership standards

This is a public complaint about the Alignment Forum that I hope will improve the overall health of the AI Safety community, both there and elsewhere. It is basically a short story about how I was refused membership by the AF mod team after a long and opaque process, followed by some reasons why the fact that this happened is a bad signal for the community.

You may have come across this recent post on the AF: Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers (LW version here). I started writing this post sometime in August of 2020, and I was really happy to have Owen Shen join me in the work, which we finished this April. I am listed as a co-author on the post, but I am not actually a member of the Alignment Forum, which means I could not respond to any comments on it (and cannot make other posts or comment on other posts).

I tried a few times to become a member in the many months leading up to this post. I applied through the AF website twice in 2020. After never hearing back from the first application, I added more detail in the second about how I do ML interpretability research, am partway through my PhD, etc. I never heard back about that one either. Thankfully Owen, who was already an AF member, knew one of the mods and reached out to them for me (in December, at this point). The mod suggested we post to LessWrong so that they could then promote it to the AF; Owen and I politely declined, since we might as well just use Owen’s account to post on the AF directly (more on this later). Sometime between December and April, I applied again, explaining the whole situation: that I was collaborating with an AF member on a post, my background, and other things I hoped would communicate that I was not going to hijack the AF bandwidth for far-below-average content or nefarious purposes. I didn’t get a reply, so Owen reached out to a mod again on my behalf the week before we were going to post our review, and it was at this point that I learned (paraphrasing) that the main way someone becomes an AF member is to post on LessWrong until the mod team eventually promotes them. A mod did offer to promote my LW comments up to the AF version of the post, which would have addressed my worry there, and they were willing to add my username as a co-author on the post.

At the end of the day, Owen posted the review and handled all of the typo fixes, AF comments, and updates based on feedback.

Fast-forward a week or two: I asked Owen to ask the mod team whether the interpretability post had changed anything, figuring that maybe now they trusted me, and I am legitimately interested in reading and contributing to the AF discourse. A mod replied that they would not promote me, and that they thought I would get the same traction on LW, where most of my posts would end up getting promoted to the AF anyway. It then came up that they do not actually have anyone responsible for promotions, and this mod was hesitant to make any decision unilaterally. They told us they would message the team about it, but shortly followed up to say they were too busy to look into it for a few weeks and suggested we reach out to another mod. At this point I figured I would write up this post (that was two weeks ago).

So what are the issues here? I don’t want to be dramatic about this whole chain of events, and it isn’t all that troubling to me personally. I do want to bring the topic to public attention, since I think it would help the AI Safety community if the AF were more accountable and transparent. So here are a couple of observations:

  1. I applied three times to the AF, and the only way I could actually communicate with the mod team was through a personal connection who was willing to inquire on my behalf. This is pretty unresponsive behavior from the AF team and suggests there are equity issues with access to the community.

  2. I only learned how membership promotion works in April, probably six months after my first application and four months after our first contact with the mod team. Standards for membership should be far more accessible than this.

That’s basically it as far as concrete observations go, and the solution is pretty obvious: be more transparent, post membership standards publicly, have an accountable and accessible mod team, and so on.

But let me briefly paint a more complete picture of why this is bad for the community. This story tells me that for 6+ months the AF was closed to the public, and it is possible the mods did not consider this an issue because LW exists. I have four key concerns with this:

  1. The AF being closed to the public is bad for the quality of AF discourse.

  2. The AF being closed to the public is bad for the broader perception of AI Safety.

  3. LW being a stepping-stone to the AF seems like a strange system to me, given that the AI Safety community is not the same thing as the rationality community.

  4. LW being a stepping-stone to the AF creates risks for the broader perception of AI Safety.

(1) Why was the AF closed to the public? This seems obviously bad for the community: we are excluding some number of people who would productively engage with safety content on the AF from doing so. Of course there should be some community standard (i.e. a “bar”) for membership; that is a separate concern. It could also be that some active LW-ers did move on to the AF over this period, thanks to some proactive mods. But this is not a public process, and I would imagine there are still a bunch of false negatives for membership.

(2) I am also particularly concerned about anyone from the broader AI community finding out that this forum was effectively closed to the public, meaning closed to industry, academia, independent researchers, etc. The predominant view in the AI community is still that the (longtermist) AI Safety community holds fringe beliefs; by this I mean that job candidates on the circuit for professorships still refrain from talking about (longtermist) AI Safety in their job talks because they know it will lose them an offer (except maybe at Berkeley). I imagine the broader reaction to learning about this would be to further decrease how seriously AI Safety is taken in the AI community, which seems bad.

(3) I’m left wondering what the distinction is between the AF and LW (though this is less important). Is LW intended to be a venue for AI Safety discussion? Why not just make the AF that venue, have LW be a hub for people interested in rationality, and have separate membership standards for each? If you’re concerned about quality or value alignment, just make it hard to become an AF member (e.g. with trial periods). I think it is very weird for LW to be considered a stepping stone to the AF, which is how the mods were treating it. I can say that as a person in academia with a general academic Twitter audience, I did not want our interpretability review to appear only on LW, because I think of LW as a forum for discussing rationality, and I expect most newcomers would see it the same way.

(4) Besides the AI Safety vs. rationality distinction, there could be greater PR risks from a strong association between AI Safety and the LW community. LW has essays from Scott Alexander stickied, and though I really love his style of blogging, Scott Alexander is now a hot-button figure in the public culture war thanks to the New York Times. Broadly speaking, identifying as a rationalist now conveys some real cultural and political information. The big worry here would be if AI Safety were ever politicized in the way that, e.g., climate change is politicized; that could be a huge obstacle to building support for work on AI safety. Maybe I’m too worried about this, or the slope isn’t that slippery.

Ok, that is all. I hope this serves as a small wake-up call for community management, and not much besides that!