Review AI Alignment posts to help figure out how to make a proper AI Alignment review

I’ve had many conversations over the last few years about the health of the AI Alignment field, and one of the things that has come up most frequently (including in conversations with Rohin, Buck and various Open Phil people) is that many people wish there were more of a review process in the AI Alignment field.

I also think there is a lot of value in better review processes, but I have felt hesitant to create something very official and central, since AI Alignment is quite a preparadigmatic field, which makes creating shared standards of quality hard, and because I haven’t had the time to really commit to maintaining something great here.

Separately, I am quite proud of the LessWrong review and very happy about the overall institution we’ve created there, and I realized that the LessWrong review might just be a good test bed and band-aid for a better AI Alignment review process. I think the UI we built for it is quite good, the vote has real stakes, and many of the people voting are quite active in AI Alignment.

So this year, I would like to encourage many of the people who expressed a need for better review processes in AI Alignment to try reviewing some AI Alignment posts from 2021 as part of the LessWrong review. I got quite a bit of personal value out of doing this: for example, my review of the MIRI dialogues helped crystallize some helpful new directions for me to work towards. I am also hoping to write a longer review of Eliciting Latent Knowledge, which I think will help clarify some things for me, and which I will feel comfortable linking to later when people ask me about my takes on ELK-adjacent AI Alignment research.

I am also interested in comments on this post with takes on better review processes in AI Alignment. I am currently going through a period where I feel quite confused about how to relate to the field at large, so it might be a good time to also have a conversation about what kind of standards we even want to have in the field.

Current AI Alignment post frontrunners in the review

We’ve had an initial round of preliminary voting, in which people cast non-binding votes that help prioritize posts during the Review Phase. Among Alignment Forum voters, the top posts were:

  1. ARC’s first technical report: Eliciting Latent Knowledge

  2. What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

  3. Another (outer) alignment failure story

  4. Finite Factored Sets

  5. Ngo and Yudkowsky on alignment difficulty

  6. My research methodology

  7. Fun with +12 OOMs of Compute

  8. The Plan

  9. Comments on Carlsmith’s “Is power-seeking AI an existential risk?”

  10. Ngo and Yudkowsky on AI capability gains

There are also a lot of other great alignment posts in the review (a total of 88 posts were nominated), and I expect things to shift around a bit, but I think all 10 of these top essays deserve serious engagement and a relatively in-depth review, since I expect most of them will be read for many years to come, and people might base new research approaches and directions on them.

To review a post, navigate to the post page and click the “Review” button at the top of the page (just under the post title). It looks like this: