I’ve also been thinking about how to boost reviewing in the alignment field. I’m unsure whether the AF is the right venue, but it might be; I was thinking more along the lines of academic peer review. The main advantages I see in reviewing generally are:
- Encourages sharper/clearer thinking and writing;
- Makes research more interoperable between groups;
- Catches some errors;
- Helps filter for the most important results.
Obviously peer review is imperfect at all of these. But so are upvoting and the absence of any systematic review.
I think the main reasons alignment researchers currently don’t submit their work to peer-reviewed venues are:
- Existing peer-reviewed venues are super slow (a turnaround of around 4 months is considered good).
- Existing peer-reviewed venues have few expert reviewers in alignment, so reviews are low quality and focus on issues that are distractions.
- Existing peer-reviewed venues often have pretty low-effort reviews.
- Many alignment researchers have not been trained in how to write ML papers that get accepted, so they have bad experiences at ML conferences that put them off.
One hypothesis I’ve heard is that alignment researchers are actually great at sending their work out for feedback from real peers, and that the AF is good for getting feedback as well, so there’s no problem that needs fixing. This seems unlikely to me. Critical feedback from people who aren’t already on your wavelength is uncomfortable to receive and effortful to integrate, so I’d expect natural demand for it to be lower than optimal. Giving careful feedback is also effortful, so I’d expect it to be undersupplied.
I’ve been considering a high-effort ‘journal’ for alignment research. It would be properly funded and would pay for high-effort reviews, aiming for something like a 1-week desk-reject decision and a 2-week initial review. By focusing on AGI safety/alignment, you could maintain a pool of genuinely relevant expert reviewers. You’d probably want to keep some practices of the academic review process (e.g., structured fields for reviewer feedback, and ranking or sorting papers by significance and quality) but not others (e.g., you’d allow Markdown or Google Doc submissions).
In my dream version of this, you’d use prediction markets on the ultimate impact of each paper, and then upweight the reviews written by profitable impact forecasters.
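As a rough illustration of the weighting step in the prediction-market idea, here is a minimal sketch, not a worked-out design. It assumes each reviewer has a history of binary impact contracts that each pay 1 on the side that resolves true; a reviewer’s weight is a small floor plus their cumulative profit (losses are clipped to zero), and a paper’s aggregate score is the weighted mean of review scores. The names `Trade`, `forecaster_profit`, `review_weights`, `weighted_score`, and the `floor` parameter are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trade:
    """Hypothetical binary-contract trade on whether a paper turns out high-impact."""
    price: float        # price paid (0-1) for the side that was bought
    bought_yes: bool    # True if the trader bought YES, False if NO
    resolved_yes: bool  # how the market eventually resolved


def forecaster_profit(trades: list[Trade]) -> float:
    # A contract pays 1 if the side bought matches the resolution, else 0.
    return sum(
        (1.0 if t.bought_yes == t.resolved_yes else 0.0) - t.price for t in trades
    )


def review_weights(history: dict[str, list[Trade]], floor: float = 0.1) -> dict[str, float]:
    # Reviewers with no record or net losses keep a small floor weight,
    # so new reviewers are not silenced entirely.
    return {name: floor + max(forecaster_profit(ts), 0.0) for name, ts in history.items()}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted mean of review scores; assumes every scorer has a weight.
    total = sum(weights[name] for name in scores)
    return sum(score * weights[name] for name, score in scores.items()) / total


if __name__ == "__main__":
    history = {
        "alice": [Trade(0.2, True, True), Trade(0.7, True, False)],  # net profit 0.1
        "bob": [Trade(0.3, False, False)],                           # net profit 0.7
    }
    weights = review_weights(history)
    print(weights)
    print(weighted_score({"alice": 2.0, "bob": 4.0}, weights))
```

Clipping losses at zero and adding a floor is just one choice; a real system would also need to handle thin markets, long resolution horizons, and reviewers gaming their own papers’ markets.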
It would be good to talk with people who are interested in this or variants of it. I’m pretty uncertain about the right format, but I think we can probably build something better than what we have now, and the potential value is large. I’m especially worried about two failure modes: the alignment community forming cliques that each feel good about their own work and don’t engage with concerns from other researchers, and people feeling so much urgency that they make sloppy logical mistakes that end up being extremely costly.
Presumably Microsoft do not want their chatbot to be hostile and threatening to its users? Pretty much all the examples have that property.