[Question] What are good alignment conference papers?

I regularly debate with people about whether it is a good thing for alignment researchers to push for more mainstream publications in ML/AI venues. So I want data: alignment papers published by alignment researchers at NeurIPS and other top conferences (journals too, but they're less relevant in computer science). I already have some ways of finding papers like that (including the AI Safety Papers website), but I'm curious whether people here have favorites that they think I really should know about or really shouldn't miss.

(I deliberately didn’t make the meaning of “alignment paper” more precise, because I also want to use this opportunity to learn what people consider “real alignment research”.)
