Academic papers seem more valuable to distill than posts, as posts are often already distilled (except for things like Paul Christiano's blog posts) and the x-risk space is something of an info bubble. There is a list of safety-relevant papers from ICML here, but I don't totally agree with it; two papers I think it missed are:
- HarsanyiNet, an architecture for small neural networks that restricts features such that you can easily calculate the Shapley value contributions of the inputs
- This other paper on importance functions, which got an oral presentation.
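For context on the first paper: the Shapley value of an input is its average marginal contribution across all orderings in which inputs could be added. A brute-force sketch for a toy three-input set function (my own illustration of the general concept, not the HarsanyiNet architecture, which avoids this exponential enumeration by construction):

```python
from itertools import permutations

def shapley_values(f, n):
    """Exact Shapley values of inputs 0..n-1 for a set function f,
    averaging each input's marginal contribution over all orderings."""
    totals = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        included = set()
        prev = f(included)
        for i in order:
            included.add(i)
            cur = f(included)
            totals[i] += cur - prev  # marginal contribution of input i
            prev = cur
    return [t / len(perms) for t in totals]

# Toy set function: value 1.0 if input 0 is present, plus 0.5 if both 1 and 2 are.
def f(s):
    return (1.0 if 0 in s else 0.0) + (0.5 if {1, 2} <= s else 0.0)

print(shapley_values(f, 3))  # input 0 gets 1.0; inputs 1 and 2 split the 0.5
```

The brute force costs n! forward evaluations, which is why architectures that make Shapley attributions cheap to compute exactly are interesting.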
If you want to get a sense of how to do this, first get fast at understanding papers yourself, then read old issues of Rohin Shah's Alignment Newsletter and the technical portions of Dan Hendrycks's AI Safety Newsletter.
To produce higher-value technical distillations than this, you basically have to talk to people in person and add detailed critiques, which is what Lawrence did with his distillations of shard theory and natural abstractions.
Edit: Also, most papers are low quality or irrelevant; my (relatively uninformed) guess is that 92% of papers at the big three ML conferences have little relevance to alignment, and of the remainder, 2/3 of posters and 1/3 of orals are too low quality to be worth distilling. So you need to have good taste.