Academic papers seem more valuable to distill than posts, as posts are often already distilled (except for things like Paul Christiano's blog posts) and the x-risk space is something of an info bubble. There is a list of safety-relevant papers from ICML here, but I don't totally agree with it; two papers I think it missed are:
- HarsanyiNet, an architecture for small neural networks that restricts features such that you can easily calculate the Shapley value contributions of the inputs
- This other paper on importance functions, which got an oral presentation.
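For context on the first paper: the Shapley value of an input is its average marginal contribution across all orderings in which inputs could be added. A brute-force sketch for a toy three-input set function (my own illustration of the general concept, not the HarsanyiNet architecture, which avoids this exponential enumeration by construction):

```python
from itertools import permutations

def shapley_values(f, n):
    """Exact Shapley values of inputs 0..n-1 for a set function f,
    averaging each input's marginal contribution over all orderings."""
    totals = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        included = set()
        prev = f(included)
        for i in order:
            included.add(i)
            cur = f(included)
            totals[i] += cur - prev  # marginal contribution of input i
            prev = cur
    return [t / len(perms) for t in totals]

# Toy set function: value 1.0 if input 0 is present, plus 0.5 if both 1 and 2 are.
def f(s):
    return (1.0 if 0 in s else 0.0) + (0.5 if {1, 2} <= s else 0.0)

print(shapley_values(f, 3))  # input 0 gets 1.0; inputs 1 and 2 split the 0.5
```

The brute force costs n! forward evaluations, which is why architectures that make Shapley attributions cheap to compute exactly are interesting.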
If you want to get a sense of how to do this, first get fast at understanding papers yourself, then read old issues of Rohin Shah's Alignment Newsletter and the technical portions of Dan Hendrycks's AI Safety Newsletter.
To produce higher-value technical distillations than this, you basically have to talk to people in person and add detailed critiques, which is what Lawrence did with his distillations of shard theory and natural abstractions.
Edit: Also, most papers are low quality or irrelevant; my (relatively uninformed) guess is that 92% of papers at the big three ML conferences have little relevance to alignment, and of the remainder, 2/3 of posters and 1/3 of orals are too low quality to be worth distilling. So you need to have good taste.