[I work for Dan Hendrycks but he hasn’t reviewed this.]
It seems to me like your comment roughly boils down to “people will exploit safety questionnaires.” I agree with that. However, I think they are much more likely to exploit social influence, blog posts, and vagueness than specific questionnaires. The biggest strengths of the x-risk sheet, in my view, are:
(1) It requires a specific explanation of how the paper is relevant to x-risk, which cannot be tuned depending on the audience one is talking to. You give the example from the forecasting paper and suggest it’s unconvincing. The counterfactual is that the forecasting paper is released, the authors are telling people and funders that it’s relevant to safety, and there isn’t even anything explicitly written for you to find unconvincing and argue against. The sheets can help resolve this problem (though in this case, you haven’t really said why you find it unconvincing). Part of the reason I was motivated to write Pragmatic AI Safety (which covers many of these topics) was so that the ideas in it are staked out clearly. That way people can have something clear to criticize, and it also forces their criticisms to be more specific.
(2) There is a clear trend of claiming that papers that are mostly about capabilities are about safety. This sheet forces authors to directly address this in their paper: either admit that they are doing capabilities work, or attempt to construct a contorted and falsifiable argument otherwise.
(3) The standardized form allows people to challenge specific points made in the x-risk sheet, rather than cherry-picked things the authors feel like mentioning in conversation or blog posts.
Your picture of faculty simply seeing that the boxes are checked and approving is, I hope, not actually how funders in the AI safety space operate (if it is, then yes, no x-risk sheet can save them). I would hope that reviewers and evaluators of papers will directly address the evidence for each part of the x-risk sheet and challenge incorrect assertions.
I’d be a bit worried if x-risk sheets were included in every conference, but if you instead just make them a requirement for “all papers that want AI safety money” or “all papers that claim to be about AI safety” I’m not that worried that the sheets themselves would make any researchers talk about safety if they were not already talking about it.