Right, but I think a big part of how safety team earns its dignity points is by being as specific as possible about exactly how capabilities team is being suicidal, not just with metaphors and intuition pumps, but with state-of-the-art knowledge: you want to be winning arguments with people who know the topic, not just with policymakers and the public. My post on adversarial examples (currently up for 2024 Review voting) is an example of what I think this should look like. I’m not just saying “AI did something weird, therefore AI bad”; I’m reviewing the literature and trying to explain why the weird thing indicates something that would actually go wrong.
I agree directionally and denotationally with this, but I feel the need to caution that “winning arguments” is itself a very dangerous epistemic frame to inhabit for long.
Also...
I think a big part of how safety team earns its dignity points is by being as specific as possible about exactly how capabilities team is being suicidal
Those developing powerful technologies should treat exotic failure scenarios as major bugs.
We do that too! There’s a lot of ground to cover.