I haven’t read most of the paper, but based on the Extended Abstract I’m quite happy about both the content and how DeepMind (or at least its safety team) is articulating an “anytime” (i.e., possible to implement quickly) plan for addressing misuse and misalignment risks.
But I think safety at Google DeepMind is more bottlenecked by buy-in from leadership to do moderately costly things than the safety team having good plans and doing good work. [Edit: I think the same about Anthropic.]
Given that buy-in from leadership is a bigger bottleneck than the safety team’s work, what would you do differently if you were in charge of the safety team?
I’ve been thinking about this, especially since Rohin has been bringing it up frequently in recent months.
I think there are potentially win-win alignment-and-capabilities advances that can be sought. A purity-based “keep-my-own-hands-clean” mentality of avoiding anything that helps capabilities is a failure mode for AI safety researchers.
Win-win solutions are much more likely to actually get deployed, and thus have higher expected value.
I don’t know, maybe nothing. (I just meant that on current margins, maybe the quality of the safety team’s plans isn’t super important.)