I haven’t read most of the paper, but based on the Extended Abstract I’m quite happy about both the content and how DeepMind (or at least its safety team) is articulating an “anytime” (i.e., possible to implement quickly) plan for addressing misuse and misalignment risks.
But I think safety at Google DeepMind is more bottlenecked by buy-in from leadership to do moderately costly things than the safety team having good plans and doing good work. [Edit: I think the same about Anthropic.]
Given that buy-in from leadership is a bigger bottleneck than the safety team’s work, what would you do differently if you were in charge of the safety team?
I’ve been thinking about this, especially since Rohin has been bringing it up frequently in recent months.
I think there are potentially win-win alignment-and-capabilities advances that can be sought. A purity-based “keep-my-own-hands-clean” mentality of avoiding anything that helps capabilities is a failure mode for AI safety researchers.
Win-win solutions are much more likely to actually get deployed, and thus have higher expected value.
I don’t know, maybe nothing. (I just meant that on current margins, maybe the quality of the safety team’s plans isn’t super important.)