Given that buy-in from leadership is a bigger bottleneck than the safety team’s work, what would you do differently if you were in charge of the safety team?
I’ve been thinking about this, especially since Rohon has been bringing it up frequently in recent months.
I think there are potential win-win advances, helping both alignment and capabilities, that can be sought. I think a purity-based “keep-my-own-hands-clean” mentality of avoiding anything that helps capabilities is a failure mode of AI safety researchers.
Win-win solutions are much more likely to actually get deployed, and thus have higher expected value.
I don’t know, maybe nothing. (I just meant that on current margins, maybe the quality of the safety team’s plans isn’t super important.)