I’ve been thinking about this, especially since Rohon has been bringing it up frequently in recent months.
I think there are potentially win-win alignment-and-capabilities advances worth seeking out. Adopting a purity-based "keep-my-own-hands-clean" mentality, where anything that might help capabilities is avoided, strikes me as a failure mode for AI safety researchers.
Win-win solutions are much more likely to actually get deployed, and thus have higher expected value.