From my perspective, the main case for working on safeguards is transfer to misalignment risk. (Both in terms of the methods themselves transferring and in terms of the policy/politics that this work may help transfer.)
I don’t have a strong view about how much marginal work on the safeguards team transfers to misalignment risk, but I think the case for substantial transfer (e.g. comparable to doing somewhat thoughtful work on the alignment science team) is reasonable, especially if the person is explicitly focused on transfer to misalignment or is working on developing safeguards that are directly relevant to misalignment.
By default (as in, absent a strong comparative advantage or a particularly exciting relevant project), I’d guess that people interested in mitigating misalignment risk should prefer working on the alignment science team at Anthropic (just comparing options at Anthropic) or on something else that is more directly targeted at misalignment risk.