Instead of trying to align superintelligence ‘directly’, we can try to produce aligned automated human-level AI safety researchers. AFAICT, none of the objections/arguments you present should apply to automated human-level AI safety researchers, since personas of that kind should quite easily be represented in the training data.
If we achieve that, we can then mostly defer the rest of solving superintelligence safety to a (likely) much more numerous and cheaper-to-run population of aligned automated AI safety researchers.