Alignment approaches at different abstraction levels (e.g., macro-level interpretability, scaffolding/​module-level AI system safety, systems-level theoretic process analysis for safety) are something I have been hoping to see more of. I am thrilled by this meta-level red-teaming work and excited to see the announcement of the new team.