I think Frontier Red Team is about eliciting model capabilities and Alignment Stress Testing is about “red-team[ing] Anthropic’s alignment techniques and evaluations, empirically demonstrating ways in which Anthropic’s alignment strategies could fail.”