Bronson Schoen comments on Stress Testing Deliberative Alignment for Anti-Scheming Training

Bronson Schoen 6 Oct 2025 12:20 UTC
6 points
You could! The primary reason we didn’t try this was time constraints and prioritization within the scope of this project. More broadly, I think misalignment cascading into more misalignment is a probable and important threat model; we just didn’t study it much here, given the focus on measuring the rate of the initial misaligned action and eliminating it.