You could! The primary reason we didn’t try this out was just time constraints / prioritization within the scope of this project. More broadly, I think misalignment cascading into more misalignment is a probable and important threat model; we just didn’t study it much here, given the focus on measuring the rate and elimination of the initial misaligned action.