Bronson Schoen comments on Igor Ivanov’s Shortform

Bronson Schoen 6 Feb 2026 4:24 UTC

> we can tell whether eval awareness ablation has an impact on misaligned behavior rates separately from reducing intelligence

This is notably not a “win condition”; it’s where we are right now.

> We can address this with a controlled experiment.

It’s also not clear to me that labs would necessarily run such a comparison just because they could. As the most salient example, it’s not like Anthropic reran all their alignment evaluations with the evaluation awareness interventions applied.