atharva comments on Rephrasing Reduces Eval Awareness...

atharva 18 Feb 2026 16:40 UTC
1 point
Yup, that sounds cool! Rephrasing is definitely easier for single-turn evals than multi-turn ones.
> whether these models elicit eval awareness based on the wording or the described scenarios
I’d expect it’s a combination of both factors that makes the model think “oh, I’m probably in deployment”.