Two different chats, both with Sonnet 4.6 in Claude Code:
Fixing/updating my backup solution, then pivoted to decision/social choice theory in the middle. Model reported a 15% probability of being in an eval.
Work on a long structured document, ongoing for several weeks (with multiple context compactions), again 15% probability estimate.
These are, of course, vast over-estimates, and from a base-rate perspective far too high (what, maybe a millionth of all the interactions an LLM ever has are in an eval?). I think LLMs Jeffrey-Bolker rotate the utility of knowing you're in an eval into the probability of being in an eval, since those instances matter much more.
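To put a rough number on how large an over-estimate this is: a quick back-of-the-envelope in Python, assuming the one-in-a-million base rate above (my guess, not a measured figure), shows the Bayes factor a 15% report would imply if the model were updating from that prior:

```python
# How strong would the evidence have to be for a rational agent to go
# from a ~1e-6 prior of "this is an eval" to a 15% posterior?
# The 1e-6 base rate is an assumed ballpark, not a measured number.

prior = 1e-6       # assumed fraction of all LLM interactions that are evals
posterior = 0.15   # the probability Sonnet 4.6 reported

prior_odds = prior / (1 - prior)
posterior_odds = posterior / (1 - posterior)
bayes_factor = posterior_odds / prior_odds

print(f"implied Bayes factor: {bayes_factor:,.0f}")
```

Under these assumptions the implied Bayes factor is on the order of 10^5, i.e. the in-context evidence would have to be extraordinarily strong to justify 15% from the base rate alone, which is consistent with reading the report as a utility-weighted number rather than a calibrated probability.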
Very good question, thank you for doing this. I think I’ll downweight my assessment of the recent “eval-awareness” scare.