HenningB comments on It’s hard to make scheming evals look realistic for LLMs

HenningB 4 Jun 2025 7:16 UTC
1 point
Interesting work and findings. As others have suggested in the comments, recent Claude models may be particularly concerned about something looking like an evaluation. Have you tested other models or model families as judges?
Additionally, models tend to recognise output from their own model family better than output from other families, so you may want to use different models for different parts of the pipeline.