HenningB

Karma: 29

Nurturing the best AI safety talent as a Research Manager at MATS!

Previously worked as an AI developer in speech recognition and gen AI for 3 years. Pursued part-time technical safety research (2021-24) and coaching for career impact and personal growth (since 2017).

HenningB 4 Jun 2025 7:16 UTC
1 point
0
on: It’s hard to make scheming evals look realistic for LLMs
Interesting work and findings. As others suggested in the comments, recent Claude models may be particularly concerned about something looking like an evaluation. Have you tested other models or model families as judges?
Additionally, models tend to recognise output from their own model family better than output from other families, so you may want to use different models for different parts of the pipeline.