Nurturing the best AI safety talent as a Research Manager at MATS!
Previously worked as an AI developer in speech recognition and gen AI for 3 years. Pursued part-time technical safety research (2021-24) and coaching for career impact and personal growth (since 2017).
Interesting work and findings. As others suggested in the comments, recent Claude models may be particularly concerned about something looking like an evaluation. Have you tested other models or model families as judges?
Additionally, models tend to recognise output from their own model family better than output from other families, so you may want to use different models for different parts of the pipeline.