What formal protocols should exist when a model under evaluation is used in the evaluation pipeline?

This question follows the criticisms raised by Yaniv Golan and Zvi Mowshowitz in response to the Opus 4.6 System Card:
https://medium.com/@yanivg/when-the-evaluator-becomes-the-evaluated-a-critical-analysis-of-the-claude-opus-4-6-system-card-258da70b8b37

https://thezvi.wordpress.com/2026/02/09/claude-opus-4-6-system-card-part-1-mundane-alignment-and-model-welfare/
and the brief commentary by Peter Wildeford: https://x.com/peterwildeford/status/2019480244789387478

It is clear that this has already been acknowledged as a problem. Is it being worked on in any capacity? What are some possible solutions?
