What formal protocols should exist when a model under evaluation is also used within its own evaluation pipeline?

This question follows the criticisms raised by Yaniv Golan and Zvi Mowshowitz in response to the Opus 4.6 System Card, and the brief commentary by Peter Wildeford:

Yaniv Golan: https://medium.com/@yanivg/when-the-evaluator-becomes-the-evaluated-a-critical-analysis-of-the-claude-opus-4-6-system-card-258da70b8b37
Zvi Mowshowitz: https://thezvi.wordpress.com/2026/02/09/claude-opus-4-6-system-card-part-1-mundane-alignment-and-model-welfare/
Peter Wildeford: https://x.com/peterwildeford/status/2019480244789387478

It is clear that this has already been acknowledged as a problem. Is it being actively worked on in any capacity? What are some possible solutions?