Clément Dumas comments on Building and evaluating model diffing agents

Clément Dumas 12 Jun 2026 22:30 UTC
3 points
Curious how this performs on e.g. SDF / AuditBench? In my experience, running diffing agents on AuditBench ends up detecting some side quirks (for the Qwen organism, the quirky model often complies with CCP queries and is CCP-aligned instead of refusing them).