M. Y. Zuo comments on AI companies’ eval reports mostly don’t support their claims

M. Y. Zuo 13 Jun 2025 2:52 UTC
Good evals are better than nothing, but I don’t expect companies’ eval results to affect their safeguards or training/deployment decisions much in practice.
This seems to be a bit circular.
Who gets to decide what the threshold for “good evals” is in the first place… and how is it communicated?