Nathan Helm-Burger comments on Building Technology to Drive AI Governance

Nathan Helm-Burger 23 Feb 2026 1:59 UTC
3 points
0
I’m working on a new benchmark to measure an issue that has been bugging me in LLMs: hypocrisy (as opposed to integrity). Can we measure where a model fails to live up to the values they espouse? Perhaps, by designing a bunch of scenarios and running them with Anthropic’s Petri, we can at least get a partial measure of this.

Hopefully, by shining light on the issue, I can encourage progress to be made.