One potentially important way to address parts of this is to move from thinking only about model audits and model cards toward organizational audits. That is, the organization should have policies about when and what to test, and when to disclose test results; an organizational safety audit would assess whether those policies are appropriate, sufficiently transparent, and sufficient given the risks, and would also check that the policies are actually being followed.
Note that Anthropic has done something like this, albeit in a weaker form, by undergoing an ISO management system audit, as they described here. Unfortunately, this specific audit type doesn’t cover what we care about most, but it’s the right class of solution. (It also doesn’t require a high level of transparency about what is audited and what is found, though Anthropic evidently provides that anyway.)