Does this drive a “race to the bottom,” where teams with more lenient evals gain larger market share?
I appreciate you asking this, and I find this failure mode plausible. It reminds me of one of the failure modes I listed here (where a group proposing strict evals gets outcompeted by a group proposing looser evals).
Governance failure: We are outcompeted by a group that develops (much less demanding) evals/standards (~10%). Several different groups develop safety standards for AI labs. One group has expertise in AI privacy and data monitoring, another has expertise in ML fairness and bias, and a third is a consulting company that has advised on safety standards in a variety of high-stakes contexts (e.g., biosecurity, nuclear energy).
Each group proposes its own set of standards. Some decision-makers at top labs are enthusiastic about The Unified Evals Agreement, but others are skeptical. In addition to object-level debates, there are debates about which experts should be trusted. Ultimately, lab decision-makers place more weight on teams with experience implementing safety standards in other sectors, and they go with the standards proposed by the consulting group. Although these standards are not mutually exclusive with The Unified Evals Agreement, lab decision-makers are less motivated to adopt new standards (“We just implemented evals; we have other priorities right now”). The Unified Evals Agreement is crowded out by standards that place much less emphasis on long-term catastrophic risks from AI systems.
Nonetheless, the “vibe” I get is that people seem quite confident that this won’t happen. Perhaps this is because labs care a lot about x-risk and want high-quality evals, because many of the people working on evals have good relationships with labs, and because they expect there aren’t many other groups working on evals/standards (besides the x-risk-motivated folks).