Two more disclaimers from both policies that worry me:
Meta writes:
Security Mitigations—Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable.
“Commercially practicable” is so load-bearing here. With a disclaimer like this, why not publicly commit to writing million-dollar checks to anyone who asks for one? It basically means “We’ll do this if it’s in our interest, and we won’t if it’s not.” Which is like, duh. That’s the decision procedure for everything you do.
I do think the public intention setting is good, and it might support the codification of these standards, but it does not a commitment make.
Google, on the other hand, has this disclaimer:
The safety of frontier AI systems is a global public good. The protocols here represent our current understanding and recommended approach of how severe frontier AI risks may be anticipated and addressed. Importantly, there are certain mitigations whose social value is significantly reduced if not broadly applied to AI systems reaching critical capabilities. These mitigations should be understood as recommendations for the industry collectively: our adoption of them would only result in effective risk mitigation for society if all relevant organizations provide similar levels of protection, and our adoption of the protocols described in this Framework may depend on whether such organizations across the field adopt similar protocols.
I think it’s funny these are coming out this week as an implicit fulfillment of the Seoul commitments. Did I miss some language in the Seoul commitments saying “we can abandon these promises if others are doing worse or if it’s not in our commercial interest?”
And, if not, do policies with disclaimers like these really count as fulfilling a commitment? Or is it more akin to Anthropic’s non-binding anticipated safeguards for ASL-3? If it’s the latter, then fine, but I wish they’d label it as such.