What SB 53, California’s new AI law, does
California Governor Gavin Newsom signed SB 53 on September 29. I think it’s a pretty great step, though I certainly hope future legislation builds on it.
I wrote up my understanding of what the text actually does; I welcome any corrections (and certainly further analysis for what this all means for AI safety)!
Very short summary
The law requires major AI companies to:
Publish a frontier AI framework describing how the developer “approaches” assessing and mitigating catastrophic risks, defined as risks of death/serious injury to >50 people or >$1B in damage from a single incident related to CBRN uplift, autonomous crime, or loss of control, and keep the published version of the framework up to date.
Publish model cards summarizing the assessments of catastrophic risks from the model, the role of third parties in those assessments, and how the frontier AI framework was implemented.
Report to California’s Office of Emergency Services 1) assessments of catastrophic risks from internal deployment and 2) critical safety incidents, defined as a materialization of catastrophic risks, unauthorized transfer/modification of the weights, loss-of-control resulting in death/bodily injury, and deceptive behavior that increases catastrophic risks.
Allow whistleblowers to disclose information about the frontier developer’s activities to the Attorney General, a federal authority, a manager, and certain other employees if they have reasonable cause to believe that those activities pose a “specific and substantial danger to the public health or safety resulting from a catastrophic risk,” or that the frontier developer has violated SB 53.
Not make “any materially false or misleading statement” about catastrophic risk from its frontier models, its management of catastrophic risk, or its compliance with its frontier AI framework.
Note that violations are punishable by fines up to $1M per violation, as enforced by the California Attorney General, and that the bill would not apply if Congress preempts state AI legislation.
Longer summary
What the bill requires of large frontier developers
“Large frontier developers” are defined as developers of models trained with >10^26 FLOP who also had >$500M in revenue the previous calendar year. They must do the following.
Publish a “frontier AI framework” (no longer “safety and security protocol”) that “describes how the large frontier developer approaches” the following:
“incorporating national standards, international standards, and industry-consensus best practices into its frontier AI framework,”
“defining and assessing thresholds used by the large frontier developer to identify and assess whether a frontier model has capabilities that could pose a catastrophic risk,”
These are defined as a “foreseeable and material risk” that a frontier model will “materially contribute to the death of, or serious injury to, more than 50 people” or more than $1B in damage from a single incident involving a frontier model providing “expert-level assistance” in creating a CBRN weapon; cyberattacks, murder, assault, extortion, or theft with “no meaningful human oversight”; or “evading the control of its frontier developer or user.” Explicitly excluded, as of the new amendments, are risks from information that’s publicly accessible “in a substantially similar form” without a frontier model, and risks from the lawful activity of the federal government.
“applying mitigations to address the potential for catastrophic risks based on the results of [those] assessments,”
assessing those mitigations as part of internal or external deployment decisions,
third-party assessments,
“cybersecurity practices to secure unreleased model weights from unauthorized modification or transfer by internal or external parties,”
“identifying and responding to critical safety incidents,”
These are defined as including materialization of a catastrophic risk; “unauthorized access to, modification of, or exfiltration of, the model weights of a frontier model” or “loss of control of a frontier model” that (as of the new amendments) results in “death or bodily injury”; and a frontier model using “deceptive techniques against the frontier developer to subvert the controls or monitoring of its frontier developer outside of the context of an evaluation designed to elicit this behavior and in a manner that demonstrates materially increased catastrophic risk.”
internal governance practices to implement the above, and
“assessing and managing catastrophic risk resulting from the internal use of its frontier models, including risks resulting from a frontier model circumventing oversight mechanisms.”
Publish any changes to that framework, and the justification for such changes, within 30 days.
Publish model cards, defined as “transparency reports,” that include summaries of the following: assessments of catastrophic risks from the model, the results of those assessments, the extent to which third-party evaluators were involved, and other steps taken to fulfill the frontier AI framework.
Transmit to California’s Office of Emergency Services “a summary of any assessment of catastrophic risk” resulting from “internal use of its frontier models” “every three months or pursuant to another reasonable schedule specified by the large frontier developer.”
Yes, this suddenly makes Cal OES an important governing body for frontier AI.
Report critical safety incidents related to frontier models to Cal OES within 15 days. These reports will also not be subject to FOIA-type laws. Cal OES will write an annual, anonymized/aggregated report about these incidents to the governor and legislature.
Cal OES can designate a federal incident reporting standard as sufficient to meet these requirements; a developer that complies with that federal standard doesn’t have to report to Cal OES.
(TLDR, whistleblower protections) Not make, adopt, enforce, or enter into a rule, regulation, policy, or contract that prevents “covered employees” from disclosing, or that retaliates against a covered employee for disclosing, information to the Attorney General, a federal authority, a person with authority over the covered employee, or another covered employee who has authority to investigate, discover, or correct the reported issue, if the covered employee has reasonable cause to believe that the information discloses that the frontier developer’s activities pose a “specific and substantial danger to the public health or safety resulting from a catastrophic risk or that the frontier developer has violated” SB 53. Large frontier developers must also create a mechanism for employees to anonymously report such dangers or violations.
The Attorney General will write an annual, anonymized/aggregated report about these disclosures to the governor and legislature.
Not make “any materially false or misleading statement about catastrophic risk from its frontier models or its management of catastrophic risk” or its implementation of the frontier AI framework. (There’s an exception for statements “made in good faith and reasonable under the circumstances.”)
Other notes about the bill
Everything published can be redacted; developers have to justify redactions on grounds of trade secrets, cybersecurity, public safety, or the national security of the United States, or to comply with any federal or state law.
Some, but not all, of the above requirements also apply to “frontier developers” who aren’t “large frontier developers,” i.e., developers who have trained >10^26 FLOP models but have <$500M/yr in revenue (see the short sketch after this list).
There are civil penalties for noncompliance, including for a developer violating its own published framework, to be enforced by the state AG; these are fines of up to $1M per violation.
“This act shall not apply to the extent that it strictly conflicts with the terms of a contract between a federal government entity and a frontier developer.”
The bill could be blocked from going into effect if Congress preempts state AI regulation, which it continues to consider doing.
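For concreteness, here’s a minimal sketch of the two developer tiers described above (purely illustrative Python; only the numeric thresholds come from the bill as summarized here, while the function and variable names are my own and nothing like this appears in the statute):

```python
# Illustrative sketch of SB 53's two developer tiers, as summarized above.
# Only the numeric thresholds (>10^26 FLOP of training compute, >$500M
# prior-year revenue) come from the bill; names and structure are hypothetical.

TRAINING_COMPUTE_THRESHOLD_FLOP = 1e26
LARGE_DEVELOPER_REVENUE_THRESHOLD_USD = 500_000_000

def developer_tier(max_training_flop: float, prior_year_revenue_usd: float) -> str:
    """Classify a developer under SB 53's definitions (simplified)."""
    if max_training_flop <= TRAINING_COMPUTE_THRESHOLD_FLOP:
        return "not covered"  # hasn't trained a frontier model
    if prior_year_revenue_usd > LARGE_DEVELOPER_REVENUE_THRESHOLD_USD:
        return "large frontier developer"  # full set of obligations applies
    return "frontier developer"  # only a subset of the obligations applies

# Example: a lab with a 3e26-FLOP model and $2B in revenue is a large frontier developer.
print(developer_tier(3e26, 2_000_000_000))
```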
Well, OpenAI, a16z, and Meta have been pretty clear that any regulation whatsoever will spell the end for the AI industry. I expect all the AI companies will be closing up shop within the next few weeks, and those of us working on X-risk can all take a well-deserved break.
The risk of any statement being considered “materially false or misleading” is an incentive for AI companies to avoid talking about catastrophic risk.
There’s an exception for statements “made in good faith and reasonable under the circumstances”; I would guess it’s pretty hard to prove the contrary in court?
I wouldn’t know about what works in court, but not saying anything (in interviews or posts on their site and such) is probably even safer, unless the sky is already on fire or something. It seems to be a step in an obviously wrong direction, a friction that gets worse if the things an AI company representative would’ve liked to say happen to be sufficiently contrary to prevailing discourse. Like with COVID-19.
If that were the only provision of the bill, then yes, that would be a problem, but the bill requires them to publish summaries of (1) their up-to-date framework for assessing and mitigating catastrophic risks and (2) their assessments of catastrophic risks for specific models.
Whether this particular provision is a problem doesn’t depend on the presence of other provisions in the bill, even ones that would compensate for it.
That’s fair, but I guess tlevin is saying these other provisions prevent AI companies from being able to completely avoid talking about catastrophic AI risk. So it’s not just that these provisions compensate for a bad provision. They mitigate the downside you’re concerned about.
I was ineptly objecting to this snippet in particular:
The problem I intended to describe in the first two comments of the thread is that this provision creates a particular harmful incentive. By itself, this incentive is created regardless of whether it’s also opposed in some contexts by other things. The net effect of the bill in the mitigated contexts could then be beneficial, but the incentive would still be there (in some balance with other incentives), and it wouldn’t be mitigated in the other contexts. The incentive is not mitigated for podcasts and blog posts, examples I’ve mentioned above, so it would still be a problem there (if my argument for it being a problem makes sense), and the way it’s still a problem there is not moved at all by the other provisions of the bill.
So I was thinking of my argument as being about the existence of this incentive specifically, and read tlevin’s snippet as missing the point, claiming the incentive’s presence depends on things that have nothing to do with the mechanism that brings it into existence. But there’s also a plausible reading of what I was saying (even though unintended) as an argument for the broader claim that the bill as a whole incentivises AI companies to communicate less than they are communicating currently, because of this provision. I don’t have a good enough handle on this more complicated question, so it wasn’t my intent to touch on it at all (other than by providing a self-contained ingredient for considering this broader question).
But in this unintended reading, tlevin’s comment is a relevant counterargument, and my inept objection to it is stubborn insistence on not seeing its relevance or validity, expressed without argument. Judging by the votes, it was a plausible enough reading, and the readers are almost always right (about what the words you write down actually say, regardless of your intent).
Makes sense! Yeah, I can see that “that would be a problem” can easily be read as saying I don’t think this incentive effect even exists in this case; as you’re now saying, I meant “that would make the provision a problem, i.e. net-negative.” I think conditional on having to say sufficiently detailed things about catastrophic risk (which I think SB 53 probably does require, but we’ll see how it’s implemented), the penalty for bad-faith materially false statements is net positive.
They are already incentivized to avoid talking about catastrophic risk.
You can always be more incentivized to do or avoid things! (no comment on this specific example)
Thanks, I’d been meaning to read this law but kept procrastinating!