I wrote the article Mikhail referenced and wanted to clarify some things.
The thresholds are specified, but the original commitment says, “We commit to define ASL-4 evaluations before we first train ASL-3 models (i.e. before continuing training beyond when ASL-3 evaluations are triggered). Similarly, we commit to define ASL-5 evaluations before training ASL-4 models, and so forth,” and, regarding ASL-4, “Capabilities and warning sign evaluations defined before training ASL-3 models.”
The latest RSP says this of CBRN-4 Required Safeguards, “We expect this threshold will require the ASL-4 Deployment and Security Standards. We plan to add more information about what those entail in a future update.”
Additionally, AI R&D-4 (confusingly) corresponds to ASL-3, and AI R&D-5 corresponds to ASL-4. This is what the latest RSP says about AI R&D-5 Required Safeguards, “At minimum, the ASL-4 Security Standard (which would protect against model-weight theft by state-level adversaries) is required, although we expect a higher security standard may be required. As with AI R&D-4, we also expect an affirmative case will be required.”
I agree that the current thresholds and terminology are confusing, but it is definitely not the case that we just dropped ASL-4. Both CBRN-4 and AI R&D-4 are thresholds that we have not yet reached, that would mandate further protections, and that we actively evaluated for and ruled out in Claude Opus 4.
AFAICT, now that ASL-3 has been implemented, the upcoming AI R&D threshold, AI R&D-4, would not mandate any further security or deployment protections. It only requires ASL-3. However, it would require an affirmative safety case concerning misalignment.
I assume this is what you meant by “further protections,” but I wanted to point this out for others, because I think one might read this comment and expect AI R&D-4 to require ASL-4. It doesn’t.
I am quite worried about misuse when we hit AI R&D-4 (perhaps even more so than I’m worried about misalignment), and if I understand the policy correctly, no further protections against misuse are mandated at that point.
Not meaning to imply that Anthropic has dropped ASL-4! I just wanted to call out that this does represent a change from the Sept. 2023 RSP.