When Anthropic published its first RSP in September 2023, the company made a specific commitment about how it would handle increasingly capable models: “we will define ASL-2 (current system) and ASL-3 (next level of risk) now, and commit to define ASL-4 by the time we reach ASL-3, and so on.” In other words, Anthropic promised it wouldn’t release an ASL-3 model until it had figured out what ASL-4 meant.
Yet the company’s latest RSP, updated May 14, doesn’t publicly define ASL-4, even though Anthropic treats Claude Opus 4 as an ASL-3 model. Anthropic’s announcement states it has “ruled out that Claude Opus 4 needs the ASL-4 Standard.”
This is false. Our ASL-4 thresholds are clearly specified in the current RSP—see “CBRN-4” and “AI R&D-4”. We evaluated Claude Opus 4 for both of these thresholds prior to release and found that the model was not ASL-4. All of these evaluations are detailed in the Claude 4 system card.
What do you think of this comment, which responds to a quick take about your post?
I replied to this on LW: https://www.lesswrong.com/posts/BqwXYFtpetFxqkxip/mikhail-samin-s-shortform?commentId=bSnkE2H9CP93njnuy