Minor Anthropic RSP update.

Old:
New:

I don’t know what e.g. the “4” in “AI R&D-4” means; perhaps it is a mistake.[1]
Sad that the commitment to specify the AI R&D safety case thing was removed, and sad that “accountability” was removed.
Slightly sad that AI R&D capabilities triggering >ASL-3 security went from applying “especially in the case of dramatic acceleration” to applying only in that case.
Full AI R&D-5 description from appendix:
AI R&D-5: The ability to cause dramatic acceleration in the rate of effective scaling. Specifically, this would be the case if we observed or projected an increase in the effective training compute of the world’s most capable model that, over the course of a year, was equivalent to two years of the average rate of progress during the period of early 2018 to early 2024. We roughly estimate that the 2018-2024 average scaleup was around 35x per year, so this would imply an actual or projected one-year scaleup of 35^2 = ~1000x.
The 35x/year scaleup estimate is based on assuming the rate of increase in compute being used to train frontier models from ~2018 to May 2024 is 4.2x/year (reference), the impact of increased (LLM) algorithmic efficiency is roughly equivalent to a further 2.8x/year (reference), and the impact of post training enhancements is a further 3x/year (informal estimate). Combined, these have an effective rate of scaling of 35x/year.
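For concreteness, here is a minimal sanity check of the arithmetic behind those quoted figures; the per-factor rates are Anthropic’s estimates from the appendix above, and the variable names are mine:

```python
# Sanity-check of the scaleup arithmetic quoted above.
# The per-factor numbers are Anthropic's estimates from the RSP appendix.

compute_growth = 4.2           # x/year: frontier training compute, ~2018 to May 2024
algorithmic_efficiency = 2.8   # x/year: LLM algorithmic progress, compute-equivalent
post_training = 3.0            # x/year: post-training enhancements (informal estimate)

# Combined "effective scaling" rate, the basis of the ~35x/year figure.
effective_rate = compute_growth * algorithmic_efficiency * post_training
print(f"effective scaling: ~{effective_rate:.0f}x/year")

# AI R&D-5 threshold: two years of average progress compressed into one year,
# i.e. the square of the yearly rate.
one_year_equivalent = effective_rate ** 2
print(f"dramatic acceleration threshold: ~{one_year_equivalent:.0f}x in one year")
```

Note that 35^2 is 1225 (and 35.28^2 is about 1245), so the appendix’s “~1000x” is just a round number for the same calculation.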
Also, from the changelog:
Iterative Commitment: We have adopted a general commitment to reevaluate our Capability Thresholds whenever we upgrade to a new set of Required Safeguards. We have decided not to maintain a commitment to define ASL-N+1 evaluations by the time we develop ASL-N models; such an approach would add unnecessary complexity because Capability Thresholds do not naturally come grouped in discrete levels. We believe it is more practical and sensible instead to commit to reconsidering the whole list of Capability Thresholds whenever we upgrade our safeguards.

I’m confused by the last sentence.

See also https://www.anthropic.com/rsp-updates.
[1] Jack Clark misspeaks on twitter, saying “these updates clarify . . . our ASL-4/5 thresholds for AI R&D.” But AI R&D-4 and AI R&D-5 trigger ASL-3 and ASL-4 safeguards, respectively; the RSP doesn’t mention ASL-5. Maybe the RSP is wrong and AI R&D-4 is supposed to trigger ASL-4 security, and likewise for AI R&D-5; that would make the names “AI R&D-4” and “AI R&D-5” make more sense...
On the changelog’s last sentence (“We believe it is more practical and sensible instead to commit to reconsidering the whole list of Capability Thresholds whenever we upgrade our safeguards”): I’m concerned because this change seems like the start of a slippery slope, where it’s easy to change rules they previously said would apply by making a series of smaller changes.