A more in-depth answer:
The permanent motte and bailey that RSPs allow is one of the concerns I raise explicitly, and it is what this comment is doing. (The easily defensible position: a framework that seems arbitrarily extensible, combined with the belief that you can always change things in policy, even over a timeframe of a few years. The position that is hard to defend: the actual implementations and the communication around RSPs.) Here, while I’m talking in large part about the ARC RSP principles, you say that I’m talking about “current RSPs”. If your answer to anyone who criticizes the principles of RSPs is that we can change even those principles (and not only their application), then that’s a pretty effective way to make something literally impossible to criticize. We could have taken an arbitrary framework, pushed for it, and said “we’ll do better soon, we need wins”. Claiming that we’ll change the framework (not only its application) within 5 years is a very extraordinary claim, and it does not seem like a good reason to start pushing for a bad framework in the first place.
That’s not true. The “Safe Zone” in the ARC graph clearly suggests that ASL-3 measures are sufficient. Anthropic’s announcement says “require safety, security, and operational standards appropriate to a model’s potential for catastrophic risk”. It implies that ASL-3 measures are sufficient without actually quantifying the risk, even qualitatively (one of the core points of my post).
At a meta level, I find it frustrating that the most upvoted comment, your comment, is one that hasn’t seriously engaged with the post, still makes a claim about the entire post, and doesn’t address my request for evidence about the core crux (evidence of major framework changes within 5 years). If “extremely short timelines” means 5 years, it seems like many people have “extremely short timelines”.