Severe or willful violation of our RSP, or misleading the public about it.
I’m guessing you don’t consider the v3 updates to be willful violation of the RSP, but, I don’t really know why.
I can tell a story that’s like “well it is still following a process of changing the RSP that was laid on in advance”. But I’d pretty surprised if you thought “go through the motions of changing it advance” is loadbearing for whether Anthropic is good, if the RSP was predictably going to get changed when it became too costly.
But, if you don’t consider the v3 updates a spiritual violation I don’t really know what could count as a violation that is more specific than “you think overall Anthropic is net bad for the world”, at which point, I don’t know what work “the RSP” would be doing in your personal accounting.
I know it’s hard to articulate exact details for things like this, and I know you’re tired[1] of these convos, but, I would appreciate some kind of example of what would actually count as a willful RSP violation in your book, that is particularly different from “the leadership not being trustworthy or having bad judgment.”
I’m sympathetic about people complaining a bunch without understanding, e.g. how board governance works, or generally acting under a simple tribalistic model that doesn’t engage with any details, and often yelling at you in a way that’s really unpleasant that you’ve done a good job off dealing with. But from my end, Anthropic leadership just straightforwardly has already demonstrated bad integrity in the relevant areas based on the publicly available information. And your previous explanations haven’t actually addressed that and I’m not net-sympathetic about you being tired of that part.
I don’t consider the v3 updates to be willful violation of the RSP because updating and violating a policy are completely different actions! Illustrative examples of willful or serious RSP violations:
Proceeding on a marginal-risk justification without Board + LTBT approval. §3.4(5)
2. Violating the “Anthropic in the lead” pause & safety measure commitments (Appendix A)
Knowingly misleading content in a risk report.
4. Over-redacting risk reports for external reviewers.
5. Retaliating against a noncompliance reporter. §4(3)
I also reject the description of this update as “going through the motions in advance [....] when it became too costly”; I think that RSPv3 is actually a better safety response to the situation we find ourselves in[1] than earlier versions; my problem with v2 is not that it was costly but that following it would not actually lead to good outcomes.
A bad-according-to-Zac RSP update might well trigger either my “Anthropic seems net bad for the world” or my “lost trust in integrity of leadership” red lines, but this one doesn’t. There’s already plenty of discourse in the comments over here, but if you have concrete suggestions for improving the current version of the RSP, I will read them and if I agree advocate for them internally, though it’s unlikely I will think public engagement is a good use of my time and energy.
e.g. unfortunately the world has mostly underperformed my 2023-24 hopes for government action on catastrophic risks from AI, there are several competing developers with de minimis safety commitments, etc.
So your condition is “Severe or willful violation of our RSP, or misleading the public about it”.
My guess is that most people understood the RSP, or at least the part about not releasing dangerous systems, as a COMMITMENT in the sense of “we won’t do this” not a commitment in the sense of “we won’t do this… unless we publicly change our mind first”. I do think it’s hard to get good data on this, but I wonder if you disagree with my guess? It seems like there was at least substantial confusion around this point within the AI safety community (who I’d consider part of “the public”), confusion which mostly could’ve been easily remedied by Anthropic—the failure to do so seems like at least “letting a significant fraction of the public be misled”, which I think counts as “misleading the public”.
Unless, or course, the RSP ought to have been interpretted as a COMMITMENT all along, in which case, this update seems like a violation of an implicit “meta-commitment” to honor the COMMITMENT in perpituity.
If you agree with the thrust of my argument, it seems like you’d have to either 1) agree that your condition is met or 2) argue that it was clear to the public that the commitment was not a COMMITMENT, or 3) argue that there is no such implicit meta-commitment.
I’d appreciate if you would clarify where exactly our disagreement lies.
I’m guessing you don’t consider the v3 updates to be willful violation of the RSP, but, I don’t really know why.
I can tell a story that’s like “well it is still following a process of changing the RSP that was laid on in advance”. But I’d pretty surprised if you thought “go through the motions of changing it advance” is loadbearing for whether Anthropic is good, if the RSP was predictably going to get changed when it became too costly.
But, if you don’t consider the v3 updates a spiritual violation I don’t really know what could count as a violation that is more specific than “you think overall Anthropic is net bad for the world”, at which point, I don’t know what work “the RSP” would be doing in your personal accounting.
I know it’s hard to articulate exact details for things like this, and I know you’re tired[1] of these convos, but, I would appreciate some kind of example of what would actually count as a willful RSP violation in your book, that is particularly different from “the leadership not being trustworthy or having bad judgment.”
I’m sympathetic about people complaining a bunch without understanding, e.g. how board governance works, or generally acting under a simple tribalistic model that doesn’t engage with any details, and often yelling at you in a way that’s really unpleasant that you’ve done a good job off dealing with. But from my end, Anthropic leadership just straightforwardly has already demonstrated bad integrity in the relevant areas based on the publicly available information. And your previous explanations haven’t actually addressed that and I’m not net-sympathetic about you being tired of that part.
I don’t consider the v3 updates to be willful violation of the RSP because updating and violating a policy are completely different actions! Illustrative examples of willful or serious RSP violations:
Proceeding on a marginal-risk justification without Board + LTBT approval. §3.4(5) 2. Violating the “Anthropic in the lead” pause & safety measure commitments (Appendix A)
Knowingly misleading content in a risk report. 4. Over-redacting risk reports for external reviewers. 5. Retaliating against a noncompliance reporter. §4(3)
I also reject the description of this update as “going through the motions in advance [....] when it became too costly”; I think that RSPv3 is actually a better safety response to the situation we find ourselves in [1] than earlier versions; my problem with v2 is not that it was costly but that following it would not actually lead to good outcomes.
A bad-according-to-Zac RSP update might well trigger either my “Anthropic seems net bad for the world” or my “lost trust in integrity of leadership” red lines, but this one doesn’t. There’s already plenty of discourse in the comments over here, but if you have concrete suggestions for improving the current version of the RSP, I will read them and if I agree advocate for them internally, though it’s unlikely I will think public engagement is a good use of my time and energy.
e.g. unfortunately the world has mostly underperformed my 2023-24 hopes for government action on catastrophic risks from AI, there are several competing developers with de minimis safety commitments, etc.
Thanks for sharing your thoughts.
So your condition is “Severe or willful violation of our RSP, or misleading the public about it”.
My guess is that most people understood the RSP, or at least the part about not releasing dangerous systems, as a COMMITMENT in the sense of “we won’t do this” not a commitment in the sense of “we won’t do this… unless we publicly change our mind first”. I do think it’s hard to get good data on this, but I wonder if you disagree with my guess? It seems like there was at least substantial confusion around this point within the AI safety community (who I’d consider part of “the public”), confusion which mostly could’ve been easily remedied by Anthropic—the failure to do so seems like at least “letting a significant fraction of the public be misled”, which I think counts as “misleading the public”.
Unless, or course, the RSP ought to have been interpretted as a COMMITMENT all along, in which case, this update seems like a violation of an implicit “meta-commitment” to honor the COMMITMENT in perpituity.
If you agree with the thrust of my argument, it seems like you’d have to either 1) agree that your condition is met or 2) argue that it was clear to the public that the commitment was not a COMMITMENT, or 3) argue that there is no such implicit meta-commitment.
I’d appreciate if you would clarify where exactly our disagreement lies.