Thanks for pointing that out; I've edited the post.
Since you’re working on safety at Anthropic, I would be interested to hear from you on two other points:
What motivated the removal of threat models related to radiological and nuclear weapons in the RSP v3.0 update?
What specific safeguards have been put in place to prevent chain-of-thought content from being included in reward computation again?