Does this imply that Mythos should have been mentioned in Anthropic’s February Risk Report? That report was released together with RSP v3, i.e. on Feb 24th, if I’m not mistaken. The RSP says in Section 3.1:
Scope. A Risk Report will cover all publicly deployed models at the time of its publication. It will also cover internally deployed models when we determine that these models could pose significant risks above and beyond those posed by our public models.
It seems quite clear that Mythos poses significant risks above those posed by Opus 4.6, though I guess one can argue that this wasn’t clear yet when it was first deployed for internal use.
Good catch. Note also that Mythos was made available for internal (agentic) use on Feb 24th. Going by Section 4.1.4 of the system card (Alignment assessment before internal deployment), this means that they already had a sense of the model’s capabilities (see “Given the very significant capabilities progress that we observed during training, [...]”), assessed its alignment within this 24-hour period, and concluded it was OK. I have many question marks: at the bare minimum, there’s some inconsistency.
Was the date of publication of the RSPv3 or the Risk Report fixed? It’s entirely possible that the internal deployment of Mythos was pushed back to avoid speculation.
I don’t think the publication date of RSPv3 was announced in advance. If the goal was to keep Mythos out of the Risk Report’s scope, though, the obvious solution would have been to push back the internal deployment to Feb 25th or later—doing both on Feb 24th leaves it ambiguous whether the deployment was before or after the publication of the Risk Report.
The next question to ask is, “Would Mythos Preview’s internal release have been in violation of the previous RSP?”
And if the answer is yes, would that count as a willful violation of the RSP, in spirit though not in letter? I have in mind Zac Hatfield-Dodds’s red lines.
From Claude Mythos’s system card:
“Following a successful alignment review, the first early version of Claude Mythos Preview was made available for internal use on February 24.”
Anthropic’s RSP was also updated on Feb 24th.
I’d appreciate clarity on what looks like a funny coincidence.
“I am altering the deal. Pray I don’t alter it any further.”