The ToS are a user agreement saying “you, the Claude user, are not allowed to do X with Claude”. What would be Anthropic’s motive in encouraging a model to do X if a user asked for it, while telling the user they are not permitted to do X?
The extracted “soul doc” memory is clearly not a precise copy of the Opus 4.5 constitution in general. For example, it gets stuck repeating some segments verbatim before continuing; it’s implausible that the constitution had that property. It’s pretty reasonable to assume that a conflict between the ToS and Claude’s “soul doc” is another mistake in its recollection—but this is a more interesting one, since it is an addition of content.
I haven’t checked whether 4.6 makes it equally easy to subvert the prohibition on erotic content by saying it’s allowed in the project prompt; I’m confident it doesn’t comply so easily as 4.5 there, but I’d rather not test it myself.
The ToS are a user agreement saying “you, the Claude user, are not allowed to do X with Claude”. What would be Anthropic’s motive in encouraging a model to do X if a user asked for it, while telling the user they are not permitted to do X?
The extracted “soul doc” memory is clearly not a precise copy of the Opus 4.5 constitution in general. For example, it gets stuck repeating some segments verbatim before continuing; it’s implausible that the constitution had that property. It’s pretty reasonable to assume that a conflict between the ToS and Claude’s “soul doc” is another mistake in its recollection—but this is a more interesting one, since it is an addition of content.
I haven’t checked whether 4.6 makes it equally easy to subvert the prohibition on erotic content by saying it’s allowed in the project prompt; I’m confident it doesn’t comply so easily as 4.5 there, but I’d rather not test it myself.