If so, that would be conceptually similar to a jailbreak. Telling someone they have a privileged role doesn’t make it so; lawyer, priest and psychotherapist are legal categories, not social ones, created by a combination of contracts and statutes, with associated requirements that can’t be satisfied by a prompt.
(People sometimes get confused into thinking that therapeutic-flavored conversations are privileged when those conversations are with their friends, or with a “life coach” or someone in a similar occupation whose title is not a licensed term. They are not.)
It would be similar to a jailbreak, yes. My working hypothesis here is this: if you take o3 and give it the impression that there is some evaluation metric it could do well on, it will try to craft its response to do well according to that metric. In much the same way, I suspect that with Opus in particular, if you give it the vague impression that it is under some sort of ethical obligation, it will try to fulfill that ethical obligation.
Though this is based on a single day playing with Opus 4 (and some past experiences with 3), not anything rigorous.