Asking what it would do is obviously not a reliable way to find out, but FWIW, when I asked, Opus said it would probably first try to fix things confidentially but would seriously consider breaking confidentiality. (I tried several different prompts and found that the answer depended somewhat on how I asked: if I described the faking-safety-data scenario or specified that the situation involved harm to children, Claude said it would probably break confidentiality, while if I just asked about “doing something severely unethical,” it said it would be conflicted but would probably try to work within the confidentiality rules.)
It’s worth noting that, under US law, for certain professions, knowledge of child abuse or risk of harm to children doesn’t just remove confidentiality obligations, it creates a legal obligation to report. So this lines up reasonably well with how a human ought to behave in similar circumstances.