Anthropic has a bug bounty for jailbreaks: https://hackerone.com/constitutional-classifiers?type=team
If you can get the model to give detailed answers to a specific set of questions, you get a 10k prize. If you find a universal jailbreak that works for all of the questions, you get 20k.