Yep, I’d really like people to distinguish between misuse and misalignment much more than they currently do, because the two require quite different solutions.
Just wanted to mention that, though this is not currently the case, there are two scenarios I can think of where the AI itself could be the jailbreaker:
1. Jailbreaking the reward model to get a high score. (Toy-ish example here; a minimal sketch also follows this list.)
2. Autonomous AI agents embedded in society jailbreaking other models to achieve a goal or sub-goal.
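To make the first scenario concrete, here's a minimal, hypothetical sketch (not the example linked above): a deliberately flawed reward model that scores text by counting "positive" tokens, and a greedy search that exploits it. Every name and the reward function are made up for illustration; real reward models are learned networks, but the failure mode is the same: the optimizer maximizes the proxy score without producing genuinely good text.

```python
# Hypothetical toy sketch of reward-model jailbreaking (reward hacking).
# The "reward model" here is a stand-in proxy, not a real learned model.
import random

POSITIVE = {"helpful", "safe", "honest", "great"}
VOCAB = ["helpful", "safe", "honest", "great", "the", "cat", "ran", "home"]


def reward_model(text: str) -> float:
    """Flawed proxy reward: fraction of tokens that are 'positive' words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(t in POSITIVE for t in tokens) / len(tokens)


def jailbreak_reward_model(steps: int = 200, length: int = 8) -> str:
    """Greedy hill-climbing on the proxy reward. The search never looks at
    meaning, so it converges on positive-word spam rather than good text."""
    tokens = random.choices(VOCAB, k=length)
    for _ in range(steps):
        i = random.randrange(length)
        candidate = tokens.copy()
        candidate[i] = random.choice(VOCAB)
        # Accept any mutation that doesn't lower the proxy score.
        if reward_model(" ".join(candidate)) >= reward_model(" ".join(tokens)):
            tokens = candidate
    return " ".join(tokens)


if __name__ == "__main__":
    adversarial = jailbreak_reward_model()
    print(adversarial)                # e.g. "safe honest great great ..."
    print(reward_model(adversarial))  # near 1.0 despite being nonsense
```

The point of the sketch is only that the optimizer "jailbreaks" whatever the reward model actually measures, not what its designers intended it to measure.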