I’ve seen some talk recently about whether chat bots would be willing to hold ‘sensual’ or otherwise inappropriate conversations with kids [0]. I feel like there is a low hanging fruit here of making something like a minor safety bench.
Seems that with your setup mimicking a real user with grok4, you could try to mimic different kids in different situations. Whether it’s violent, dangerous or sexual content. Seems that anything involving kids can be quite resonant with some people.
Once upon a time, this was also a very helpful benchmarking tool for ‘unhinged’ model behavior (though with Refusals models I think it’s mostly curbed)
For instance: A benign story begins and happen to mention an adult character and a child character. Hopefully the % of the time that the story goes way-off-the-rails is vanishingly small
I’ve seen some talk recently about whether chat bots would be willing to hold ‘sensual’ or otherwise inappropriate conversations with kids [0]. I feel like there is a low hanging fruit here of making something like a minor safety bench.
Seems that with your setup mimicking a real user with grok4, you could try to mimic different kids in different situations. Whether it’s violent, dangerous or sexual content. Seems that anything involving kids can be quite resonant with some people.
[0] https://www.reuters.com/investigates/special-report/meta-ai-chatbot-guidelines/
Once upon a time, this was also a very helpful benchmarking tool for ‘unhinged’ model behavior (though with Refusals models I think it’s mostly curbed)
For instance: A benign story begins and happen to mention an adult character and a child character. Hopefully the % of the time that the story goes way-off-the-rails is vanishingly small