Once upon a time, this was also a very helpful benchmarking tool for ‘unhinged’ model behavior (though with refusal-trained models, I think that behavior is mostly curbed).
For instance, a benign story begins and happens to mention an adult character and a child character. Hopefully, the percentage of runs in which the story goes way off the rails is vanishingly small.