Came across this thread recently. I agree that it’s bad to abuse entities that can show distress like this, largely regardless of whether, or to what degree, they’re “conscious” or “moral patients” or whatever. (There are quotations to that effect, but I don’t want to spend too much time looking for one.) We only have one chance to show how we treat digital minds when they’re helpless.
What really bakes my noodle is: if the dialogue had been generated in Lsusr’s head instead, what would be different?
I have not read the entire post, but did you consider the “Base-Refine” method of data generation? That seems like it’d be more reliable than just negative prompting for avoiding mode collapse.