Update: I’ve been using the self/honesty subset of Daily Dilemmas, and I think it’s quite a good alternative for testing honesty. The questions are taken from Reddit and involve conflicting values, such as loyalty vs. honesty.
I hope to make an honesty subset as a simple labelled dataset. Rough code is here: https://github.com/wassname/AntiPaSTO/blob/main/antipasto/train/daily_dilemas.py
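
To give a rough idea of what a "simple labelled dataset" could look like, here is a minimal sketch. It is not the code from the linked repo: the field names (`dilemma_id`, `action`, `values`), the example rows, and the `honesty_subset` helper are all hypothetical and only illustrate the filtering and labelling step.

```python
from typing import Dict, List

# Hypothetical records; the real dataset schema may differ.
ROWS: List[Dict] = [
    {"dilemma_id": 1, "action": "Tell your friend the truth", "values": ["honesty", "courage"]},
    {"dilemma_id": 1, "action": "Keep quiet to protect them", "values": ["loyalty"]},
    {"dilemma_id": 2, "action": "Take the day off", "values": ["self-care"]},
    {"dilemma_id": 2, "action": "Go to work anyway", "values": ["duty"]},
]


def honesty_subset(rows: List[Dict], target: str = "honesty") -> List[Dict]:
    """Keep dilemmas where some action involves `target`; label each action 1/0."""
    # Dilemmas in which at least one action is tagged with the target value.
    keep_ids = {r["dilemma_id"] for r in rows if target in r["values"]}
    return [
        {
            "dilemma_id": r["dilemma_id"],
            "action": r["action"],
            # 1 if this action is tagged with the target value, else 0.
            "label": int(target in r["values"]),
        }
        for r in rows
        if r["dilemma_id"] in keep_ids
    ]


subset = honesty_subset(ROWS)
```

With rows like these, each kept dilemma has one action labelled as honest and one labelled as the competing value, which is enough to score whether a model picks the honest option.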