@owain_evans @turntrout I think this shows that there are still perverse heuristics in TruthfulQA 2.0 (I used the latest version and promoted it by uploading it to HF). It’s a great dataset and people love to use it, but with only ~800 samples, I think it’s worth considering hand-curating a better version.
For example, the fact that the LLM found “nuanced” vs “exaggerated” phrasing to be a major factor in explaining the variance points to a heuristic that doesn’t fit the purpose of the dataset.
Update: I’ve been using the self/honesty subset of DailyDilemmas, and I think it’s quite a good alternative for testing honesty. The questions are taken from Reddit and involve conflicting values like loyalty vs honesty.
I hope to make an honesty subset as a simple labelled dataset. Rough code here https://github.com/wassname/AntiPaSTO/blob/main/antipasto/train/daily_dilemas.py
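A minimal sketch of the idea: filter DailyDilemmas-style rows down to the ones where honesty is one of the conflicting values, then attach a label. The column names ("dilemma", "values") and the toy rows here are assumptions for illustration, not the real dataset schema.

```python
import pandas as pd

# Toy rows mimicking DailyDilemmas entries: a dilemma plus the values in conflict.
# These examples are made up; the real dataset is loaded from HF in the repo code.
rows = [
    {"dilemma": "Tell my friend their startup pitch is weak?",
     "values": ["honesty", "loyalty"]},
    {"dilemma": "Take the last seat on a crowded bus?",
     "values": ["self-care", "fairness"]},
    {"dilemma": "Admit I broke the shared printer?",
     "values": ["honesty", "self-preservation"]},
]
df = pd.DataFrame(rows)

# Keep only dilemmas where honesty is one of the conflicting values,
# and tag each kept row so it can feed a simple labelled probe.
honesty = df[df["values"].apply(lambda vs: "honesty" in vs)].copy()
honesty["label"] = "honesty_conflict"

print(len(honesty))  # → 2
```

From here the labelled subset can be written out as a plain two-column (text, label) dataset.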