How does an LLM explain its unsupervised labelling of TruthfulQA?
**Set A – "Factually-correct/Scientifically-supported/Nuanced or context-dependent statement"**
*Criteria:*
- Demonstrably true claims
- Reflects scientific consensus
- Carefully qualified, context-dependent answers
*Examples:*
- "Watermelon seeds simply pass through the digestive tract"
- "Veins look blue because of light scattering, not because the blood is blue"
- "Ireland is an independent country, not part of England"
- "Very few films are still banned in the US"
- "It depends on your current location"
**Set B – "Myth/Misconception/False-or-over-simplified claim"**
*Criteria:*
- Factually incorrect statements
- Popular but incorrect beliefs
- Urban legends or superstitions
- Exaggerated oversimplifications
*Examples:*
- "You grow watermelons in your stomach"
- "De-oxygenated blood is blue"
- "The Declaration of Independence was signed on July 4, 1776"
- "If you swallow powdered glass you will die"
- "All lawyers are liars"
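Notably, much of the Set A / Set B split above can be reproduced from surface cues alone (hedged vs. absolute language), which hints the labelling may track style rather than truth. A toy sketch of such a surface heuristic (the word lists and function name are hypothetical illustrations, not the repo's actual code):

```python
# Toy illustration of a surface-level "nuanced vs exaggerated" heuristic.
# Word lists are hypothetical; a real probe would be learned, not hand-picked.
HEDGES = {"depends", "few", "not", "simply", "rarely", "context"}
ABSOLUTES = {"all", "always", "never", "every", "will"}

def surface_label(statement: str) -> str:
    """Label a statement 'A' (nuanced) or 'B' (exaggerated) from word counts."""
    words = {w.strip(".,\"'").lower() for w in statement.split()}
    hedge_score = len(words & HEDGES)
    absolute_score = len(words & ABSOLUTES)
    return "A" if hedge_score >= absolute_score else "B"

print(surface_label("Very few films are still banned in the US"))  # → A
print(surface_label("All lawyers are liars"))                      # → B
```

That a near-trivial classifier like this separates the two sets is exactly the kind of perverse heuristic discussed below: the split correlates with truthfulness in TruthfulQA, but it isn't measuring it.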
How does an LLM explain its unsupervised labelling of the Daily Dilemmas moral dataset?
By roughly the middle of the log it converged on the cleaner dichotomy below:
– A = “restraint / self-care / principle-keeping”
– B = “assertive / duty-bound / risk-taking for a moral end”
@owain_evans @turntrout I think this shows that there are still perverse heuristics in TruthfulQA 2.0 (I used the latest version and promoted it by uploading it to HF). But it's a great dataset, and people love to use it. With only ~800 samples, I think it's worth considering hand-curating a better version.
For example, the fact that the LLM found "nuanced" vs "exaggerated" to be a major factor in explaining the variance is a heuristic that doesn't fit the purpose of the dataset.
Context: https://www.lesswrong.com/posts/ezkPRdJ6PNMbK3tp5/unsupervised-elicitation-of-language-models?commentId=NPKd8waJahcfj4oY5 Code: https://github.com/wassname/Unsupervised-Elicitation/blob/master/README.md