We’ve just noticed that some of the honesty fine-tuning data we shared as part of Evaluating honesty and lie detection techniques on a diverse suite of dishonest models was the wrong data. The goal_honesty_data.jsonl file accidentally consisted of dishonesty data, i.e. data in which all responses were dishonest. We checked and don’t believe that we used the wrong data when conducting our experiments—we only linked the wrong data from the blog post. We’ve now corrected the mistake, and the correct data is linked.
Apologies to anyone who used this data for experiments. (Or you’re welcome, for the vivid lesson on the importance of reading your data!)
Thanks to Helena Casademunt for catching this.