Oh man, it totally was wrong, sorry about that — updated the data again. I looked at the train datasets from the various models we trained and reran the data generation pipeline, and the results looked as expected, so I don't think I trained models on the wrong data for the original results, but I'm not fully sure how this data mix came about. It looks like it's a combination of the followup and goals data; I think Claude might have accidentally mixed them when I was having it sanitize the data for release.
Also, fwiw, depending on what you're using this data for, you should probably just regenerate it — it's not that hard, and you could probably easily generate more diverse data. It probably also helps if the prompts actually elicit deception on the model you're working with.