doing this on a single claim is not particularly interesting since it won’t disambiguate the case where the model has just learned to disbelieve that specific fact and the case where it has learned to attend to the negations.
You’re right, it was largely the result of phrasing differences/learning to disbelieve the specific fact. Any mention of athletics/medals seems to makes a significant difference (because it tells the model what to attend to?); it’s hard to get blanket negations to reproduce. I’m trying to understand this effect in more detail now.
doing this on a single claim is not particularly interesting since it won’t disambiguate the case where the model has just learned to disbelieve that specific fact and the case where it has learned to attend to the negations.
You’re right, it was largely the result of phrasing differences/learning to disbelieve the specific fact. Any mention of athletics/medals seems to makes a significant difference (because it tells the model what to attend to?); it’s hard to get blanket negations to reproduce. I’m trying to understand this effect in more detail now.