Sheikh Abdur Raheem Ali comments on gabeorosan’s Shortform

Sheikh Abdur Raheem Ali 2 Jun 2026 18:33 UTC
2 points
0
Doing this on a single claim is not particularly interesting, since it can’t distinguish the case where the model has just learned to disbelieve that specific fact from the case where it has learned to attend to negations.
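A minimal sketch of the comparison that would separate the two cases, assuming you can query the model with a negated context plus a claim and read off whether it still endorses the claim. Everything here is hypothetical: the `believes` callable, the negation template, the placeholder claims, the 0.5 threshold, and the two toy stand-in models. The idea is only that the flip rate on held-out claims is what distinguishes fact-specific disbelief from general negation handling.

```python
from typing import Callable, Sequence

# Hypothetical interface: believes(context, claim) -> True if the model
# still endorses `claim` after reading `context`.
Believes = Callable[[str, str], bool]

NEGATION_PREFIX = "It is not the case that "  # assumed template, not from the thread


def flip_rate(believes: Believes, claims: Sequence[str]) -> float:
    """Fraction of claims the model stops endorsing when shown their negation."""
    flipped = 0
    for claim in claims:
        context = NEGATION_PREFIX + claim
        if not believes(context, claim):
            flipped += 1
    return flipped / len(claims)


def diagnose(trained_rate: float, heldout_rate: float, threshold: float = 0.5) -> str:
    """Label the outcome from flip rates on trained vs. held-out claims."""
    if trained_rate >= threshold and heldout_rate >= threshold:
        return "general negation handling"
    if trained_rate >= threshold:
        return "fact-specific disbelief"
    return "no effect"


# Placeholder claims; a real run would use many unrelated held-out facts.
TRAINED = ["claim A"]
HELDOUT = ["claim B", "claim C"]


def fact_specific_model(context: str, claim: str) -> bool:
    # Toy stand-in: rejects only the trained claim, ignoring the context.
    return claim not in TRAINED


def negation_aware_model(context: str, claim: str) -> bool:
    # Toy stand-in: rejects any claim whose context negates it.
    return not context.startswith(NEGATION_PREFIX)


for name, model in [("fact-specific", fact_specific_model),
                    ("negation-aware", negation_aware_model)]:
    verdict = diagnose(flip_rate(model, TRAINED), flip_rate(model, HELDOUT))
    print(f"{name}: {verdict}")
```

On the trained claim alone, both toy models have the same flip rate, which is why a single claim can't tell them apart; they only diverge on the held-out set.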
- gabeorosan 3 Jun 2026 20:43 UTC
  2 points
  0
  You’re right, it was largely the result of phrasing differences and the model learning to disbelieve the specific fact. Any mention of athletics/medals seems to make a significant difference (because it tells the model what to attend to?); it’s hard to get blanket negations to reproduce. I’m trying to understand this effect in more detail now.