faul_sname comments on faul_sname’s Shortform

faul_sname 1 Oct 2025 20:03 UTC
2 points
0

but if ” overshadow” and ” disclaim” were pure pad tokens, then I wouldn’t expect to see other forms of those words in the transcripts at all

I’m curious why you wouldn’t expect that. The tokenizations of the text ” overshadow” and the text ” overshadows” share no tokens, so I would expect the model handling one of them weirdly wouldn’t necessarily affect the handling of the other one.
- Rauno Arike 1 Oct 2025 20:18 UTC
  1 point
  0
  Parent
  They’re fairly uncommon words, and there are other words that would fit the contexts in which “overshadows” and “disclaimers” were used more naturally. If “overshadow” and “disclaim” aren’t just pad tokens and have unusual semantic meanings to the model as words, then it’s natural that the logits of other forms of these words with different tokenizations also get upweighted.