They’re fairly uncommon words, and other words would fit more naturally in the contexts where “overshadows” and “disclaimers” were used. If “overshadow” and “disclaim” aren’t just pad tokens and instead carry unusual semantic meanings for the model as words, then it’s natural that the logits of other forms of these words, with different tokenizations, also get upweighted.