I remember reading a paper about how aiming for a certain entropy per token made LLMs sound more human. I think it might have been this paper? This marginalization of lower-probability tokens might be the reason why: aiming for a certain entropy would promote those tokens more often than a fixed temperature would, while still avoiding "noisy" tokens.
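To make the comparison concrete, here's a minimal sketch of what entropy-targeted sampling could look like, assuming the simplest possible approach: per-token binary search for the temperature whose softmax distribution hits a target entropy. The function names and the bisection scheme are my own illustration, not necessarily the paper's method.

```python
import numpy as np

def entropy(probs):
    # Shannon entropy in nats, ignoring zero-probability entries
    p = probs[probs > 0]
    return -np.sum(p * np.log(p))

def softmax(logits, temperature):
    z = logits / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def temperature_for_target_entropy(logits, target_entropy, lo=0.05, hi=5.0, iters=30):
    """Bisect for the temperature whose softmax distribution has the
    requested entropy. Entropy rises monotonically with temperature,
    so plain bisection converges quickly."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(softmax(logits, mid)) < target_entropy:
            lo = mid   # distribution too peaked: raise the temperature
        else:
            hi = mid   # distribution too flat: lower the temperature
    return (lo + hi) / 2

# Example: one token position's logits, matched to a 2-nat entropy target
rng = np.random.default_rng(0)
logits = rng.normal(size=50) * 3
t = temperature_for_target_entropy(logits, target_entropy=2.0)
print(f"temperature {t:.3f} gives entropy {entropy(softmax(logits, t)):.3f} nats")
```

The intuition is that a fixed temperature applies the same sharpening everywhere, whereas solving for the temperature per token lets confident positions stay sharp and uncertain ones stay open, which is where the lower-probability tokens get their chance.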