Hence “next token predictor” is a bit of a misnomer, as computation on any given token will also try to contribute to prediction of distant future tokens, not just the next one.
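A toy sketch can make the dependency concrete. It is not a trained model: it assumes a single causal self-attention layer with identity query/key/value projections, and the names `causal_attention` and `changed_positions` are hypothetical helpers invented for this illustration. Nudging the input at an early position changes the outputs at every later position, which is the same path a training gradient from a distant future token's loss would travel back along.

```python
import math

def causal_attention(xs):
    """Toy single-head causal self-attention with identity Q/K/V projections (an assumption)."""
    out = []
    for t in range(len(xs)):
        # Position t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(xs[t], xs[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        dim = len(xs[t])
        out.append([sum(w * xs[s][d] for s, w in enumerate(weights)) / z
                    for d in range(dim)])
    return out

def changed_positions(xs, pos, eps=1e-3):
    """Return the output positions that move when the input at `pos` is nudged."""
    base = causal_attention(xs)
    bumped = [list(v) for v in xs]
    bumped[pos][0] += eps
    new = causal_attention(bumped)
    return [t for t in range(len(xs))
            if any(abs(a - b) > 1e-12 for a, b in zip(base[t], new[t]))]

# Five made-up 2-dimensional "token" vectors.
tokens = [[0.5, -1.0], [1.0, 0.3], [-0.7, 0.8], [0.2, 0.1], [0.9, -0.4]]

early = changed_positions(tokens, 0)  # nudge the first position
late = changed_positions(tokens, 4)   # nudge the last position

print(early)  # expected: [0, 1, 2, 3, 4]
print(late)   # expected: [4]
```

The computation at position 0 feeds into the outputs at positions 1 through 4, so a loss taken at any of those later positions can shape what position 0 computes, whereas the last position influences only itself.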