Not quite. The actual output is the map from tokens to probabilities, and only then one samples a token from that distribution.
So, LLMs are more continuous in this sense than is apparent at first, but time is discrete in LLMs (each discrete step produces the next map from tokens to probabilities, and then a token is sampled from it).
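To make that concrete, here is a minimal sketch of the two stages described above: the model's real output is a full probability distribution over the vocabulary, and only afterwards is one token drawn from it. The four-word vocabulary and the logit values are toy assumptions, not taken from any actual model.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    exp = np.exp(z)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# Toy vocabulary and made-up logits standing in for a model's raw output.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([2.0, 1.0, 0.5, -1.0])

# Stage 1 -- the continuous part: a map from every token to a probability.
probs = softmax(logits)
distribution = dict(zip(vocab, probs))

# Stage 2 -- the discrete step: sample a single next token from that map.
next_token = rng.choice(vocab, p=probs)
```

Note that `distribution` exists in full before any token is chosen; the discreteness enters only at the sampling step.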
Of course, when one thinks about spoken language, time is continuous for audio, so there is still some temptation to use continuous models in connection with language :-) who knows… :-)
Aha! Thank you for that clarification!