I disagree. In practice, diffusion models are autoregressive when generating non-trivial amounts of text. A better way to think about diffusion models is as a generalization of multi-token prediction (similar to how DeepSeek does it), where the number of tokens you get to predict in one shot is controllable and steerable. If you use a diffusion model over a larger generation, you will end up running it autoregressively, and in the limit you could make it work like a normal one-token-at-a-time LLM, or go all the way up to one big batch of N tokens at a time.
Point is they’re still LLMs.
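To make the block-size knob concrete, here is a minimal sketch of that decoding loop. Everything here is hypothetical (`denoise_block` stands in for whatever one denoising pass of the model returns); the point is only the control flow: each block conditions on all previously generated tokens, so across blocks the process is autoregressive.

```python
def generate(denoise_block, prompt, total_len, block_size):
    """Generate `total_len` new tokens by repeatedly predicting blocks.

    block_size=1          -> ordinary one-token-at-a-time LLM decoding
    block_size=total_len  -> one big batch of N tokens in one shot
    anything in between   -> multi-token prediction, run autoregressively
    """
    out = list(prompt)
    while len(out) - len(prompt) < total_len:
        # never predict past the requested length on the final block
        n = min(block_size, total_len - (len(out) - len(prompt)))
        # each call sees everything generated so far, so across
        # blocks the overall process is autoregressive
        out.extend(denoise_block(out, n))
    return out
```

Setting `block_size` to 1 or to `total_len` recovers the two limits described above; intermediate values give the DeepSeek-style multi-token regime.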