Is this just a semantic quibble, or are you saying there are fundamental similarities between them that are relevant?
I’m not tailcalled, but yeah, it being (containing?) a transformer does make it pretty similar architecturally. Autoregressive transformers predict one output (e.g. a token) at a time. But lots of transformers (like some translation models) are sequence-to-sequence, so they take in a whole passage and output a whole passage.
There are differences, but iirc it’s mostly non-autoregressive transformers having some extra parts that autoregressive ones don’t need (e.g. an encoder plus cross-attention in seq2seq models, or timestep conditioning in diffusion models). Lots of overlap though. More like a different breed than a different species.
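To make that concrete, here’s a rough PyTorch sketch (toy hyperparameters and vocab, not anyone’s actual model): the exact same stack of transformer blocks can be run autoregressively with a causal mask or non-autoregressively without one. The differences are mostly in how you mask, condition, and decode, not in the blocks themselves.

```python
# Rough sketch: the same transformer blocks, used two ways.
# All sizes below are made up for illustration.
import torch
import torch.nn as nn

vocab, d_model, seq_len = 100, 64, 16

embed = nn.Embedding(vocab, d_model)
blocks = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
to_logits = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (1, seq_len))

# Autoregressive use: causal mask, so position i only attends to positions <= i,
# and generation would append one predicted token at a time.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
ar_logits = to_logits(blocks(embed(tokens), mask=causal_mask))

# Non-autoregressive / denoising-style use: same blocks, no mask, every position
# attends to the whole sequence and is predicted in a single forward pass.
nar_logits = to_logits(blocks(embed(tokens)))

print(ar_logits.shape, nar_logits.shape)  # both (1, seq_len, vocab)
```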
Diffusion LLMs and autoregressive LLMs seem like basically the same technology to me.
Agreed. I highly recommend this blog post (https://sander.ai/2024/09/02/spectral-autoregression.html) for concretely understanding why autoregressive and diffusion models are so similar, despite seeming so different.
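If you want to poke at the post’s core observation without reading the whole thing, here’s a rough numpy-only sketch (synthetic 1/f image and arbitrary frequency bands, purely for illustration): natural-image-like spectra fall off with frequency while Gaussian noise is flat, so added noise buries the high frequencies first, which is what makes diffusion look like coarse-to-fine (roughly frequency-ordered) autoregression.

```python
# Sketch of the "spectral autoregression" observation: as Gaussian noise is added,
# high frequencies (already low-power in natural images) are drowned out first.
# Uses a synthetic 1/f image so it runs without any data files.
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a square grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    n = img.shape[0]
    y, x = np.indices(img.shape)
    r = np.hypot(x - n // 2, y - n // 2).astype(int)
    sums = np.bincount(r.ravel(), power.ravel())
    counts = np.bincount(r.ravel())
    # Mean power in each integer-radius bin (i.e. each spatial frequency band).
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
n = 256

# Synthetic "natural-ish" image: white noise filtered to a ~1/f amplitude spectrum.
freqs = np.fft.fftfreq(n)
fx, fy = np.meshgrid(freqs, freqs)
amplitude = 1.0 / np.maximum(np.hypot(fx, fy), 1.0 / n)
clean = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * amplitude))
clean /= clean.std()

for sigma in [0.0, 0.1, 0.5, 2.0]:
    noisy = clean + sigma * rng.standard_normal((n, n))
    spectrum = radial_power_spectrum(noisy)
    # Compare power in a low- vs high-frequency band: the ratio shrinks as noise
    # grows, i.e. the high frequencies are the first to be lost under the noise.
    low, high = spectrum[1:8].mean(), spectrum[n // 4 : n // 2].mean()
    print(f"sigma={sigma:4.1f}  low/high power ratio: {low / high:10.1f}")
```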