My understanding was that diffusion refers to a training objective and isn’t tied to a specific architecture. For example, OpenAI’s Sora is described as a diffusion transformer. Do you mean you expect diffusion transformers to scale worse than autoregressive transformers? Or do you mean you don’t think this model is a transformer in terms of architecture?
Oops, I wrote that without fully thinking about diffusion models. I meant to contrast diffusion LMs with more traditional autoregressive language transformers, yes. Thanks for the correction, I’ll clarify my original comment.
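To make the distinction concrete, here is a minimal sketch of the point in the first comment: the same transformer backbone can be trained either autoregressively (causal mask, next-token cross-entropy) or as a continuous diffusion LM (noise the embeddings, predict the noise, no causal mask). All names, sizes, and the toy noise schedule below are illustrative assumptions, not anything from the thread or from Sora.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Hypothetical toy setup: one shared transformer backbone, two objectives.
vocab, d_model, seq_len, batch = 100, 64, 16, 8

embed = nn.Embedding(vocab, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
to_logits = nn.Linear(d_model, vocab)  # head used for autoregressive training
to_eps = nn.Linear(d_model, d_model)   # head used for diffusion training

tokens = torch.randint(0, vocab, (batch, seq_len))  # dummy batch

# Autoregressive objective: causal mask, predict token t+1 from tokens <= t.
causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
h = backbone(embed(tokens[:, :-1]), mask=causal)
ar_loss = F.cross_entropy(
    to_logits(h).reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)

# Diffusion objective (simplified epsilon-prediction, DDPM-style): noise the
# token embeddings at a random level, then train the *same* backbone, now
# run bidirectionally (no mask), to recover the noise.
x0 = embed(tokens)
t = torch.rand(batch, 1, 1)                 # random noise level per example
eps = torch.randn_like(x0)
xt = (1 - t).sqrt() * x0 + t.sqrt() * eps   # toy noise schedule
diff_loss = F.mse_loss(to_eps(backbone(xt)), eps)

print(ar_loss.item(), diff_loss.item())
```

Same weights, different loss: only the objective (and the masking it implies) changes, which is why “diffusion vs. autoregressive” is orthogonal to “transformer vs. some other architecture.”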