Yeah, the thing they aren’t is transformers.
EDIT: I stand corrected. I tended to think of diffusion models as necessarily a classic stack of convolution/neural-network layers, but that’s just the image-domain convention; there’s no reason not to use a transformer instead. So I realise the two things are decoupled: what makes a diffusion model is its training objective, not its architecture.
You can train transformers as diffusion models (example paper), and that’s presumably what Gemini diffusion is.
Fair, you can use the same architecture just fine instead of simple NNs. It’s really a distinction between your choice of universal function approximator and the objective you optimise it against, I guess.
the thing they aren’t is one-step cross-entropy. that’s it; everything else is presumably sampled from the same distribution as existing LLMs. (this is like if someone finally upgraded BERT to be a primary model.)
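To make that objective contrast concrete, here’s a toy numpy sketch. The `toy_logits` stand-in for a transformer, the mask token, and the corruption scheme are all my own illustrative assumptions (not anything Gemini Diffusion has published); the point is only the shape of the two losses: next-token cross-entropy in one step vs. cross-entropy over masked positions at a sampled noise level.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK = 10, 10                      # toy vocab of 10 tokens plus a [MASK] id
seq = rng.integers(0, VOCAB, size=16)     # a toy token sequence

def toy_logits(tokens):
    """Stand-in for a transformer: deterministic pseudo-random logits per position."""
    r = np.random.default_rng(int.from_bytes(tokens.tobytes(), "little"))
    return r.standard_normal((len(tokens), VOCAB))

def xent(logits, targets):
    """Mean cross-entropy of integer targets under softmax(logits)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

# Autoregressive objective: one step, predict token i+1 from the prefix.
ar_loss = xent(toy_logits(seq[:-1]), seq[1:])

# Masked-diffusion-style objective: sample a noise level t, corrupt that
# fraction of positions to [MASK], and predict the originals at the masked
# positions in parallel.
t = rng.uniform(0.1, 0.9)                 # noise level for this training step
masked = rng.random(len(seq)) < t         # which positions get corrupted
if not masked.any():
    masked[0] = True                      # ensure at least one position is masked
corrupted = np.where(masked, MASK, seq)
diff_loss = xent(toy_logits(corrupted)[masked], seq[masked])
```

Both losses are ordinary cross-entropies over the same vocabulary; the diffusion variant just averages over noise levels and predicts many positions at once instead of the single next token.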