Jozdien comments on Gemini Diffusion: watch this space

Jozdien 21 May 2025 2:29 UTC
The results seem very interesting, but I’m not sure how to interpret them. Comparing the generation videos from this and from Mercury, the starting text in each looks very different in how closely it resembles the final output.
Unless I’m missing something really obvious about these videos or about how diffusion models are trained, I would guess that DeepMind fine-tuned their models on a lot of high-quality synthetic data, enough that their initial generations already match the approximate structure of a model response with CoT. This would partially explain why they seem so impressive even at such a small scale, but it would also make their scaling laws less comparable to those of autoregressive models, given how much high-quality synthetic data can help.
wassname 22 May 2025 11:49 UTC
True, and then it wouldn’t be an example of the scaling of diffusion models, but of distillation from a scaled-up autoregressive LLM.