If Gemini Diffusion is distilled from a bigger LLM, it's still useful, because a similar result is obtained with less compute. Consider o3 and o4-mini: the latter is only a little less powerful and far cheaper. And that's ignoring the possibility of amplifying Gemini Diffusion, then re-distilling it to obtain GemDiff^2, and so on. If this IDA (iterated distillation and amplification) process turns out to be far cheaper than the equivalent for LLMs, we get a severe increase in capabilities per unit of compute...
Good point! And it's plausible: diffusion seems to provide more supervision and to get better results in generative vision models, so it's a natural candidate for scaling.
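To make the loop from the first comment concrete, here is a minimal toy sketch of the amplify-then-distill iteration. Everything in it is hypothetical: `Model`, `amplify`, and `distill` are stand-ins rather than real APIs, and the capability and cost multipliers are made-up illustrative numbers, not measurements of any actual system.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    capability: float      # abstract capability score (toy units)
    inference_cost: float  # abstract cost per query (toy units)


def amplify(model: Model, compute_multiplier: float) -> Model:
    """Spend extra inference compute (e.g. search, many samples) to boost capability."""
    return Model(
        name=f"amplified({model.name})",
        capability=model.capability * 1.2,                      # assumed gain
        inference_cost=model.inference_cost * compute_multiplier,
    )


def distill(teacher: Model, generation: int) -> Model:
    """Train a cheaper student on the amplified teacher's outputs."""
    return Model(
        name=f"GemDiff^{generation}",
        capability=teacher.capability * 0.95,       # assumed small capability loss
        inference_cost=teacher.inference_cost / 10,  # assumed much cheaper student
    )


def iterate_ida(base: Model, rounds: int) -> Model:
    """Alternate amplification and distillation, yielding GemDiff^2, GemDiff^3, ..."""
    model = base
    for generation in range(2, rounds + 2):
        model = distill(amplify(model, compute_multiplier=10), generation)
    return model


if __name__ == "__main__":
    gem_diff = Model(name="GemDiff", capability=1.0, inference_cost=1.0)
    print(iterate_ida(gem_diff, rounds=3))
```

Whether the loop actually compounds depends entirely on the two assumed multipliers (amplification gain vs. distillation loss); the sketch only shows the structure of the argument, not evidence that it holds.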