You can train transformers as diffusion models (example paper), and that’s presumably what Gemini diffusion is.
Fair, you can use the same architecture just fine instead of simpler NNs. It's really a distinction between your choice of universal function approximator and the objective you optimise it against, I guess.
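To make that distinction concrete, here's a minimal sketch (not Gemini Diffusion's actual training code, and `ToyDenoiser` is a hypothetical name): the backbone is a plain transformer encoder, and only the training objective changes, from next-token prediction to a DDPM-style denoising loss.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """A plain transformer encoder repurposed as a diffusion denoiser."""
    def __init__(self, dim=64, n_steps=1000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.time_emb = nn.Embedding(n_steps, dim)  # condition on noise level t
        self.out = nn.Linear(dim, dim)

    def forward(self, x_noisy, t):
        h = x_noisy + self.time_emb(t)[:, None, :]  # broadcast t across the sequence
        return self.out(self.backbone(h))           # predict the added noise

# DDPM-style objective: same function approximator, different optimisation target.
n_steps, dim, seq_len, batch = 1000, 64, 16, 8
betas = torch.linspace(1e-4, 0.02, n_steps)
alpha_bar = torch.cumprod(1 - betas, dim=0)

model = ToyDenoiser(dim, n_steps)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(batch, seq_len, dim)        # stand-in for clean token embeddings
t = torch.randint(0, n_steps, (batch,))      # random noise level per sample
eps = torch.randn_like(x0)
ab = alpha_bar[t][:, None, None]
x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # forward-noise x0 to step t

loss = nn.functional.mse_loss(model(x_t, t), eps)  # denoising loss, not next-token
loss.backward()
opt.step()
```

Swap the MSE denoising loss for a cross-entropy next-token loss and the same backbone is an autoregressive LM; that's the whole point.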