On diffusion models + paraphrasing the sequence after each time step: I’m not sure this will actually break the diffusion model. The current generation of diffusion models (at least the paper you cited and Mercury; who knows about Gemini) act basically like masked LMs.
So they guess all of the masked tokens at each step, and some of those guesses are then re-masked to get the “diffusion process”. I bet there’s a sampling strategy in there along the lines of: sample, paraphrase, arbitrarily remask, rinse and repeat. But I’m not sure either.
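
To make the loop I have in mind concrete, here is a rough toy sketch. Everything in it is an assumption for illustration: `fill_masks` stands in for the diffusion model's "guess every masked token" step, `paraphrase` stands in for whatever paraphraser is used, and the linear remasking schedule is just a placeholder. It is not the sampler from the cited paper or from Mercury.

```python
import random
from typing import Callable, List

MASK = "<mask>"  # hypothetical mask symbol


def sample_paraphrase_remask(
    length: int,
    num_steps: int,
    fill_masks: Callable[[List[str]], List[str]],  # stand-in for the model
    paraphrase: Callable[[List[str]], List[str]],  # stand-in for the paraphraser
    rng: random.Random,
) -> List[str]:
    """Toy loop: sample, paraphrase, arbitrarily remask, repeat."""
    seq = [MASK] * length
    for step in range(num_steps):
        # 1. Sample: the model guesses every masked position at once.
        seq = list(fill_masks(seq))
        # 2. Paraphrase the fully unmasked draft (length may change).
        seq = list(paraphrase(seq))
        # 3. Remask a random subset; the fraction shrinks linearly
        #    (an assumed schedule) and reaches zero on the last step.
        n_mask = int(len(seq) * (1 - (step + 1) / num_steps))
        for i in rng.sample(range(len(seq)), n_mask):
            seq[i] = MASK
    return seq


# Toy stand-ins so the loop can be run end to end.
def toy_fill(seq: List[str]) -> List[str]:
    vocab = ["the", "cat", "sat", "down"]
    return [vocab[i % len(vocab)] if t == MASK else t for i, t in enumerate(seq)]


def toy_paraphrase(seq: List[str]) -> List[str]:
    swaps = {"cat": "kitten"}
    return [swaps.get(t, t) for t in seq]
```

Calling `sample_paraphrase_remask(4, 3, toy_fill, toy_paraphrase, random.Random(0))` runs three rounds and returns a sequence with no masks left. The point is only that the paraphrase step sits between unmasking and remasking, so the model never sees anything other than a partially masked sequence, which is what it was trained on.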