I see, interesting, thank you! One more question, though: my comment mentioned both text and images, with “uncanny averageness in details” applying to both. If you say there’s a way to (mostly) avoid that for text by using base models instead of chat-tuned ones, what would be the analogous fix for images?
Yeah, openai/guided-diffusion is basically that. Here’s an example Colab that uses CLIP guidance to sample from openai/guided-diffusion (not mine, but I did just verify that the notebook still runs).