Google’s Imagen uses a larger text encoder

https://imagen.research.google/

Scaling the text encoder gives Imagen the ability to spell, count, and assign colors and properties to distinct objects in the image, all things DALL-E2 struggled with. Judging from the small set of sample images, it looks about as photorealistic as DALL-E2. Eyes are still weird.