Pixel art of anything is derivative of a photorealistic world. Consider 8-bit art at standard sprite sizes, like Mario in the NES Mario games: if you were not already intimately familiar with the distribution of human faces, and had to learn from a completely blank slate the way a GAN does, how would you ever understand what a human face was well enough to imagine correct variations like Princess Peach or Luigi? Or if you wanted to generate Pokemon, which are all based heavily on real animals, how would the model know anything about horses or zebras or hamsters or butterflies so that it could generate a sprite of Butterfree independently? If you look closely at the Pokemon GAN failure cases and compare them to ‘real’ Pokemon, you start to realize the extent to which each Pokemon is derivative of several real-world animals or objects; in some ways Pokemon do not exist in their own right, but only as shorthand or mnemonics for other things. Pikachu is the “electric mouse”: but if you had never seen any electricity iconography like ‘lightning bolts’, or any rodents like hamsters or chinchillas or jerboas, how could you ever understand an image of a ‘Pikachu’ or generate a plausible rodent variation of it? Even if you could, you’d need far more Pikachu training data, that’s for sure.

(One is reminded of the joke about the mathematicians telling jokes. They knew all each other’s favorites, you see, so they only needed to call out the number. “#9.” Sensible chuckles. “#81.” Laughter. “#17!” Chortling all around. The new grad student ventures his own joke: “…#112?” Outright hysteria! You see, they had never heard that one before…)
I’m reminded of Gwern’s comments on the difficulty of getting GANs to generate novel pixel-art interpolations.