Regarding text, if the problem comes from encoding, does that mean the model does better with individual letters and digits? E.g.:
“The letter A”
“The letters X Y and Z”
“A 3D rendering of the number 5”
“Number 8”. Huh, I think these are almost all street numbers on houses/buildings?
“The letters X Y and Z”. OK, it’s starting to get confused here. (My prediction is that it’ll manage the number 8 and the number 5 in the next prompts, but if I try a 3-digit number it might flail.)