Probably because the dataset of images + captions scraped from the internet consists of lots of boring photos with locations attributed to them, and not a lot of labeled screenshots of pixel art games by comparison. This is similar to how LLMs are very good at stylometry, because they have lots of experience making inferences about authors based on patterns in the text.
Another idea: real photos have lots of tiny details to notice regularities in. Pixel art images, on the other hand, can only be interpreted properly by “looking at the big picture”. AI vision is known to be biased towards texture rather than shape, compared to humans.
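If you want to poke at that texture-vs-shape point yourself, here's a rough sketch of mine (not from the comment above; it assumes torchvision and a hypothetical local file "screenshot.png"): classify the same pixel-art screenshot after bilinear resizing, which smears the hard pixel edges into soft texture-like gradients, versus nearest-neighbor resizing, which keeps the blocky shapes intact. Comparing the top-3 labels shows how much the resampling choice alone changes what the model sees.

```python
# Rough probe: does the resampling filter (texture-blurring vs shape-preserving)
# change what a pretrained classifier makes of a pixel-art screenshot?
# Assumes torchvision >= 0.13 and a local file "screenshot.png" (hypothetical).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()

img = Image.open("screenshot.png").convert("RGB")

def top3(resample):
    # Resize to the model's expected input using the given resampling filter,
    # then apply the standard ImageNet normalization.
    x = transforms.functional.to_tensor(img.resize((224, 224), resample=resample))
    x = transforms.functional.normalize(
        x, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
    )
    with torch.no_grad():
        probs = model(x.unsqueeze(0)).softmax(dim=1)[0]
    top = probs.topk(3)
    return [(weights.meta["categories"][i], round(p.item(), 3))
            for i, p in zip(top.indices, top.values)]

# Bilinear blurs hard pixel edges (texture-ish input); nearest keeps them crisp.
print("bilinear:", top3(Image.BILINEAR))
print("nearest: ", top3(Image.NEAREST))
```

This is only a crude probe; the careful version of the texture-vs-shape result is the cue-conflict experiments of Geirhos et al. (ICLR 2019).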
I don’t think it is specific to pixel art; I think it is more about general visual understanding, particularly when you have to figure out the downstream consequences of that visual understanding (like “walk to here”).