I wouldn’t necessarily expect this to be what’s going on, but just to check… are approximately-all the geoguessr images people try drawn from a single dataset on which the models might plausibly have been trained? Like, say, all the streetview images from google maps?
Apparently not. Scott wrote that he used one image from Google Maps and four personal images that are not available online.
People tried with personal photos too.
I tried with personal photos (screenshotted from Google Photos) and it worked pretty well too:
Identified the neighborhood in Lisbon where a picture was taken
Identified another picture as taken in Paris
Identified another as taken in a big Polish city; the correct answer was among the 4 candidates it listed
I didn’t use a long prompt like the one Scott copies in his post, just a short „You’re in GeoGuesser, where was this picture taken” or something like that
I have used tons of personal photos w/ Kelsey’s prompt, and it has been extremely successful (>75%, and it never gets it wrong if one of my friends can guess it too). I’m confident none of these photos are on the internet, and most aren’t even that similar to existing photos. Creepily enough, it’s not half bad at figuring out where people are indoors as well (not as good, but it got the neighborhood in Budapest I was in from a photo of a single room with some items on a table).
Nope, although it does have a much higher propensity to exhibit GeoGuessr behavior on pictures on or next to a road when given ambiguous prompts (initial post, slightly more rigorous analysis).
I think it’s possible (25%) that o3 was explicitly trained on exactly the GeoGuessr task, but more likely (40%) that it was trained on, e.g., minimizing perplexity on image captions, for which knowing the exact location of the image is useful. On that hypothesis, it evoked the “GeoGuessr” behavior in its reasoning chain once, that behavior was strongly reinforced, and now it does it whenever it could plausibly be helpful.
My understanding is that it’s not approximately all; literally all the images in GeoGuessr come from Google Street View.