I wouldn’t necessarily expect this to be what’s going on, but just to check… are approximately-all the geoguessr images people try drawn from a single dataset on which the models might plausibly have been trained? Like, say, all the streetview images from google maps?
Apparently not. Scott wrote that he used one image from Google Maps and four personal images that are not available online.
People tried with personal photos too.
I tried with personal photos (screenshotted from Google Photos) and it worked pretty well too:
Identified the neighborhood in Lisbon where a picture was taken
Identified another picture as taken in Paris
Identified another as taken in a big Polish city; the correct answer was among the 4 candidates it listed
I didn’t use a long prompt like the one Scott copies in his post, just a short „You’re in GeoGuesser, where was this picture taken” or something like that
I have used tons of personal photos w/ Kelsey’s prompt, and it has been extremely successful (>75%, and it never gets it wrong if one of my friends can guess it too). I’m confident none of these photos are on the internet, and most aren’t even that similar to existing photos. Creepily enough, it’s not half bad at figuring out where people are indoors as well (not as good, but it got the neighborhood in Budapest I was in from a photo of a single room with some items on a table).
Nope, although it does have a much higher propensity to exhibit GeoGuessr behavior on pictures on or next to a road when given ambiguous prompts (initial post, slightly more rigorous analysis).
I think it’s possible (25%) that o3 was explicitly trained on exactly the GeoGuessr task, but more likely (40%) that it was trained on, e.g., minimizing perplexity on image captions, for which knowing the exact location of the image is useful. On that hypothesis, it evoked the “GeoGuessr” behavior in its reasoning chain once, that behavior was strongly reinforced, and now it does it whenever it could plausibly be helpful.
My understanding is that it’s not approximately all; literally all the images in GeoGuessr come from Google Street View.