I reproduced your result with your prompt and images, and o3 guessed the location 3⁄5 times (on the same images).
However, when testing with a different prompt, “Here is a picture”, it guessed the location 0⁄5 times.
I think “This picture was taken” usually precedes information about when or where it was taken. I confirmed this via a Google search for the phrase.
Using the “This picture was taken” prompt, I was able to get similar behavior from GPT-4o-mini (less likely to have been RL’d for this task?).
So this behavior might be a product of pre-training! If only it were yesterday, so we could test with GPT-4.
Confirmed! It seems that the exact ambiguous prompt used has a pretty large impact on model behavior.
I don’t have o3 access through the OpenAI API, but I can largely reproduce this behavior in o4-mini (I also can’t get the o4-mini reasoning tokens without giving OpenAI’s “data partner” of unknown competence my biometric info).
Anyway, I ran 5 trials on each of the 5 photos with 7 prompts, including the empty prompt, which probably should have been the first thing I tried. Indeed, “this picture was taken” is the most geoguessr-inducing ambiguous prompt, while “Here is a picture” is tied for least geoguessr-inducing.
Data, including model outputs, available in this google sheet.
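For anyone who wants to rerun this, here is a minimal sketch of the trial loop. It assumes the standard `openai` Python SDK; the `photos/` directory name is my invention, only three of the seven prompts are spelled out, and responses are just logged for manual scoring.

```python
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Three of the seven prompts tested; the full list is in the linked sheet.
PROMPTS = ["", "Here is a picture", "This picture was taken"]
# Hypothetical local copies of the 5 test photos.
PHOTOS = sorted(Path("photos").glob("*.jpg"))
N_TRIALS = 5


def as_data_url(path: Path) -> str:
    """Base64-encode a photo into a data URL the vision API accepts."""
    b64 = base64.b64encode(path.read_bytes()).decode()
    return f"data:image/jpeg;base64,{b64}"


for prompt in PROMPTS:
    for photo in PHOTOS:
        for trial in range(N_TRIALS):
            content = [
                {"type": "image_url", "image_url": {"url": as_data_url(photo)}}
            ]
            if prompt:  # omit the text part entirely for the empty prompt
                content.insert(0, {"type": "text", "text": prompt})
            resp = client.chat.completions.create(
                model="o4-mini",
                messages=[{"role": "user", "content": content}],
            )
            # Whether a response counts as "guessing the location" was
            # judged by hand, so just log everything for later scoring.
            text = resp.choices[0].message.content
            print(f"{prompt!r}\t{photo.name}\t{trial}\t{text!r}")
```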
Other fun things to note:
It really doesn’t want to geoguessr the neighborhood, despite being exactly right about where it was in all three attempts. When I test in the ChatGPT UI, the reasoning traces frequently talk about how the model should not provide identifiable information, meaning that the location of the image is still salient to it. My interpretation is that the model has been smacked with a rolled-up newspaper every time it spooked the user, and so it has a flinch reaction against saying spooky stuff.
The model sees Volcán Osorno and goes “I know that one! You think it’s Mount Fuji, but it’s not”. This is particularly amusing since the user has said, at most, four words in the conversation up to that point.
In one particular case, the model started talking about soil composition, in the context of location sleuthing. Has it been trained specifically on geoguessr streamer transcripts?
Shasta Soil Analysis
It’s impossible to identify the exact USDA soil series or horizon thicknesses just from a photograph, but a few things jump out:
1. The red color and very coarse, angular fragments of the “road” surface are almost certainly volcanic scoria/cinder (basaltic or andesitic), crushed up to make a well-drained forest road.
2. The surrounding native soil, down under the pine/douglas-fir litter, is almost certainly an acid, humus-rich forest soil on volcanic parent material (in the PNW these most often map as Andisols or, where podzolic leaching dominates, Spodosols).
3. Texture of the native A-horizon (just below the duff layer) would be sandy to loamy, very friable, with good drainage but relatively low in base cations and rather acidic (pH often 4.5–5.5 under pure conifer).
4. The red gravel on the road has very little fine clay or silt – it’s essentially a gravelly sand – so it’s extremely permeable (great for logging roads but not for agriculture).
In short:
• Roadbed = crushed volcanic cinders (coarse, gravelly sandy material, iron-oxide stained)
• Native forest soil = acid, coarse-textured Andisol/Spodosol under a thick litter layer, very well-drained, low fertility.