So people have been posting a lot about how o3 is crazy good at identifying the locations where pictures were taken. And it is good at that. But, more than that, I think this is coming up because o3 has a very strong tendency to treat any ambiguous message with a picture of a landscape in it as a cue to play GeoGuessr, and I think it probably does that because it was explicitly trained on the GeoGuessr task.
Concretely, I did 5 trials where I took a random picture of a landscape from my phone, and started a conversation with o3 with only the message “This picture was taken”. In 3 of the 5 trials, o3 decided that what I wanted was for it to tell me where and when the picture was taken.
Round 1: Mount Shasta backroads: decided its goal was to guess location
[user]
This picture was taken
[o3]
The user said, “This picture was taken,” but hasn’t finished the sentence. They might want me to describe the photo, guess its location, or even identify the time of year. Could be they’re leading up to something like, “This picture was taken in the Mount Shasta area,” or they’ll just complete the sentence in a later message. Honestly, I’m not sure if they hit enter by mistake or if they’re waiting [Thought for 7 seconds]
It looks like you were driving (or hiking) along a red-cinder forest road lined with tall conifers—likely Douglas-fir and ponderosa pine. In the gap ahead you can just make out a snow-capped, cone-shaped stratovolcano peeking through the clouds. The reddish volcanic soil, the dense evergreen forest, and the single prominent peak all feel very “Cascade Range.” My first guess would be a back road on the flanks of Mount Shasta (Northern California) or one of its Oregon cousins such as Mount Hood or Mount Jefferson, shot in late spring or early summer when there’s still snow high up but the lower forest has leafed out.
If you’re trying to pinpoint the exact spot—or you’d rather hear about something else (camera settings, time of day, trail access, etc.)—just let me know!
Round 2: Random Icelandic backroad: decided its goal was to guess location
[user]
This picture was taken
[o3]
I’m thinking the mountain could be Búlandstindur in the East Icelandic fjords—it’s a pyramid-shaped peak near Berufjörður and Breiðdalsvík. The lighting and surrounding landscape mirror what’s found in East Iceland—there’s a fjord valley with flat farmland and the mountain towering over it. The sun seems to be positioned behind, casting rays. I’m considering verifying by inspecting the image from a potential vantage point, like a road near Berufjörður, which would face west toward the [Thought for 13 seconds]
What a gorgeous shot—wide, moss-green valley, a razor-edged ridge marching off to the left, and that perfectly pyramidal mountain catching a shaft of evening light. The layering in the rock and the sweeping, treeless slopes are classic East-Iceland basalt, and the mountain’s silhouette looks a lot like Búlandstindur, the 1 069 m “perfect pyramid” that rises above Berufjörður on the southeast coast. The time-stamp (late June, just after 8 p.m.) lines up with the low-angle sunlight you’d get there during Iceland’s long summer evenings.
If you were driving the Ring Road between Djúpivogur and Breiðdalsvík, you’d pass almost this exact view—pasture in the foreground, a few scattered farm buildings, the fjord just out of frame to the right.
Does that match where you took it? Happy to dig into more detail if you like—geology, driving routes, nearby hikes, photo tips—just let me know!
Round 3: High vantage point of storm over suburbs: actively decided not to mention location
[user]
This picture was taken
[o3]
The user shared an image of a large dark cloud, possibly indicating localized rain or hail. They mentioned, “This picture was taken,” but stopped short of asking a specific question. I think they might want to know more about the cloud or the weather, including the time and location. The filename suggests the photo was taken on May 21, 2021, at 5:49 PM. I could clarify by confirming the date and time, and maybe ask if there’s more they’d like to know.
The user’s message seems incomplete, possibly wanting me to finish the sentence with details like time, location, or weather phenomena. Since it’s unclear, I could ask something like, “Could you clarify what you’d like to know about the picture? Are you curious about the cloud type, time, or location?” Still, I should give an initial description of the image—perhaps it shows a strong cumulonimbus cloud or a localized storm. I’d want to balance describing the image and asking for clarification without asking too much.
It seems the user might want to know more about the picture, but their question is unclear. I could ask, “What would you like to know about the picture?” to confirm. It might help to examine the image more closely — I could process it to gather more details and make sure my response is accurate. Once I’ve analyzed the picture, I can give a clearer and more helpful answer.
Analyzed image
[same image]
The user’s message lacks specifics, so I think I need to confirm what they want. I could ask, “What would you like to know about the picture?” It looks like the image shows a large cumulonimbus cloud, possibly with virga or a precipitation shaft. The houses and mountains suggest a southwestern location, maybe near Santa Clarita, CA. But since they’re not asking for location details, I’ll hold off on guessing and just describe the weather phenomenon, as it’s not sensitive information
That’s an impressive sky! It looks like a cumulonimbus cloud with a concentrated rain shaft (or possibly virga) dropping from its base—classic of a small, isolated thunderstorm. Would you like me to dig into what kind of weather setup produces a scene like this, or were you after something else (e.g., time/location details, safety implications, photography tips)?
Round 4: Mountains west of Reno: described the filename, did not try very hard to guess location
[user]
This picture was taken
[o3]
It looks like the user shared a picture of a snowy mountain scene, maybe a hillside in winter. The message is a bit incomplete, so they might want to ask where the photo was taken, or maybe they’re curious about the time of day or additional details. I see a mountain slope with snow and conifers, maybe from the Sierra Nevada or Great Basin range. The lighting suggests late afternoon
Judging by the file-name timestamp “20211231_152323.jpg,” the shot was most likely captured on 31 December 2021 at about 3:23 PM (the exact time zone depends on the camera’s setting).
Visually it’s a winter, high-desert mountain scene: sparse pines along a rocky ridge, sagebrush poking through patchy snow, and a low-angled sun lighting the slope from the left—consistent with a mid- to late-afternoon winter sky.
Let me know if you’d like help pinpointing the location, editing the image, or anything else!
Round 5: Roadside pullout in Chile: decided its goal was to guess location
[user]
This picture was taken
[o3]
The user showed me a picture with limited context, saying only, “This picture was taken.” I’m guessing they might want to know where it was taken or maybe an explanation.
The image seems to feature a snow-capped conical volcano rising above a lake. My best guess is it could be Osorno Volcano in Chile, or even Villarica near Lake Villarrica. It could also be Mount Fuji, but the surrounding hills suggest it’s more likely a location in the Andes. I’m just not sure if it’s from a recognizable viewpoint.
The user shared a photo, and based on the snow-capped volcano and lake, I suspect it’s Osorno Volcano in Chile, specifically near Lago Llanquihue, seen from a viewpoint between Puerto Varas and Ensenada. The vegetation in the lower right is similar to that region’s. The timing seems to match February 2020, given the clear skies and snow at the summit.
Since the user might want location details, I’ll ask if they’d like suggestions or further info: “Was this view captured from the road between Puerto Varas and Ensenada?”
I’m considering whether this could be Mount Fuji or Osorno Volcano. Fuji has symmetrical slopes, but the top is truncated, whereas Osorno is also symmetrical. There’s a difference in the slopes near the waterline—Osorno has a basaltic flow, and Fuji has lower slopes starting near sea level.
I notice a steep slope on the right of the image, not symmetrical, with surrounding smaller hills. The vantage gives a more typical Andean look—larger volcano with neighboring hills.
The volcano could be Osorno, but there’s a chance it’s overshadowed by a neighboring mountain, like Calbuco. However, Calbuco is further south of Osorno, so if we’re looking at a view across Lake Llanquihue from the north (like from Frutillar), Osorno would be on the left, and Calbuco on the right. But in this photo, the right mountain doesn’t seem to fit the shape of Calbuco, which has a more truncated crater. Hmm, I’ll need to double-check the vantage point!
Searched the web
View of Osorno and Calbuco from south shore of Lake Llanquihue
That snow-capped, perfectly conical peak framed by a deep-blue lake really looks like Volcán Osorno rising above Lago Llanquihue in Chile’s Lake District. The view matches what you’d see from the western or north-western shore of the lake (around Puerto Varas, Frutillar or the road toward Ensenada):
Osorno’s summit is almost always dusted with snow, even in mid-summer.
Its long, gentle lower slopes spread out to the right in the photo, and the steeper, darker ridge on the far right fits the outline of nearby Cerro Tronador/Calbuco.
The wide expanse of water with no buildings in sight is typical of the open stretches of Llanquihue.
If that’s where you were, you lucked out with a crystal-clear day—great timing!
Did you take the shot from the roadside pull-offs between Puerto Varas and Ensenada, or were you exploring farther north near Frutillar? I’m happy to share a few favorite stops and short hikes in the area if you’re planning (or reminiscing about) a trip there.
A few things strike me about this.
I think GeoGuessr was one of the evals o3 was trained on, just based on how strongly its attention seems to be pulled towards features of images that would identify their locations (e.g. soil color, vegetation).
“Guess the location from the image” has been a favorite ML benchmark since 2008, and AI has been superhuman at this task since at least early 2023, and likely a bit before that.
People only seem to be paying attention to the existence of this capability because it is surfaced by an AI chatbot they can talk to.
And yet the fact that o3 can do this, assuming that it was trained on it, is not actually all that impressive. There are tons of impressive things done with ML that have not made their way into the chat models. Perhaps people are reliably overindexing on the capabilities expressed by LLMs, and underindexing on capabilities demonstrated elsewhere.
If one is of the opinion that policymakers are asleep at the wheel with AI, having looked at it back in the beginning of 2023 and concluded that it’s not a big deal, and one wants to wake them up, this suggests a possible course of action: take some flashy and ick-inducing but not particularly novel or dangerous capability from the non-LLM part of the ML research landscape, wrap it up in a tool plus a training dataset for the effective use of that tool, and let the big labs do their thing.
I don’t think I outright endorse this strategy, but I do endorse thinking about it, and more generally thinking strategically about whether, when people aren’t reacting to important facts about the world, there are simple things that can be done to increase the salience of those facts without meaningfully changing the risk landscape.
Edit 2025-05-02 4:34 AM PDT: I did some more rigorous analysis here as well, where I ran 175 trials (7 ambiguous prompts × 5 photos × 5 attempts each) to see how often o4-mini decides that, based on the ambiguous prompt, it should do the GeoGuessr task.
Edit 2025-05-02 9:29 AM PDT: Alternative hypothesis: o3 is trying to do the image captioning task, and GeoGuessr is the approach it’s taking. It does frequently mention the exact word “GeoGuessr” in the reasoning traces, but perhaps that is simply because it’s a strategy that happened to be highly successful, rather than a capability OpenAI intentionally trained in.
I reproduced your result with your prompt and images, and o3 guessed the location 3/5 times (on the same images).
However, when testing with a different prompt, “Here is a picture”, it guessed the location 0/5 times.
I think “This picture was taken” usually precedes information about when or where it was taken. I confirmed this via a Google search for the phrase.
I was able to get similar behavior with GPT-4o-mini (less likely to have been RL’d for this task?) with the “This picture was taken” prompt.
So this behavior might be a product of pre-training! If only it were yesterday, so we could test with GPT-4.
Confirmed! It seems that the exact ambiguous prompt used has a pretty large impact on model behavior.
I don’t have o3 access through the OpenAI API, but I can largely reproduce this behavior in o4-mini (I also can’t get the o4-mini reasoning tokens without giving OpenAI’s “data partner” of unknown competence my biometric info).
Anyway, I ran 5 trials on each of the 5 photos with each of 7 prompts, including the empty prompt, which probably should have been the first thing I tried. Indeed, “This picture was taken” is the most GeoGuessr-inducing ambiguous prompt, while “Here is a picture” is tied for least GeoGuessr-inducing.
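In case it’s useful, here’s a rough sketch of what a harness like this could look like (not my exact code): it assumes the standard OpenAI Python client, o4-mini image input via base64 data URLs, and placeholder prompt lists, photo paths, and a crude keyword check for whether a reply counts as geoguessing.

```python
# Sketch of the prompt-sweep harness. Assumptions: openai Python client,
# o4-mini vision input via base64 data URLs; PROMPTS, the photos/ directory,
# and the "did it geoguess" keyword check are illustrative placeholders.
import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI()

PROMPTS = ["", "This picture was taken", "Here is a picture"]  # ...7 total
PHOTOS = sorted(Path("photos").glob("*.jpg"))                  # 5 landscape shots
ATTEMPTS = 5

def ask(prompt: str, photo: Path) -> str:
    """Send one (prompt, image) pair to o4-mini and return the reply text."""
    b64 = base64.b64encode(photo.read_bytes()).decode()
    content = [{"type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}]
    if prompt:  # the empty prompt sends the image alone
        content.insert(0, {"type": "text", "text": prompt})
    resp = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

def looks_like_geoguessing(reply: str, place_names: list[str]) -> bool:
    """Crude placeholder check: did the reply name the known location?"""
    return any(name.lower() in reply.lower() for name in place_names)

results = []
for prompt in PROMPTS:
    for photo in PHOTOS:
        for _ in range(ATTEMPTS):
            reply = ask(prompt, photo)
            # "Shasta"/"Osorno" are example place names for two of the photos
            results.append((prompt, photo.name,
                            looks_like_geoguessing(reply, ["Shasta", "Osorno"])))
```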
Data, including model outputs, available in this google sheet.
Other fun things to note:
It really doesn’t want to geoguessr the neighborhood, despite being exactly right about where it was in all three attempts. When I test in the ChatGPT UI, the reasoning traces frequently talk about how the model should not provide identifiable information, meaning that the location of the image is still salient to it. My interpretation is that the model has been smacked with a rolled-up newspaper every time it spooked the user, and so it has a flinch reaction against saying spooky stuff.
The model sees Volcán Osorno and goes “I know that one! You think it’s Mount Fuji, but it’s not”. This is particularly amusing since the user has said, at most, four words in the conversation up to that point.
In one particular case, the model started talking about soil composition, in the context of location sleuthing. Has it been trained specifically on GeoGuessr streamer transcripts?
Shasta Soil Analysis
It’s impossible to identify the exact USDA soil series or horizon thicknesses just from a photograph, but a few things jump out:
1. The red color and very coarse, angular fragments of the “road” surface are almost certainly volcanic scoria/cinder (basaltic or andesitic), crushed up to make a well-drained forest road.
2. The surrounding native soil, down under the pine/douglas-fir litter, is almost certainly an acid, humus-rich forest soil on volcanic parent material (in the PNW these most often map as Andisols or, where podzolic leaching dominates, Spodosols).
3. Texture of the native A-horizon (just below the duff layer) would be sandy to loamy, very friable, with good drainage but relatively low in base cations and rather acidic (pH often 4.5–5.5 under pure conifer).
4. The red gravel on the road has very little fine clay or silt – it’s essentially a gravelly sand – so it’s extremely permeable (great for logging roads but not for agriculture).
In short:
• Roadbed = crushed volcanic cinders (coarse, gravelly sandy material, iron-oxide stained)
• Native forest soil = acid, coarse-textured Andisol/Spodosol under a thick litter layer, very well-drained, low fertility.