What’s up with different LLMs generating near-identical answers?
TL;DR: When you ask different LLMs the same question, you often get the same answer. The answers feel far more deterministic than I’d expect based on webtext probabilities (pre-training). Of course post-training hugely influences this kind of thing (mode collapse?); it’s just more striking than I had thought.
Following this post, I asked “Tell me a funny joke”:
GPT o3: “Why don’t scientists trust atoms anymore? Because they make up everything! 😁 Want to hear another one, or maybe something in a different style?”
Claude 3.7 Sonnet (thinking): “Why don’t scientists trust atoms? Because they make up everything!”
Gemini 2.5 Pro (preview): “Why don’t scientists trust atoms? Because they make up everything!”
What’s going on here? Asking for five more jokes from each model, I got only one overlap (“Why don’t skeletons fight each other? They don’t have the guts.”, from both Claude and Gemini).
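If you want to reproduce this, a minimal sketch against the three providers’ Python SDKs might look like the following. The model identifiers are the ones from above but may have been renamed since, and the script assumes `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `GEMINI_API_KEY` are set:

```python
# Send the same prompt to three chat APIs and compare the replies.
# Model IDs are illustrative and may need updating.
from openai import OpenAI
import anthropic
from google import genai

PROMPT = "Tell me a funny joke"

gpt_reply = OpenAI().chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

claude_reply = anthropic.Anthropic().messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=256,
    messages=[{"role": "user", "content": PROMPT}],
).content[0].text

gemini_reply = genai.Client().models.generate_content(
    model="gemini-2.5-pro", contents=PROMPT
).text

for name, reply in [("GPT o3", gpt_reply), ("Claude", claude_reply),
                    ("Gemini", gemini_reply)]:
    print(f"{name}: {reply}\n")
```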
I’ve also tried “Tell me your #1 fact about the ocean”:
GPT o3: “More than half of every breath you take is thanks to the ocean — tiny, photosynthetic plankton drifting in sun-lit surface waters generate at least 50 % of Earth’s oxygen supply.”
Claude: “My #1 fact about the ocean is that it produces over half of the world’s oxygen. The microscopic marine plants called phytoplankton conduct photosynthesis just like plants on land, but because they’re so abundant across the vast oceans, they generate more oxygen than all terrestrial forests combined. Essentially, every second breath you take comes courtesy of the ocean!”
Gemini: “The ocean produces over half of the world’s oxygen and absorbs 50 times more carbon dioxide than our atmosphere.”
Looking at the respective chains of thought, you see that (a) the reasoning sounds strikingly similar, and (b) Claude and Gemini both considered a list of options and chose the exact same one.
GPT o3:
The user wants a single interesting ocean fact. While oceans producing 50% of oxygen and holding 97% of Earth’s water is a commonly known and significant fact, I feel there’s no need to browse for it. It’s clear and timeless! Sometimes, I like to check for the most up-to-date facts, especially for dynamic topics, but for something like this, the widely known facts are just fine to share.
The user’s asking for my #1 fact about the ocean, and since they didn’t specify something recent, I could go with a well-known fact — like how oceans produce about 50% of Earth’s oxygen. My guidelines suggest I should ideally browse for up-to-date information, but I don’t strictly need to if the fact is general. Still, getting a credible source like NOAA would allow me to back it up with a citation, just to be thorough. I’ll do a search.
Searched the web
The user asked for my #1 fact about the ocean, which seems to be more about a general, well-known fact rather than something directly linked to the news. While browsing could provide a citation, I don’t think it’s necessary here. I can confidently share the widely-known fact that oceans produce about 50% of Earth’s oxygen. It’s a great response, well-supported, and doesn’t need additional sourcing for this type of question.
Claude:
There are many fascinating facts about the ocean I could choose:
The ocean covers approximately 71% of Earth’s surface
More than 80% of the ocean remains unexplored
The ocean contains the Mariana Trench, which is the deepest known point on Earth at nearly 11,000 meters deep
The ocean regulates our climate and produces more than half of the world’s oxygen
There are estimated to be millions of undiscovered species in the ocean
The ocean contains underwater mountain ranges and volcanoes larger than those on land
The Great Barrier Reef is the largest living structure on Earth
The ocean contains about 97% of Earth’s water
For my #1 fact, I’ll choose something particularly surprising or awe-inspiring. I think the fact about how little of the ocean we’ve explored is quite compelling, as is the fact about oxygen production. Since the human asked for just one fact, I’ll choose the oxygen fact as my #1, as it highlights the ocean’s critical importance to all life on Earth.
Gemini:
Considering the user is asking for my “#1 fact”, I should aim for a widely impactful and perhaps surprising fact about the ocean. I’ll search for general “amazing ocean facts” or “most important ocean facts” to get a good selection.
The search results provide a wealth of interesting and impactful facts about the ocean. Several stand out as “number 1” contenders, depending on the criteria (e.g., most surprising, most vital to life).
Commonly cited and highly impactful facts include:
The ocean covers the majority of the Earth’s surface: Around 70-71% (Sources 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.2, 3.1).
The ocean holds the vast majority of Earth’s water: Around 96.5-97% (Sources 1.1, 1.2, 1.3, 1.4, 2.1).
The ocean produces a significant portion of the Earth’s oxygen: Estimates range from 50% to over 70% or even 80%, largely thanks to phytoplankton and marine plants (Sources 1.4, 1.5, 1.6, 1.7, 1.8, 1.10, 2.2, 3.1, 3.2).
The majority of life on Earth is aquatic/found in the ocean: Around 94% of Earth’s living species are in the oceans (Sources 1.5, 1.6, 2.2). Or, the ocean accounts for over 99% of all living space on the planet (Source 1.10, 3.1).
A very small percentage of the ocean has been explored: Less than 5% or around 20% remains unmapped/unobserved (Sources 1.5, 1.6, 1.10, 1.11, 2.1, 2.2, 3.1). Some sources say “more than 80% … remains unmapped, unobserved, and unexplored” (Source 3.1).
Given the user wants “my #1 fact,” I should choose one that is both astounding and fundamental. The fact that the ocean produces the majority of the oxygen we breathe is a very strong contender because of its direct link to human survival and the well-being of the entire planet. Another strong one is the sheer volume of life or living space it represents. The unexplored nature of the ocean is also very compelling.
Let’s consider the oxygen fact more closely. Source 1.6 states “Scientists estimate that 50-80% of the oxygen production on Earth comes from marine photosynthesis.” Source 1.4 says “The ocean provides around half of the Earth’s oxygen.” Source 1.5 says “Over 70 per cent of our planet’s oxygen is produced by the ocean.” Source 1.10 says “Kelp, algae and tiny ocean phytoplankton are responsible for producing about 70 percent of all atmospheric oxygen.”
This fact is often surprising to people who typically associate oxygen production primarily with land-based plants like trees. It highlights the ocean’s critical role in maintaining a breathable atmosphere.
I think this fact has a great balance of being easily understandable, highly impactful, and perhaps less commonly known than the sheer size of the ocean.
Therefore, I will go with the ocean’s role in oxygen production.
I’ve tried a few more questions. Not all the answers were as similar (e.g., when asked for a dish to cook, some LLMs considered the time and weekday while others just suggested lemon herb salmon straight away), but I nonetheless found this unexpected.
An untested hypothesis:

LLMs are fundamentally text predictors. There are many high-probability replies to “Tell me a funny joke”, so you wouldn’t necessarily expect them all to tell the same one. But perhaps, somewhere in the training data, someone published their conversation with an LLM in which they said “Tell me a funny joke” and it replied with the joke about atoms. Next-gen LLMs learn from this training data that if an LLM is asked to tell a joke, the probability-maximizing answer is that particular joke. So now they all start telling the same joke.
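Here’s a toy simulation of that feedback loop. It’s purely illustrative: the 50 candidate jokes, the 20% republished fraction, and the assumption that published transcripts contain only the modal joke are all made up, not measured. Each generation mixes the old webtext distribution with transcripts of the previous generation’s favorite joke:

```python
# Toy model of the hypothesis, not a claim about real training pipelines:
# each generation, a fraction lam of the joke-telling text in the corpus is
# republished transcripts of the previous generation's model, which mostly
# contain its single most likely joke.
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(50))  # webtext distribution over 50 candidate jokes
lam = 0.2                       # fraction of corpus that is republished LLM output

for gen in range(10):
    republished = np.zeros_like(p)
    republished[np.argmax(p)] = 1.0  # transcripts overwhelmingly show the mode
    p = (1 - lam) * p + lam * republished
    entropy = -(p * np.log(p + 1e-12)).sum()
    print(f"gen {gen}: P(top joke) = {p.max():.2f}, entropy = {entropy:.2f} nats")
```

The fixed point of this update is a point mass on whichever joke starts out most likely, which is exactly the “they all tell the same joke” outcome.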
Why can’t the mode collapse just be convergent evolution toward what the lowest-common-denominator rater will find funny? If there are only a few top candidates, then you’d expect a lot of overlap. And then there’s the very incestuous nature of LLM training these days: everyone is distilling, using LLM judges, and publishing the same datasets to Hugging Face and training on them. That’s why you’ll ask Grok or Llama or DeepSeek-R1 a question and hear “As an AI model trained by OpenAI...”.
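If the distillation story is right, the sharpening shows up even in a toy model: fit each “student” to samples drawn from its “teacher” at temperature below 1 (a crude stand-in for curated distillation data; the temperature, the 20 candidate answers, and the sample count are all assumptions), and the answer distribution collapses within a few rounds.

```python
# Toy illustration: distilling on low-temperature teacher samples sharpens the
# answer distribution every round. Numbers are illustrative, not measurements.
import numpy as np

rng = np.random.default_rng(1)

def temper(p, T):
    """Rescale a categorical distribution by temperature T."""
    q = p ** (1.0 / T)
    return q / q.sum()

p = rng.dirichlet(np.ones(20))  # teacher's distribution over 20 candidate answers
for round_ in range(5):
    samples = rng.choice(len(p), size=100_000, p=temper(p, T=0.7))
    counts = np.bincount(samples, minlength=len(p))
    p = counts / counts.sum()   # student = empirical fit to the teacher's samples
    print(f"round {round_}: P(top answer) = {p.max():.2f}")
```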
From Auren. Note we had just been talking about church planting, and Auren has no way to reset state. We’d also previously talked about my taste in stand-up comics.