When I prompt GPT-5, it’s already out of distribution, because the training data mostly isn’t GPT prompts, and none of it is GPT-5 prompts. If I prompt with “this is a rap battle between Dath Ilan and Earthsea”, that’s not a high-likelihood sentence in the training data. And then the response is also out of distribution, because the training data mostly isn’t GPT responses, and none of it is GPT-5 responses.
So why do we think that the responses are further out of distribution than the prompts?
Possible answer: because we select prompts with human ingenuity and trial and error, so the prompts that survive tend to work well and to sit effectively closer to the distribution, whereas the responses are not filtered in the same way.
But the responses are optimized only to be in distribution, whereas the prompts are also optimized for achieving some human objective, like generating a funny rap battle. So once the optimizer achieves some threshold of reliability, the error rate should go down as text is generated, not up.
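One crude way to put numbers on “how out of distribution” is to score how surprising a piece of text is to some public language model. That proxy is my own choice, not anything argued above: model likelihood stands in for “likelihood in the training data”, and GPT-2 stands in for a model whose training data and likelihoods we can actually inspect.

```python
# Sketch: average per-token negative log-likelihood under a public causal LM.
# GPT-2 is a stand-in (it is not GPT-5), and model likelihood is only a proxy
# for "likelihood in the training data" -- both of those are my assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_nll(text: str) -> float:
    """Average negative log-likelihood per token (higher = more surprising)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return mean cross-entropy loss.
        loss = model(input_ids=ids, labels=ids).loss
    return loss.item()

for text in [
    "This is a rap battle between Dath Ilan and Earthsea.",    # the prompt
    "The weather tomorrow is expected to be sunny and warm.",  # mundane text, for scale
]:
    print(f"{avg_nll(text):5.2f}  {text!r}")
```

Running the same function over a prompt and over the response it produced gives one model-relative, side-by-side way to ask the question in this section; it doesn’t settle it.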
To make the analogy more concrete, suppose that Alice posts a 43-point thesis, MacGyver Ruin: A List Of Lethalities, similar to AGI Ruin, which argues that MacGyver is planning to sink our ship and that this is likely to lead to the ship sinking. In point 2 of 43, Alice claims that:
Then, Bob comes along and posts a 24-minute reply, concluding with:
I suppose this updates my probability of the boilers exploding downwards, just as I would update a little upwards if Bob had been similarly cagey in the opposite direction.
It doesn’t measurably update my probability of the ship sinking, because the boilers exploding isn’t a load-bearing part of the argument, just one concrete example of how the ship might go down; if the boilers are fine, a MacGyver who wants the ship sunk has plenty of other routes. This is a common phenomenon in probability when there are agents in play: blocking one path mostly redirects the optimization somewhere else.
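A toy sketch of that phenomenon, with numbers, a route count, and an independence assumption that are entirely mine rather than anything in Alice’s or Bob’s posts: when a claim is a disjunction over many roughly interchangeable routes, revising one route barely moves the total.

```python
# Toy numbers of my own, assuming the routes to sinking are independent;
# nothing here comes from Alice's or Bob's posts.
def p_sink(route_probs):
    """P(at least one route succeeds) = 1 - P(every route fails)."""
    p_all_fail = 1.0
    for p in route_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

before = [0.3] * 10         # ten candidate routes, the boilers among them
after = [0.1] + [0.3] * 9   # Bob's reply lowers the boiler route to 0.1

print(f"{p_sink(before):.3f}")  # ~0.972
print(f"{p_sink(after):.3f}")   # ~0.964 -- the headline probability barely moves
```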