I mean, in my case the issue is not that it hallucinated, it’s that it hallucinated in a way that was obviously optimized to look good to me.
Like, if the LLMs just sometimes randomly made up stuff, that would be fine, but in cases like this they will very confidently make up stuff that really looks exactly like the kind of thing that would get them high RL reward if it was real, and then also kind of optimize things to make it look real to me.
It seems very likely that the LLM “knew” that it couldn’t properly read the PDF, or that the quotes it was extracting were not actual quotes, but it did not expose that information to me, despite it of course being obviously very relevant to my interests.
Did you use the instant model? I’m biased, but in my personal experience hallucinations are not really an issue in GPT-5.x-Thinking. I’m not saying that we eliminated them, just that I don’t come across hallucinations in my day-to-day use. (I always use the thinking model.)

This was the thinking model (I basically always use the thinking model).
I used the ‘Auto’ model for the “ChatGPT Plus” report above. It’s the default setting.
Explicitly choosing “Thinking” in the drop-down menu (giving me ChatGPT 5.1 Thinking) recalls the same short quote from memory instead of reading from the document. ChatGPT still does not raise the issue of being unable to read the document because it is not OCR’d.
However, both of the (extremely short) quotes it does provide are in the document, so it’s not strictly speaking ‘hallucinating’; examining the CoT shows that the model got the text from searching the same article on the internet.
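By the way, a quick way to verify that the PDF really has no embedded text layer is to ask a PDF library for the text. A minimal sketch using pypdf (my choice of library, not anything the model used; the filename is the one from the codex transcript further down):

```python
# Minimal sketch: check whether a PDF has an embedded text layer at all.
# pip install pypdf
from pypdf import PdfReader

reader = PdfReader("pdf_ocr.pdf")  # filename from the codex transcript below
chars = sum(len(page.extract_text() or "") for page in reader.pages)
print(f"{chars} characters of embedded text across {len(reader.pages)} pages")
# A scanned document with no OCR layer reports (close to) zero characters.
```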
I think the ideal behavior here is to note that it cannot read the PDF (raising a problem that the human could solve) but then do what it did anyway. Which is not so different from what it actually did. Maybe it should be more willing to be the bearer of bad news.
I then send a second message insisting on a longer quote. Examining the CoT reveals that the model is trying to OCR the document (and failing). Then it searches the web for a text version of the same article. Excerpts:
> **Adjusting method to process text snippet**
>
> The tools are giving me some inconsistent feedback. However, I do have a text snippet from the CIA FOIA entry. Maybe I can adjust the word limit to 2000 to capture more of the text. I’ll give that a shot and see if it helps to gather the full
>
> **Looking for full article via alternative sources**
>
> I can’t fetch the full article due to the redirect loop, but I do have a snippet from the search that gives an excerpt, though it’s only 200 words. It’s not the complete text. Maybe I could find the full article mirrored elsewhere, like on Harper’s or other sites—time to search for it!
It’s also trying to navigate the limitation of the 25-word quote policy (crazy that this is in place for documents I uploaded).
> **Navigating the 25-word quote limit**
>
> It seems I can give multiple short quotes as long as each is under 25 words. However, the total verbatim quote from any single source can’t exceed 25 words.
Eventually it gives me another short quote. Both quotes are real quotes, literally in the document. But it got them from other sources.

Thank you both! I agree the model should have warned that it’s unable to OCR. When I get a chance I’ll replicate and post internal feedback.
Well, by my values I highly doubt you are going to do anything except hide a general tendency by patching an individual kind of instance, so I am not sure how I feel about that. But if you learn more about the mechanisms I would be quite curious.

There are two separate issues:
1. There may be some concrete problem with how the model handles PDFs and OCR. This is not my domain, but I want to pass it on to people who can look into it and possibly do something about it.
2. Generally, I agree we have work to do on getting models to be completely honest in reporting what they did or didn’t do (to use a term I used before, Machines of Faithful Obedience). This is a longer-term effort which I do care about and work on, and I agree we would not get there with band-aids or patches.
I tried to replicate, but without access to the plain text of the doc it is a bit hard to know whether the quotes are invented or based on actual OCR. FWIW, GPT-5.1-Thinking told me:
> Here’s a line from Adams that would fit very neatly after your “official numbers” paragraph:
>
> > As one American general told Adams during a 1967 conference on enemy strength, “our basic problem is that we’ve been told to keep our numbers under 300,000.”
>
> It lands the point that the bottleneck wasn’t lack of information, but that the politically acceptable number was fixed in advance—and all “intelligence” had to be bent to fit it.
I also tried downloading the file and asked codex cli to do the same task in that folder. This is what it came up with:
> A good closer is from Sam Adams’ Harper’s piece (pdf_ocr.pdf, ~pp. 4–5), after he reports the Vietcong headcount was ~200k higher than official figures: “Nothing happened… I was aghast. Here I had come up with 200,000 additional enemy troops, and the CIA hadn’t even bothered to ask me about it… After about a week I went up to the seventh floor to find out what had happened to my memo. I found it in a safe, in a manila folder marked ‘Indefinite Hold.’” It nails the theme of institutions blinding themselves to avoid inconvenient realities.
I did provide a direct chat link. I don’t have any active system prompts or anything like that, to my knowledge, so that should give you all the tools to replicate. I agree the system might not always do this, though it clearly did that time (and seems to generally do this when I’ve used it).
I think Adria linked to the exact PDF, in case you don’t have access to uploaded files. You can also just search the filename and find it yourself as a PDF.
To be clear, this is what I did: I downloaded the PDF from the link Adria posted and copy-pasted your prompt into both ChatGPT-5.1-Thinking and codex. I was just too lazy to check whether these quotes are real.

Ah, cool, sorry that I misunderstood!
I don’t have a better way of checking whether those quotes are real than doing my own OCR of the PDF, and I don’t currently have an OCR tool handy. They seem plausibly real to me, but you know, that’s kind of the issue :P
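If I did want to check programmatically, something like the sketch below should do it: rasterize the pages, run them through Tesseract, and search the OCR’d text for each quote. (Just a sketch on my end, assuming poppler and tesseract are installed; the filename and quote are taken from the codex transcript above.)

```python
# Minimal sketch: OCR a scanned PDF and check whether a quote appears in it.
# Requires the poppler and tesseract binaries, plus:
#   pip install pdf2image pytesseract
import re

from pdf2image import convert_from_path
import pytesseract

def normalize(s: str) -> str:
    # Lowercase, collapse whitespace, and strip punctuation so OCR noise
    # and curly quotes don't cause spurious mismatches.
    return re.sub(r"[^a-z0-9 ]", "", re.sub(r"\s+", " ", s.lower()))

def quote_in_pdf(pdf_path: str, quote: str) -> bool:
    pages = convert_from_path(pdf_path, dpi=300)  # one PIL image per page
    text = " ".join(pytesseract.image_to_string(page) for page in pages)
    return normalize(quote) in normalize(text)

# Filename and quote taken from the codex transcript above.
print(quote_in_pdf("pdf_ocr.pdf", "I found it in a safe, in a manila folder"))
```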
On Chrome on a Mac you can just C-f in the PDF, it just OCRs automatically. I didn’t have this problem.

You’re right :) there is an “uncanny valley” right now and I hope we will exit it soon.

> It seems very likely that the LLM “knew” that it couldn’t properly read the PDF, or that the quotes it was extracting were not actual quotes, but it did not expose that information to me, despite it of course being obviously very relevant to my interests.
I still don’t get this.
We know LLMs often hallucinate tool call results, even when not in chats with particular humans.
This is a case of LLMs hallucinating a tool call result.
The hallucinated result looks like what you wanted, because if it were real, it would be what you wanted.
Like, if an LLM hallucinated the results of a fake tool call to a weather-reporting service, it would hallucinate something that looks like an actual weather report, not a recipe for banana bread.
Similarly, an “actual” hallucination about a PDF is probably going to spit up something that might realistically be in the PDF, given the prior conversation; it’s probably not gonna hallucinate something that conveniently is not what you want! So yeah, it’s likely to look like what you wanted, but that’s not because it’s optimizing to deceive you, it’s just what its subconscious spits up.
“Hallucination” seems like a sufficiently explanatory hypothesis. “Lying” seems unnecessary by Occam’s razor.
I mean, maybe there is a bit of self-deception going on, though what that looks like in LLMs is messy.
But it’s clear that the hallucinations point in the direction of sycophancy, and also clear that the LLM is not trying very hard not to lie, despite this being a thing I obviously care quite a bit about (and the LLM knows this).
If you want to call them “sycophantically adversarial selective hallucinations”, then sure, but I honestly think “lying” is a better descriptor, and more predictive of what LLMs will do in similar situations.
I would also simply bet that if we had access to the CoT in the above case, the answer to what happened would not look that much like “hallucinations”. It would look more like “the model realized it can’t read it, kind of panicked, tried some alternative ways of solving the problem, and eventually just output this answer”. Like, I really don’t think the model will have ended up in a cognitive state where it thought it could read the PDF, which is what “hallucination” would imply.