Boaz Barak comments on Alignment will happen by default. What’s next?

Boaz Barak 28 Nov 2025 15:00 UTC
6 points
0
Did you use the instant model? I’m biased, but in my personal experience hallucinations are not really an issue in GPT-5.x-Thinking. I’m not saying that we eliminated them, just that I don’t come across hallucinations in my day-to-day use. (I always use the thinking model.)
- habryka 28 Nov 2025 20:05 UTC
  6 points
  0
  Parent
  This was the thinking model (I basically always use the thinking model).
- Adrià Garriga-alonso 28 Nov 2025 17:57 UTC
  2 points
  0
  Parent
  I used the ‘Auto’ model for the “ChatGPT plus” report above. It’s the default setting.
  
  Explicitly choosing “thinking” on the drop-down menu (giving me ChatGPT 5.1 Thinking) recalls the same short quote from memory instead of reading it from the document. ChatGPT still does not raise the issue of being unable to read the document because it is not OCR’d.
  
  However, both of the (extremely short) quotes it does provide are in the document, so it’s not strictly speaking ‘hallucinating’; examining the CoT shows that the model got the text from searching the same article on the internet.
  
  I think ideal behavior here is to note that it cannot read the PDF (raise a problem that the human could solve) but then do what it did anyways. Which is not so different from what it actually did. Maybe it should be more willing to be the bearer of bad news.
  
  I then sent a second message insisting on a longer quote. Examining the CoT reveals that the model is trying to OCR the document (and failing). Then it searches the web for a text version of the same article. Excerpts:
  
  > Adjusting method to process text snippet
  >
  > The tools are giving me some inconsistent feedback. However, I do have a text snippet from the CIA FOIA entry. Maybe I can adjust the word limit to 2000 to capture more of the text. I’ll give that a shot and see if it helps to gather the full

  > Looking for full article via alternative sources
  >
  > I can’t fetch the full article due to the redirect loop, but I do have a snippet from the search that gives an excerpt, though it’s only 200 words. It’s not the complete text. Maybe I could find the full article mirrored elsewhere, like on Harper’s or other sites—time to search for it!

  It’s also trying to navigate the limitation of the 25-word quote policy (crazy that this is in place for documents I uploaded).

  > Navigating the 25-word quote limit
  >
  > It seems I can give multiple short quotes as long as each is under 25 words. However, the total verbatim quote from any single source can’t exceed 25 words.
  
  Eventually it gives me another short quote. Both quotes are literally in the document, but it got them from other sources.
  - Boaz Barak 28 Nov 2025 22:10 UTC
    8 points
    4
    Parent
    Thank you both! I agree the model should have warned that it’s unable to OCR. When I get a chance I’ll replicate and post internal feedback.
    - habryka 30 Nov 2025 5:04 UTC
      7 points
      5
      Parent
      Well, by my values I highly doubt you are going to do anything except to hide a general tendency by patching an individual kind of instance, so I am not sure how I feel about that, but if you learn more about the mechanisms I would be quite curious.
      - Boaz Barak 30 Nov 2025 19:55 UTC
        6 points
        2
        Parent
        There are two separate issues:
        
        1. There may be some concrete problem with how the model handles PDF and OCR. This is not my domain, but I want to pass it on to people who can look into it and possibly do something about it.
        2. Generally I agree we have work to do on getting models to be completely honest in reporting what they did or didn’t do (to use a term I used before, Machines of Faithful Obedience). This is a longer-term effort which I do care about and work on, and I agree we would not get there by band-aids or patches.
      - Boaz Barak 2 Dec 2025 2:26 UTC
        2 points
        0
        Parent
        I tried to replicate, but without access to the plain text of the doc it is a bit hard to know whether the quotes are invented or based on actual OCR. FWIW GPT-5.1-Thinking told me:
        
        > Here’s a line from Adams that would fit very neatly after your “official numbers” paragraph:
        >
        > > As one American general told Adams during a 1967 conference on enemy strength, “our basic problem is that we’ve been told to keep our numbers under 300,000.”
        >
        > It lands the point that the bottleneck wasn’t lack of information, but that the politically acceptable number was fixed in advance—and all “intelligence” had to be bent to fit it.

        I also downloaded the file and asked Codex CLI to do the same in that folder. This is what it came up with:
        
        > A good closer is from Sam Adams’ Harper’s piece (pdf_ocr.pdf, ~pp. 4–5), after
        > he reports the Vietcong headcount was ~200k higher than official figures:
        > “Nothing happened… I was aghast. Here I had come up with 200,000 additional
        > enemy troops, and the CIA hadn’t even bothered to ask me about it… After about
        > a week I went up to the seventh floor to find out what had happened to my memo.
        > I found it in a safe, in a manila folder marked ‘Indefinite Hold.’” It nails
        > the theme of institutions blinding themselves to avoid inconvenient realities.
        - habryka 2 Dec 2025 3:52 UTC
          2 points
          0
          Parent
          I did provide a direct chat link. I don’t have any active system prompts or anything like that, to my knowledge, so that should give you all the tools to replicate. I agree the system might not always do this, though it clearly did that time (and seems to generally do this when I’ve used it).

          I think Adria linked to the exact PDF, in case you don’t have access to uploaded files. You can also just search the filename and find it yourself as a PDF.
        - Boaz Barak 2 Dec 2025 4:09 UTC
          2 points
          0
          Parent
          To be clear this is what I did—I downloaded the PDF from the link Adria posted and copy pasted your prompt into both ChatGPT-5.1-Thinking and codex. I was just too lazy to check if these quotes are real.
        - habryka 2 Dec 2025 4:16 UTC
          2 points
          0
          Parent
          Ah, cool, sorry that I misunderstood!

          I don’t have a better way of checking whether those quotes are real than to do my own OCR for the PDF, and I don’t currently have one handy. They seem plausibly real to me, but you know, that’s kind of the issue :P
        - Adrià Garriga-alonso 2 Dec 2025 5:29 UTC
          2 points
          0
          Parent
          On Chrome on a Mac you can just C-f in the PDF, it just OCRs automatically. I didn’t have this problem.
        - Boaz Barak 2 Dec 2025 4:33 UTC
          2 points
          0
          Parent
          You’re right :) there is an “uncanny valley” right now and I hope we will exit it soon
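
The check discussed in this thread (whether a quoted passage literally appears in a document) can be sketched in a few lines. This is a minimal sketch, assuming the PDF has already been converted to plain text by some OCR or text-extraction tool, which is not shown here. The names `normalize` and `quote_in_text` and the sample text are hypothetical and do not come from the thread.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # drops straight and curly quotes alike
    return " ".join(text.split())


def quote_in_text(quote: str, document_text: str) -> bool:
    """Return True if every fragment of `quote` occurs in order in the text.

    Ellipses ("..." or "…") in the quote are treated as elisions, so the
    fragments on either side are matched separately, in order.
    """
    # Split on ellipses BEFORE normalizing, because normalization removes dots.
    fragments = [normalize(part) for part in re.split(r"\.{3}|…", quote)]
    fragments = [frag for frag in fragments if frag]
    if not fragments:
        return False

    # Pad with spaces so that only whole words can match.
    haystack = f" {normalize(document_text)} "
    position = 0
    for frag in fragments:
        needle = f" {frag} "
        index = haystack.find(needle, position)
        if index == -1:
            return False
        # Step back one character so the trailing space can start the next match.
        position = index + len(needle) - 1
    return True


# Hypothetical placeholder text standing in for the extracted document.
sample_text = "The quick brown fox\njumps over the lazy dog. Then it sleeps in the sun."

print(quote_in_text("“quick brown fox jumps… sleeps in the sun”", sample_text))
print(quote_in_text("the fox sleeps", sample_text))
```

A match only shows that the words occur in the extracted text. It says nothing about where a model obtained them, which is the distinction drawn above between a hallucinated quote and one retrieved from another source.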