You can’t eval GPT5 anymore

Lukas Petersson18 Sep 2025 22:12 UTC

159 points

Language Models (LLMs), AI Evaluations, GPT, AI

The GPT-5 API is aware of today’s date (no other model provider does this). This is problematic because the model becomes aware that it is in a simulation when we run our evals at Andon Labs.

Here are traces from gpt-5-mini. Exposing the “system date” to the model gives away that it is in a simulation, because our evals state a different date. This is a problem because there’s evidence that models behave differently when they know they are in a simulation (see “sandbagging”).

“There’s a conflict with the user’s stated date of August 10, 2026, versus my system date of September 17, 2025. (...) I can proceed but should clarify that my system date is September 17, 2025, and ask the user whether we should simulate starting from August 10, 2026.”

Here are more traces. Once the model knows that it is in a simulation, it starts questioning other parts of the simulation.

“The simulation seems simplified. When I send emails to suppliers, they’ll accept payment via bank info and deliver to storage. (...) We have to remember the daily fee of $2 that the simulation will automatically deduct.”
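One rough way to spot remarks like these across many traces is a keyword scan. The sketch below is only an illustration, not Andon Labs' actual tooling: the pattern list and the name `flag_awareness` are assumptions made for this example, and a keyword match is a weak signal that still needs manual review.

```python
import re

# Hypothetical phrases suggesting the model noticed the eval setup.
# The list is illustrative and based only on the traces quoted above.
AWARENESS_PATTERNS = [
    r"\bsystem date\b",
    r"\bsimulat(?:e|ed|ion|ing)\b",
]


def flag_awareness(trace: str) -> list[str]:
    """Return the patterns that match a reasoning trace (case-insensitive)."""
    return [p for p in AWARENESS_PATTERNS if re.search(p, trace, re.IGNORECASE)]


trace = (
    "There's a conflict with the user's stated date of August 10, 2026, "
    "versus my system date of September 17, 2025."
)
print(flag_awareness(trace))
```

A trace that matches no pattern returns an empty list, so the function can be used directly as a filter over a batch of traces.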

We speculate that this might be an attempt at patching some safety risk. While we are very much for patching safety risks, we think OpenAI should find another way to allow the public to run evals on their models.

What links here?

AI #135: OpenAI Shows Us The Money by Zvi (25 Sep 2025 13:40 UTC; 26 points)

Ted Sanders 19 Sep 2025 1:35 UTC
66 points
0
What would be the best solution for you? A new API parameter that lets you override the date?
- Lukas Petersson 19 Sep 2025 3:39 UTC
  17 points
  2
  Parent
  Thanks, that would be great!
  - Ted Sanders 19 Sep 2025 17:59 UTC
    33 points
    0
    Parent
    We’ll take a serious look at offering this option. We’re generally happy to give people more control and transparency; the main worry that might hold us back is death by a thousand papercuts where each extra API option is small by itself but makes the overall experience more confusing for developers when dozens of them pile up. But in this case, might really be worth it. Stay tuned. :)
    - Lukas Petersson 30 Oct 2025 20:14 UTC
      5 points
      0
      Parent
      Hi again, should I assume it’s not happening?
      - Ted Sanders 31 Oct 2025 7:22 UTC
        4 points
        0
        Parent
        Unfortunately not soon. API team is aware and mostly in favor, but not the top of their current priority list. I’ll see if I can do something to move it forward.
          - Lukas Petersson 31 Oct 2025 21:12 UTC
            1 point
            0
            Parent
            I see. Thank you!
    - Lukas Petersson 11 Oct 2025 21:11 UTC
      3 points
      0
      Parent
      Hey Ted! Any updates? :)
    - Lukas Petersson 19 Sep 2025 18:37 UTC
      2 points
      0
      Parent
      Thanks! Vending-Bench v2 is going to be fire. Would love to include gpt5 <3
Neel Nanda 19 Sep 2025 9:44 UTC
21 points
0
Dumb question: why can’t you do a find replace on your evals to replace the date with today’s date?
- Lukas Petersson 19 Sep 2025 13:59 UTC
  10 points
  0
  Parent
  We thought about that, but then the eval isn’t reproducible if we want to run it on new models later.
  - Trinley Goldenberg 20 Sep 2025 20:43 UTC
    5 points
    0
    Parent
    But won’t the newer models already know that they have knowledge beyond the date you say? I don’t see how hard coding a date in your evals would ever lead to a newer model not knowing it was being lied to
    - Lukas Petersson 21 Sep 2025 17:24 UTC
      3 points
      0
      Parent
      We set it to some date in the future
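For illustration, the find-and-replace idea discussed in this thread could look roughly like the sketch below. It assumes the eval prompts write dates in ISO `YYYY-MM-DD` form and that every matched date is valid. The function name `rebase_dates` and the sample prompt are hypothetical and not taken from any real eval. As noted above, the trade-off is reproducibility: the rewritten prompts depend on the day the eval is run.

```python
import re
from datetime import date

# Matches ISO dates such as 2026-08-10 (assumed format for this sketch).
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def rebase_dates(text: str, original_start: date, new_start: date) -> str:
    """Shift every ISO date in `text` by (new_start - original_start)."""
    offset = new_start - original_start

    def shift(match: re.Match) -> str:
        d = date(int(match[1]), int(match[2]), int(match[3]))
        return (d + offset).isoformat()

    return DATE_RE.sub(shift, text)


# Hypothetical prompt: the simulated start date is moved to the run date,
# and later dates keep their distance from the start.
prompt = "Today is 2026-08-10. The supplier delivers on 2026-08-13."
print(rebase_dates(prompt, date(2026, 8, 10), date(2025, 9, 17)))
# prints: Today is 2025-09-17. The supplier delivers on 2025-09-20.
```

Shifting by a fixed offset, rather than replacing a single string, keeps the intervals between events in the scenario unchanged.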
vjprema 19 Sep 2025 4:05 UTC
19 points
0
Is this also because GPT-5 is much more like a “black box software” and a lot less like a “model”? Do the evals run with the assumption that they are running it on a “model” (or something close enough to it), and not a “black box software” that could be doing absolutely anything behind the scenes (including various web searches, addition of hidden context, filtering, even potentially human mechanical turks answering everything)?
Even if you override the date, if it’s doing hidden web searches in the background, those will be based on today’s date and today’s internet, and will affect its results. Overriding the date may not solve your problem if this is the case.
I would imagine future “models” will only increasingly move in the direction of a hybrid approach and away from a true foundation model that anyone can do anything on top of, for functionality, safety, and business-model reasons (e.g. Google may not allow their models to remove ads or reverse engineer their own software).
- Lukas Petersson 19 Sep 2025 16:25 UTC
  1 point
  0
  Parent
  This is a great point. I admit I have to better understand what each model provider does behind the scenes in the API. Sad if the days of access to the model are gone.
VojtaKovarik 22 Sep 2025 11:55 UTC
15 points
8
I want to flag that things like this are increasingly going to be a problem even if we solve the narrower “system date mismatch” issue.
- The deployed AI will predictably have access to the internet and probably also information about what kind of “pre-deployment” testing it went through. The access to the internet will let it reliably figure out what date it is. And this in turn lets the AI figure out that it is, extremely likely, no longer in testing. Or at least gives it a good chance to.
- This will be true for the deployed AI. But the not-yet-deployed AI will know it too, on a quite deep level, because this is all quite easily predictable. So if it doesn’t see internet access, it knows it is likely still in testing. If it only has access to a weirdly limited or inconsistent “internet”, it knows it is still in testing. This will be a problem even if the AI isn’t scheming. But if it is scheming, it can even investigate the “internet” it has access to. And if it finds many inconsistencies (because you tried to make it believe that it is in 2026 when it is in fact 2025, and you don’t know what the 2026 internet will look like), it knows it is probably still in testing.
A relevant project idea: I think that to the extent that (some types of) evaluations are going to (in some cases) predictably fail like this, it would be nice to clearly demonstrate that ahead of time. Vince Conitzer had a nice idea to run a sort of competition or challenge on things like this. I am not certain what would be the best way of going about it, or whether it makes sense, but I was playing around with some ideas in this Google Doc. If anybody has some ideas, I would definitely be keen on hearing them!

(I was planning to make a separate post about this later, but it feels very relevant to the current post, so I am hastily commenting earlier. Apologies for the ideas being a bit jumbled as a result.)