streawkceur comments on Gemini 3 is Evaluation-Paranoid and Contaminated

streawkceur 16 Dec 2025 14:55 UTC
3 points
0
TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data.
To my understanding, you only observe this effect for prompts that indicate or imply the current late-2025 time/2025 year. Gemini completes such prompts with “that must be hypothetical writing”, because in the vast majority of its training data, 2025 was in the future (and end-2025 was always hypothetical). I think it is more accurate to phrase this as “Gemini 3 goes off the rails when it sees a prompt that indicates it was written in 2025, because in its training data, everything that implied a 2025 time was a fictional scenario” (that’s also true for 2.5). Or did you manage to elicit such such an effect with a prompt from which the current after-training-data-cutoff date can’t be inferred?
- Alice Blair 16 Dec 2025 17:29 UTC
  3 points
  1
  Parent
  The point is that this demonstrates that Gemini 3 has a lot of paranoid trapped priors and is on the lookout for things that seem wrong about the environment.