I’m having the same experience. Half of my Cursor queries start off like this.
Gemini 3 can tell you this string without searching the web, which other models usually cannot, indicating that Gemini 3 was very likely trained on benchmark data, possibly the whole swath of benchmarks that rely on the BIG-bench string.
Think this is false. Gemini does incredibly well on SimpleQA, which is probably the best general-knowledge benchmark, and Google is just cracked at pretraining, which is what gets knowledge into the model. It seems plausible that its remembering the string is just a byproduct of that.
(Gemini 3.0, Gemini 2.5 Pro, Sonnet 4.5, GPT-5.1)
edit: anyone care to share why they so heavily downvoted the post? It’s fair if you disagree with the claim that Gemini 3’s strong general knowledge explains it knowing the canary string and therefore disagree-downvote, but why downvote the post overall?
Doing well on SimpleQA could also be evidence of benchmark contamination in the training data.
The “impossible scenario of actual time travel” [forward in time] is pretty funny. Thanks for replicating the key findings from the post.