Garrett Baker comments on Recent AI model progress feels mostly like bullshit

Garrett Baker 25 Mar 2025 16:11 UTC
9 points
0

I work at GDM so obviously take that into account here, but in my internal conversations about external benchmarks we take cheating very seriously—we don’t want eval data to leak into training data, and have multiple lines of defense to keep that from happening.

What do you mean by “we”? Do you work on the pretraining team, talk directly with the pretraining team, are just aware of the methods the pretraining team uses, or some other thing?
- Dave Orr 26 Mar 2025 13:43 UTC
  40 points
  2
  Parent
  I don’t work directly on pretraining, but when there were allegations of eval set contamination due to detection of a canary string last year, I looked into it specifically. I read the docs on prevention, talked with the lead engineer, and discussed with other execs.
  So I have pretty detailed knowledge here. Of course GDM is a big complicated place and I certainly don’t know everything, but I’m confident that we are trying hard to prevent contamination.