Just to clarify, I think Google’s filtering is stronger, not weaker,[1] than filtering out canary strings (though if it were up to me I would do both). Also, Google’s approach should catch the hummingbird question and the CoT transcripts whether or not a canary string is present, which is the whole reason they do it the way they do.
[1] For the purpose of not training on benchmark eval data.
Hm, so you think that if some distinctive benchmark questions have been discussed online, models otherwise trained on that era of the internet won’t know details about them?
You should probably read my other comment threads on this issue if you want details, but Google’s approach is designed to filter out text that includes benchmark questions regardless of whether there is a canary string. I’m sure it’s not perfect, but I think it’s pretty good.
I make no such claims about any other models, just Gemini, where I have direct knowledge.
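To make the distinction concrete, here is a minimal sketch of the two kinds of filter being compared. It is not a description of Google's actual pipeline, whose details are not given in this thread. The canary value, the benchmark question, the 5-token window, and all function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical values, for illustration only.
CANARY = "EXAMPLE-CANARY-GUID-0000"
BENCHMARK_QUESTIONS = [
    "What is the smallest widget count that satisfies the example puzzle constraint?",
]
N = 5  # assumed n-gram window size


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(text):
    """Return the set of N-token windows that occur in the text."""
    toks = tokens(text)
    return {tuple(toks[i:i + N]) for i in range(len(toks) - N + 1)}


# Every N-gram that appears in any known benchmark question.
BENCHMARK_NGRAMS = set().union(*(ngrams(q) for q in BENCHMARK_QUESTIONS))


def has_canary(doc):
    """Canary-based filter: flags a document only if the marker is present."""
    return CANARY in doc


def overlaps_benchmark(doc):
    """Content-based filter: flags a document that shares any N-gram with a benchmark question."""
    return not ngrams(doc).isdisjoint(BENCHMARK_NGRAMS)


# A page that quotes the question without the canary string.
quoted_no_canary = (
    "Someone posted this online: what is the smallest widget count "
    "that satisfies the example puzzle constraint? Anyone know?"
)
# An unrelated page that happens to carry the canary string.
unrelated_with_canary = "Unrelated notes. " + CANARY

print(has_canary(quoted_no_canary), overlaps_benchmark(quoted_no_canary))
print(has_canary(unrelated_with_canary), overlaps_benchmark(unrelated_with_canary))
```

In this toy setup the content-based check flags the quoted question even though no canary string is attached, while the canary check misses it. That is the sense in which a content-based filter can be stronger for avoiding benchmark data, and why doing both covers more cases.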