I think it is really important not to train on material that talks about benchmarks, scheming, or the like, and there is a lot of leakage, especially about safety evaluations, even when the text doesn’t contain the actual evaluation content. To the best of my knowledge, people broadly seem to be using canary strings responsibly, and it is disappointing that they do not filter the marked content out well.
I’m confused how “they do directly hill climb on high profile metrics” is compatible with any of this, since that seems to imply that they do in fact train on benchmarks, which is the exact thing you just said was false?
Edit: maybe you mean the thing where they optimize for LMArena but don’t technically train on it?
I’m confused how “they do directly hill climb on high profile metrics” is compatible with any of this, since that seems to imply that they do in fact train on benchmarks, which is the exact thing you just said was false?
I assume it means they use the benchmark as the test set, not the training set.
Yes this.