I am surprised to hear that there have been experiments testing this. Wouldn’t performing a bunch of totally new pre-training runs be extremely expensive?
Training a 6B model for 500B tokens costs about 20k USD. That number increases linearly with model size and # of tokens, and decreases with the amount of money you have. Work like this is super doable, especially at large labs.
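For a rough sense of where a figure like 20k USD comes from, here is a back-of-envelope sketch. It uses the standard ~6 × params × tokens approximation for dense-transformer training FLOPs; the GPU price, peak throughput, and utilization are my own assumed numbers, not anything from the thread.

```python
# Back-of-envelope pre-training cost estimate.
# Assumptions (mine, not from the thread): ~6 * params * tokens training FLOPs,
# ~$2/hr per H100, ~1e15 peak bf16 FLOP/s, ~40% utilization.

def training_cost_usd(params: float, tokens: float,
                      usd_per_gpu_hour: float = 2.0,
                      peak_flops: float = 1e15,
                      utilization: float = 0.4) -> float:
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / (peak_flops * utilization) / 3600
    return gpu_hours * usd_per_gpu_hour

# 6B params on 500B tokens comes out around $25k, in the right ballpark;
# doubling either params or tokens doubles the estimate, i.e. linear scaling.
print(f"${training_cost_usd(6e9, 500e9):,.0f}")
```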
I imagine they did them on smaller models, plausibly on less total data, which is expensive but not exorbitant.
Anthropic has done some interesting semi-public research on data filtering (Chen et al., 2025). Speaking of which, that report gave quite a positive impression of data filtering. I’m curious what changed in their latest results.
Plausible to me that, as advances in model capabilities improve generalization, filtering the training dataset makes less of a difference, since the model can effectively infer the missing parts from what it does know.
Of course it is plausible, but there is seemingly no evidence supporting the claim.
That research is from August. It seems much more likely to me that they’ve simply chosen to switch focus to more scalable (i.e., less expensive) approaches than that they’ve scaled this up since then and already found conclusive conflicting results.
Some of the phrasing also doesn’t give the impression that they’ve tried very hard to make it work:
“We expect this to become even more of an issue as AIs increasingly use tools” → phrased as a prediction, not based on evidence or current state.
Applying filtering to tool use “wasn’t enough assurance against misuse”? What does that even mean? Are we demanding more of filtering than other approaches now?
“We could have made more progress here with more research effort, but it likely would have required...” → they didn’t try; another prediction.
Didn’t mention anything about what caused filtering to suddenly become less effective. Why?
“but there is seemingly no evidence supporting the claim.”

There is plenty of evidence! On LW we generally use the word “evidence” in the “bayesian evidence” sense of the term. So “good arguments for X” basically always implies “evidence for X”.
No worries if you are used to these words in a more “scientific evidence” sense, but it’s actually a pretty important LW norm to think of evidence as something there is a lot of.
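To make the “bayesian evidence” sense concrete (this gloss is mine, not the commenter’s): an observation or argument E counts as evidence for X exactly when it is more likely in worlds where X is true than in worlds where it is false, which is equivalent to saying it raises the probability of X:

$$
P(E \mid X) > P(E \mid \neg X) \;\iff\; P(X \mid E) > P(X).
$$

By that standard, a good argument for X is evidence for X even when it falls well short of what would count as “scientific” evidence.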