Steven Byrnes comments on Is there a safe version of the common crawl?

Steven Byrnes 12 Aug 2025 18:58 UTC
7 points
Didn’t read it in detail, but I think the paper “Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs” discusses filtering approaches.