Didn’t read it in detail but I think Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs discusses filtering approaches.
Didn’t read it in detail but I think Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs discusses filtering approaches.