A team at EleutherAI, UK AISI, and Oxford University asked:
Can we prevent LLMs from learning unsafe technical capabilities (such as those relevant to biorisk) by filtering out enough of the relevant pretraining data before we begin training a model? Even a fully jailbroken model is unlikely to be helpful if it is deeply ignorant of dangerous knowledge.
They find that data filtering is significantly more tamper-resistant than current safeguards, without degrading general capabilities. It does not, however, protect against the use of dangerous knowledge supplied in context.
Filtering is effective at making models safer.
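The intervention itself is conceptually simple: screen every document before it enters the pretraining corpus. Below is a minimal Python sketch of one plausible two-stage filter, a cheap keyword blocklist followed by a classifier; the blocklist terms, threshold, and `classifier_score` stub are hypothetical placeholders for illustration, not the authors' actual pipeline.

```python
# Sketch of a two-stage pretraining-data filter (illustrative only).
from typing import Iterable, Iterator

# Hypothetical blocklist of terms associated with the unsafe domain.
BLOCKLIST = {"example_hazard_term_1", "example_hazard_term_2"}

def classifier_score(text: str) -> float:
    """Placeholder for a trained topic classifier returning P(unsafe).

    In practice this would be a lightweight model run over every document;
    here it is stubbed out so the sketch stays self-contained.
    """
    return 0.0

def filter_corpus(docs: Iterable[str], threshold: float = 0.5) -> Iterator[str]:
    """Yield only documents that pass both filtering stages."""
    for doc in docs:
        lowered = doc.lower()
        # Stage 1: cheap keyword screen removes obvious matches.
        if any(term in lowered for term in BLOCKLIST):
            continue
        # Stage 2: classifier catches paraphrased or implicit content.
        if classifier_score(doc) >= threshold:
            continue
        yield doc

if __name__ == "__main__":
    corpus = [
        "benign document about chemistry homework",
        "document mentioning example_hazard_term_1",
    ]
    print(list(filter_corpus(corpus)))  # keeps only the benign document
```

The key design point is that filtering happens before any gradient step, so the resulting ignorance cannot simply be fine-tuned back in the way post-hoc safety training can be stripped away.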
https://deepignorance.ai/