[Question] Is there a safe version of the Common Crawl?

The larger LLMs are trained on the Common Crawl, a publicly available dump of significant parts (400TB) of the public internet. They are also trained on all kinds of additional data, but presumably a large fraction of any dangerous content comes from the Common Crawl.

Is there a safe version of the Common Crawl that has the dangerous parts removed (or at least labeled, such that it would be easy to remove them)?

From a safety perspective it would probably be most useful if material on AI (esp. about misalignment and alignment strategies) were removed. It would also be interesting if material on consciousness were removed, to allow testing whether LLMs discover the concept without prior knowledge. A rough sketch of what such labeling could look like is below.
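As a very rough illustration of the labeling idea, here is a minimal sketch that scans the text records of a local Common Crawl WET file (using the real `warcio` library) and tags documents by topic. The keyword lists and the function names are hypothetical stand-ins; a real effort would need a trained classifier rather than substring matching:

```python
from warcio.archiveiterator import ArchiveIterator

# Hypothetical keyword lists -- illustrative only; a serious filter
# would use a trained topic classifier, not substring matching.
AI_SAFETY_TERMS = ["instrumental convergence", "mesa-optimizer", "reward hacking"]
CONSCIOUSNESS_TERMS = ["qualia", "hard problem of consciousness", "phenomenal experience"]

def label_record(text: str) -> set[str]:
    """Return the set of topic labels matched by this document's text."""
    lowered = text.lower()
    labels = set()
    if any(term in lowered for term in AI_SAFETY_TERMS):
        labels.add("ai-safety")
    if any(term in lowered for term in CONSCIOUSNESS_TERMS):
        labels.add("consciousness")
    return labels

def label_wet_file(path: str):
    """Yield (target URI, labels) for each text record in a local WET file."""
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "conversion":  # WET text records
                continue
            text = record.content_stream().read().decode("utf-8", errors="replace")
            uri = record.rec_headers.get_header("WARC-Target-URI")
            yield uri, label_record(text)
```

The output could be written out as a sidecar index of URI-to-label mappings, so downstream users can drop or keep labeled documents at training time without modifying the crawl itself.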

Obviously, this wouldn’t solve the alignment problem, since instrumental convergence still holds. But it could buy some time.