anaguma comments on We Built a Tool to Protect Your Dataset From Simple Scrapers

anaguma 25 Jul 2025 15:44 UTC
1 point
0
This seems like a great idea! However, I think it might degrade the usefulness of the dataset, especially if it’s meant to be used later to evaluate LLMs, since any jailbreaks etc. would apply in that setting as well. And if you provide utilities to clean up the text before evaluation, scrapers could use those same utilities.
- Thane Ruthenis 25 Jul 2025 16:53 UTC
  3 points
  0
  Yeah, I guess the use-case I had in mind is generally people who don’t want LLMs trained on (particular pieces of) their writing, rather than datasets specifically.