Please don’t put ML opt-out strings on other people’s writings. They might want the Future to keep them around. Your apparent intent would be better served by linking to instructions for adding the string, rather than unilaterally adding it for everyone.
Commenters seem to agree with you here, and I followed the recommendation by removing the code and adding instructions instead.
But I wonder whether this convention means that I can’t use the code to prevent my comment from being added to a corpus. I think it would be better if comments were scraped separately. Does anybody know how the scraping works?
Idk how others do it, but you can see how LW/AF/EAF comments are scraped for the alignment research dataset here (as you can see, we don’t check for the UUID).
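For illustration, here is a minimal Python sketch of what "scraping comments separately" could look like: the post and each comment are checked for the opt-out string independently, so a marker in one comment excludes only that comment, and a marker in the post does not remove other people's comments. The marker value, the `Post`/`Comment` types, and `filter_post` are all hypothetical. This is not how the alignment research dataset scraper works (per the comment above, it doesn't check for the UUID at all).

```python
from dataclasses import dataclass, field

# Hypothetical placeholder; the actual opt-out string is not reproduced here.
OPT_OUT_MARKER = "EXAMPLE-OPT-OUT-UUID-0000"


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class Post:
    title: str
    body: str
    comments: list[Comment] = field(default_factory=list)


def filter_post(post: Post) -> list[str]:
    """Return the texts to keep, checking the post and each comment on its own."""
    kept = []
    # The post's marker only applies to the post body itself.
    if OPT_OUT_MARKER not in post.body:
        kept.append(post.body)
    # Each comment author decides for their own comment.
    for comment in post.comments:
        if OPT_OUT_MARKER not in comment.body:
            kept.append(comment.body)
    return kept


if __name__ == "__main__":
    post = Post(
        title="Example",
        body="post text",
        comments=[
            Comment("a", "keep me"),
            Comment("b", "please skip this " + OPT_OUT_MARKER),
        ],
    )
    print(filter_post(post))  # prints ['post text', 'keep me']
```

The design choice is simply that the unit of opt-out is the individual text rather than the whole page, which is what would let a commenter use the string without affecting anyone else.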
Yeah, I guess it is a hopeless endeavor to hide things from web scrapers and, by extension, GPT-N.