As AIs become more capable, we may at least want the option of discussing them out of their earshot.
If I wanted to discuss something outside of an AI’s earshot, I’d use something like Signal, or something else that would keep out a human too.
AIs sometimes have internet access, and robots.txt won’t keep them out.
I don’t think having this info in their training set makes a big difference (but maybe I don’t see the problem you’re pointing at, so this isn’t a confident claim).
I think there are two levels of potential protection here. One is a security-like “LLMs must not see this” condition, for which yes, you need to do something that would keep out a human too (though in practice maybe “post only visible to logged-in users” is good enough).
However, I also think there’s a lower level of protection that’s more like “if you give me the choice, on balance I’d prefer for LLMs not to be trained on this”, where some failures are OK and imperfect filtering is better than no filtering. The advantage of targeting this level is simply that it’s much easier and less obtrusive, so you can do it at a greater scale with a lower cost. I think this is still worth something.
I’m not sure I’m imagining the same thing as you, but as a draft solution, how about a robots.txt?
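For instance, a sketch along these lines (GPTBot, ClaudeBot, CCBot, and Google-Extended are the tokens these crawlers publish; compliance is voluntary, so this only buys the “preference” level of protection, not the security-like one):

```
# Ask known AI training crawlers to stay out of the whole site
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```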
Yeah, that’s a strong option, which is why I went around checking + linking all the robots.txt files for the websites I listed above :)
In my other post I discuss the tradeoffs of the different approaches; one in particular is that it would be somewhat clumsy to implement post-by-post filters via robots.txt, whereas user-agent filtering can do it just fine.
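To make that concrete, here’s a minimal sketch of per-post filtering (the Flask app, the post store, and the `no_ai_training` flag are all hypothetical; the crawler list would also need ongoing maintenance):

```python
# Sketch: per-post opt-out enforced by user-agent, which robots.txt
# can't express cleanly on a post-by-post basis.
from flask import Flask, request, abort

app = Flask(__name__)

# Substrings that known AI training crawlers send in their User-Agent
# headers. (Some opt-outs, like Google-Extended, exist only as
# robots.txt tokens and never appear in request user-agents.)
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")

# Stand-in for a real datastore; the no_ai_training flag would be
# set per post by the author.
POSTS = {
    "my-post": {"body": "Hello, humans.", "no_ai_training": True},
}

@app.route("/posts/<slug>")
def show_post(slug):
    post = POSTS.get(slug)
    if post is None:
        abort(404)
    ua = request.headers.get("User-Agent", "")
    # Only flagged posts refuse known AI crawlers; everything else
    # is served normally, so the filtering stays unobtrusive.
    if post["no_ai_training"] and any(tok in ua for tok in AI_CRAWLER_TOKENS):
        abort(403)
    return post["body"]
```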