mic comments on I think I’m just confused. Once a model exists, how do you “red-team” it to see whether it’s safe. Isn’t it already dangerous?

mic 19 Nov 2023 3:28 UTC
4 points
0
Pretraining on curated data seems like a simple idea. Are there any papers exploring this?
- mishka 19 Nov 2023 8:04 UTC
  3 points
  0
  I’ve reviewed someone’s draft which suggests this for AI safety (I hope it will be made public soon).
  
  But I’ve heard rumors that people are trying this… And even from what Janus is saying in the comments/answers to my question https://www.lesswrong.com/posts/tbJdxJMAiehewGpq2/impressions-from-base-gpt-4, I am getting a rather strong suspicion that GPT-4 pretraining has been using some data curation.
  
  From Janus’ two comments there I am getting an impression of a non-RLHF’d system which, nevertheless, tends to be much stronger than usual in its convictions (or, the virtual characters it creates tend to be stronger than usual in their convictions about the nature of their current reality). There might be multiple reasons for that, but some degree of data curation might be one of them.