Would it be worth it to train a series of base models with only data up to year X for different values of X and see the consequences on alignment of derived assistant models?
Yes, though note that there is a very good chance there isn’t enough easily accessible, high-quality data to create effective pre-2015 LLMs. As you go back in time, exponentially less data is available[1]: ~94 ZB of digital data was created in 2022, compared with only ~15.5 ZB in 2015 and ~2 ZB in 2010.
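As a rough back-of-the-envelope check on how steeply data availability falls off, here is the compound annual growth rate implied by the figures above (a minimal sketch in Python; the zettabyte values are just the cited estimates):

```python
# Implied compound annual growth rate of global data creation,
# using the estimates cited above (2 ZB in 2010, 15.5 ZB in 2015, 94 ZB in 2022).

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two data points."""
    return (end_value / start_value) ** (1 / years) - 1

print(f"2010 -> 2015: {cagr(2, 15.5, 5):.1%} per year")   # ~50.6% per year
print(f"2015 -> 2022: {cagr(15.5, 94, 7):.1%} per year")  # ~29.4% per year
```

Even at the slower post-2015 rate, stepping the cutoff back by one year cuts the amount of newly created data by roughly a quarter to a third.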
Also, you may run into trouble trying to find conversational datasets not contaminated with post-2022 data. I believe the earliest open dataset for LLM assistant fine-tuning is the first OpenAssistant Conversations Dataset, released 6 months after the launch of ChatGPT.
Some form of RLAIF/‘unsupervised’ assistant fine-tuning is probably a much better choice for this task, but I don’t even know if it would work well for this sort of thing.

Edit: Apparently Anthropic researchers have just published a paper describing a new form of unsupervised fine-tuning, and it performs well on Alpaca and TruthfulQA, so pre-ChatGPT conversational fine-tuning can be done effectively without any time machines.
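For concreteness, here is a minimal sketch of the year-X cutoff filtering the original question proposes (the documents, field names, and cutoff years are invented purely for illustration):

```python
# Hypothetical sketch: build one pre-training corpus per cutoff year X,
# keeping only documents dated on or before X. Everything here is made up.
from datetime import date

def filter_by_cutoff(documents, cutoff_year):
    """Keep only documents whose date falls on or before the cutoff year."""
    return [doc for doc in documents if doc["date"].year <= cutoff_year]

documents = [
    {"text": "Usenet post", "date": date(1998, 3, 14)},
    {"text": "Blog article", "date": date(2009, 7, 2)},
    {"text": "Forum thread", "date": date(2016, 11, 30)},
    {"text": "News story about ChatGPT", "date": date(2023, 1, 5)},
]

for cutoff in (2010, 2015, 2020):
    corpus = filter_by_cutoff(documents, cutoff)
    print(f"cutoff {cutoff}: {len(corpus)} documents kept")
```

The hard part, as noted above, is not the filtering itself but that the surviving corpus shrinks rapidly as the cutoff moves earlier, and that later material can contaminate ostensibly older datasets.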
Or without the paywall: https://www.researchgate.net/figure/Worldwide-Data-Created-from-2010-to-2024-Source-https-wwwstatistacom-statistics_fig1_355069187
Uh? The OpenAssistant dataset would qualify as supervised learning/fine-tuning, not RLHF, no?
Yeah, it would. Sorry, the post is now corrected.