And it could do that, effectively, with all the so-called “pre-training” data, the stuff written by real people… The assistant transcripts are different. If human minds were involved in their construction, it was only because humans were writing words for the assistant as a fictional character, playing the role of science-fiction authors rather than speaking for themselves. In this process, there was no real mind – human or otherwise – “inhabiting” the assistant role that some of the resulting text portrays.
But the base model already has to predict poorly written fiction, since there is plenty of it in the training data, no?
Do we have any data on whether base models do better or worse at predicting fiction than non-fiction? I'd also naively expect bad fiction to be easier to predict than good fiction.
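One rough way to check this empirically would be to compare a base model's average per-token loss on fiction versus non-fiction passages. The sketch below is only illustrative: the model name ("gpt2") and the two sample strings are placeholders I've chosen, and a real comparison would need matched, held-out corpora rather than hand-picked snippets.

```python
# Minimal sketch: compare a base (non-instruct) causal LM's per-token
# negative log-likelihood on a fiction snippet vs. a non-fiction snippet.
# Model choice and sample texts are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any base model would do for this kind of test
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_nll(text: str) -> float:
    """Average per-token negative log-likelihood (lower = easier to predict)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return mean cross-entropy loss
        loss = model(ids, labels=ids).loss
    return loss.item()

fiction = "The dragon circled the tower twice before it finally spoke."
nonfiction = "The report summarizes quarterly revenue across three regions."

print(f"fiction    NLL: {mean_nll(fiction):.3f}")
print(f"nonfiction NLL: {mean_nll(nonfiction):.3f}")
```

Averaged over large, comparable corpora (and ideally broken out by fiction quality), this kind of measurement would speak directly to the question above; on two short snippets it shows only the mechanics.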