The model is a next-token predictor. If you strip from the training data every next token that discusses a topic, it will learn that the probability of discussing that topic is zero.
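A toy sketch of this claim (my own illustration, not anything from the original discussion): a bigram next-token model trained by counting. Once every sentence mentioning a topic word is filtered out, the counted probability of the continuations that only appeared in those sentences falls to exactly zero.

```python
from collections import Counter, defaultdict

corpus = [
    "cats are friendly animals",
    "dogs are loyal animals",
    "cats chase mice",
]

def train(sentences):
    """Count bigram (previous token -> next token) frequencies."""
    counts = defaultdict(Counter)
    for s in sentences:
        tokens = s.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

full = train(corpus)
# Strip every sentence that discusses the topic "cats".
filtered = train([s for s in corpus if "cats" not in s])

print(prob(full, "are", "friendly"))      # 0.5 — seen in the full corpus
print(prob(filtered, "are", "friendly"))  # 0.0 — the topic's continuations vanish
```

A pure count-based predictor really does assign zero probability to continuations it never sees, which is the intuition behind the first position; the reply below argues that tuned neural models behave differently.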
The model is shaped by tuning features of a representation produced by an encoder trained on the next-token prediction task. Those features encode meanings relevant to many possible topics. Even if you strip every next token that discusses a topic, its meaning remains prominent in the representation, so the probability that the tuned model can still discuss it is high.