The model is a next-token predictor. If you strip from the training data every next token that discusses a topic, it will learn that the probability of discussing that topic is zero.
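A toy sketch of this claim (my own illustration, not anything from the original discussion): a bigram next-token model trained by counting. Once every sentence mentioning a topic word is filtered out, the counted probability of the continuations that only appeared in those sentences falls to exactly zero.

```python
from collections import Counter, defaultdict

corpus = [
    "cats are friendly animals",
    "dogs are loyal animals",
    "cats chase mice",
]

def train(sentences):
    """Count bigram (previous token -> next token) frequencies."""
    counts = defaultdict(Counter)
    for s in sentences:
        tokens = s.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

full = train(corpus)
# Strip every sentence that discusses the topic "cats".
filtered = train([s for s in corpus if "cats" not in s])

print(prob(full, "are", "friendly"))      # 0.5 — seen in the full corpus
print(prob(filtered, "are", "friendly"))  # 0.0 — the topic's continuations vanish
```

A pure count-based predictor really does assign zero probability to continuations it never sees, which is the intuition behind the first position; the reply below argues that tuned neural models behave differently.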
The model is shaped by tuning features of a representation produced by an encoder trained on the next-token prediction task. Those features encode meanings relevant to many possible topics. Even if you strip every next token that discusses a topic, its meaning remains prominent in the representation, so the probability that the tuned model can still discuss it is high.