For a myopic language model, the next token in a prompt completion is generated based on whatever the model has learned in service of minimising loss on the next token and the next token alone.
A non-myopic language model, on the other hand, can ‘compromise’ on the loss of the immediate next token so that the overall loss over multiple tokens is lower; i.e. possible loss on future tokens in the completion may be ‘factored in’ when generating the immediate next token.
Here’s a rough argument for why I don’t think this is a great definition (there may well be holes in it).
If a language model minimises a proper loss for its next-token prediction, then the loss-minimising prediction is P(X_n | X_<n), where X_n is the nth token. The proper-loss-minimising prediction for the next m tokens X_[n,n+m] is P(X_[n,n+m] | X_<n), where P is the same probability distribution. By the chain rule, that block conditional factors into the product of the next-token conditionals, so with a proper loss there’s no difference between “greedy next-token prediction” and “lookahead prediction”.
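This chain-rule identity is easy to check numerically. Here is a minimal sketch using a hypothetical toy joint distribution over three-token sequences (the probabilities are made up purely for illustration): the product of the loss-minimising next-token conditionals recovers the block conditional P(X_[n,n+m] | X_<n) exactly.

```python
# Toy joint distribution over 3-token sequences from a binary vocabulary.
# The numbers are hypothetical, chosen only so they sum to 1.
joint = {
    ('a', 'a', 'a'): 0.10, ('a', 'a', 'b'): 0.05,
    ('a', 'b', 'a'): 0.20, ('a', 'b', 'b'): 0.15,
    ('b', 'a', 'a'): 0.05, ('b', 'a', 'b'): 0.25,
    ('b', 'b', 'a'): 0.10, ('b', 'b', 'b'): 0.10,
}

def marginal(prefix):
    """P(prefix): sum of the joint over all continuations of the prefix."""
    return sum(p for seq, p in joint.items() if seq[:len(prefix)] == prefix)

def next_token_conditional(prefix, token):
    """P(X_n = token | X_<n = prefix): the loss-minimising next-token prediction."""
    return marginal(prefix + (token,)) / marginal(prefix)

def block_conditional(prefix, block):
    """P(X_[n,n+m] = block | X_<n = prefix): the 'lookahead' prediction."""
    return marginal(prefix + block) / marginal(prefix)

# Chaining greedy next-token conditionals recovers the block conditional.
prefix = ('a',)
chained = (next_token_conditional(prefix, 'b')
           * next_token_conditional(prefix + ('b',), 'a'))
direct = block_conditional(prefix, ('b', 'a'))
assert abs(chained - direct) < 1e-12  # identical up to float rounding
```

The same check goes through for any prefix and any block length, since it is just the chain rule P(X_[n,n+m] | X_<n) = ∏ P(X_k | X_<k) applied to a fixed distribution P.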
If it’s not minimising a proper loss, on the other hand, there are lots of ways in which it can deviate from predicting with a probability distribution, and I’d be surprised if “it’s non-myopic” was an especially useful way to analyse the situation.
On the other hand, I think cross-episode non-myopia is a fairly clear violation of a design idealisation—that completions are independent conditional on the training data and the prompt.