Frames for thinking about language models I have seen proposed at various times:
Lookup tables. (To predict the next token, an LLM consults a vast hypothetical database containing its training data and finds matches.)
Statistical pattern recognition machines. (To predict the next token, an LLM uses the context as evidence to perform a Bayesian update on a prior probability distribution, then samples from the posterior.)
People simulators. (To predict the next token, an LLM infers what kind of person is writing the text, then simulates that person.)
General world models. (To predict the next token, an LLM constructs a belief estimate over the underlying context / physical reality which yielded the sequence, then simulates that world forward.)
The first two are ‘mathematical’ frames and the last two are ‘semantic’ frames. Both kinds of frame are likely correct to some degree, and making them meet somewhere in the middle is the hard part of interpretability.
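To make the two ‘mathematical’ frames concrete, here is a minimal toy sketch in Python. It is not how any real LLM is implemented. The function names, the idea of a small set of candidate "sources", and the per-source token distributions are all assumptions made up for illustration. The lookup frame only counts verbatim continuations in a corpus; the Bayesian frame updates a prior over sources from the context and returns the resulting next-token distribution (the final sampling step is left out).

```python
from collections import Counter


def lookup_next(corpus, context):
    """Lookup-table frame (toy): count the tokens that followed `context`
    verbatim somewhere in `corpus`. Both arguments are lists of tokens."""
    n = len(context)
    counts = Counter()
    for i in range(len(corpus) - n):
        if corpus[i:i + n] == context:
            counts[corpus[i + n]] += 1
    return counts


def bayes_next(prior, sources, context):
    """Bayesian frame (toy): `prior` maps a hypothetical source name to its
    prior probability, and `sources` maps each name to a token distribution.
    Returns (posterior over sources, predictive distribution over tokens).
    Assumes at least one source gives the context nonzero probability."""
    posterior = dict(prior)
    for tok in context:
        for name in posterior:
            # Multiply in the likelihood of each observed token.
            posterior[name] *= sources[name].get(tok, 0.0)
    total = sum(posterior.values())
    posterior = {name: p / total for name, p in posterior.items()}

    vocab = {tok for dist in sources.values() for tok in dist}
    predictive = {
        tok: sum(posterior[name] * sources[name].get(tok, 0.0) for name in sources)
        for tok in vocab
    }
    return posterior, predictive
```

The contrast shows up on unseen input: the lookup function returns nothing for a context that never appeared in the corpus, while the Bayesian function still produces a distribution as long as some source assigns the context nonzero probability.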