Thanks for posting this! I agree that it’s good to get it out anyways; I thought it was valuable. I especially resonate with the point in the Pure simulators section.
Some responses:
> In general I’m skeptical that the simulator framing adds much relative to ‘the model is predicting what token would appear next in the training data given the input tokens’. I think it’s pretty important to think about what exactly is in the training data, rather than about some general idea of accurately simulating the world.
I think that the main value of the simulators framing was to push back against confused claims that treat (base) GPT-3 and other generative models as traditional rational agents. That being said, I do think there are some reasons why the simulator framework adds value relative to “the model is doing next token prediction”:
- The simulator framework incorporates specific facts about the token prediction task. We train generative models on tokens from a variety of agents, as opposed to a single unitary agent a la traditional behavior cloning. Therefore, we should expect different behaviors when the context implies that different agents are “natural”. In other words, which “agent” the model behaves like depends on the context, not on some fixed property of the model (see the sketch just below this list).
- The simulator framework pushes back against “stochastic parrot” claims. In academia or on ML twitter (or, even more so, academic ML twitter), you often encounter claims that language models are “just” stochastic parrots, i.e. they don’t have “understanding” or “grounding”. My guess is this comes from experience with earlier generations of language models, especially early n-gram/small HMM models that really do lack understanding or grounding. (This is less of a thing that happens on LW/AF.) The simulator framework provides a mechanistic model for how a sophisticated language model that does well on the next-token prediction task could end up developing a complicated world model and agentic behavior.
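To make the first point concrete, here is a minimal sketch of the kind of behavior it predicts, assuming GPT-2 loaded via the Hugging Face transformers library (the prompts are made up for illustration): the same weights continue the same question very differently depending on which “agent” the context makes natural.

```python
# Minimal sketch: one base model, different implied agents.
# Assumes GPT-2 via Hugging Face transformers; the prompts are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = [
    "Q: What happens if you drop a glass on the floor?\nA four-year-old answers:",
    "Q: What happens if you drop a glass on the floor?\nA materials scientist answers:",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Print only the newly generated continuation.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    print(repr(tokenizer.decode(new_tokens, skip_special_tokens=True)))
```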
My guess is you have a significantly more sophisticated, empirical model of LMs, such that the simulators framework feels like a simplification of your empirical knowledge + “the model is doing next token prediction”. But I think the simulator framework is valuable because it incorporates additional knowledge about the LM task while pushing back against two significantly more confused framings. (Indeed, Janus makes these claims explicitly in the simulators post!)
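For concreteness, the “next token prediction” that all of this refers to is just the following objective; a toy sketch with random tensors standing in for a real model and real data:

```python
# Toy sketch of the next-token prediction objective: the logits at position t
# are scored against the token that actually appears at position t+1.
# The tensors here are random placeholders, not a real model or dataset.
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 16, 50257
logits = torch.randn(batch, seq_len, vocab)          # stand-in for model output
tokens = torch.randint(0, vocab, (batch, seq_len))   # stand-in for training text

loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),  # predictions at positions 0..T-2
    tokens[:, 1:].reshape(-1),          # targets: tokens at positions 1..T-1
)
print(loss)
```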
> (Paul has a post which talks about this ‘what is actually the correct generalization’ thing somewhere that I wanted to link, but I can’t currently find it)
Are you thinking of A naive alignment strategy and optimism about generalization?

(Paul does talk about intended vs unintended generalization in a bunch of posts, so it’s conceivable you’re thinking about something more specific.)
> GPT-style transformers are purely myopic
>
> I’m not sure this is that important, or that anyone else actually thinks this, but it was something I got wrong for a while. I was thinking of everything that happens at sequence position n as about myopically predicting the nth token.
I do think people think variants of this; see the comments of Steering Behaviour: Testing for (Non-)Myopia in Language Models for example.

> I’m pretty surprised to hear that anyone made such claims in the first place. Do you have examples of this?

I think this mainly comes up in person with people who’ve just read the intro AI Safety materials, but one example on LW is What exactly is GPT-3’s base objective?.
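On the object level, one quick way to see why per-position computation is not purely about predicting that position’s next token: the loss at a later position backpropagates into earlier positions’ computation, so what happens at position n is also trained to be useful for predictions made after position n. A rough sketch, assuming GPT-2 via Hugging Face transformers (the sentence is arbitrary):

```python
# Rough sketch: the loss for predicting only the *final* token still produces a
# nonzero gradient at the *first* position's embedding, because later positions
# attend to earlier ones. Assumes GPT-2 via Hugging Face transformers.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")["input_ids"]
embeds = model.transformer.wte(ids)  # token embeddings, one per position
embeds.retain_grad()

logits = model(inputs_embeds=embeds).logits
# Loss only for the final token, predicted from the second-to-last position.
loss = F.cross_entropy(logits[0, -2].unsqueeze(0), ids[0, -1:])
loss.backward()

# Nonzero: computation at position 0 feeds into the prediction at the end.
print(embeds.grad[0, 0].abs().sum())
```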