being trained on “follow instructions”
What does this actually mean, in terms of the details of how you’d train a model to do this?
Take a big language model like GPT-3 and train it via RL on tasks where it's given a natural-language instruction from a human, and it gets reward if the human judges that it has done the task successfully.
Makes sense, thanks!
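[Editor's note: concretely, the loop described above might look like the minimal sketch below. Everything in it is a toy stand-in rather than any real library's API: the "policy" is a per-token weight table instead of a large pretrained model, the human rater is hard-coded, and the update is a crude bandit-style reward-weighted adjustment rather than a full policy-gradient method.]

```python
# Toy sketch of "RL from human judgments of instruction-following".
# Hypothetical throughout: a real setup would start from a large
# pretrained language model (e.g. GPT-3) and use a proper RL algorithm.
import random

VOCAB = ["yes", "no", "maybe"]

# Toy "policy": one sampling weight per token.
policy = {tok: 1.0 for tok in VOCAB}

def sample_response():
    """Sample one token in proportion to the policy's current weights."""
    total = sum(policy.values())
    r = random.uniform(0, total)
    for tok, w in policy.items():
        r -= w
        if r <= 0:
            return tok
    return VOCAB[-1]

def human_reward(instruction, response):
    """Stand-in for a human rater: reward 1 if the rater thinks the
    instruction was followed, else 0. Here the judgment is hard-coded."""
    return 1.0 if response == "yes" else 0.0

LEARNING_RATE = 0.1

for step in range(1000):
    instruction = "Answer yes."                    # instruction from a human
    response = sample_response()                   # model attempts the task
    reward = human_reward(instruction, response)   # human judges success
    # Reward-weighted update: responses the human rewarded gain weight,
    # unrewarded ones lose weight (clamped so weights stay positive).
    policy[response] = max(1e-3, policy[response] + LEARNING_RATE * (reward - 0.5))

print(policy)  # the weight on "yes" should dominate after training
```

[In practice the update would be a real policy-gradient step (e.g. PPO) on the model's log-probabilities, but the shape of the loop is the same: sample a response, get a human judgment, reinforce what the human rewarded.]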