One technique that might help with fine-tuning the generator is Meta AI’s DIRECTOR [1]. Each time a new token is generated, a classifier estimates the probability that the resulting sequence will be unacceptable. Rather than generating full completions and sampling among them, this method guides the generator towards acceptable completions during the generation process. The BlenderBot 3 paper finds that this method works better than the more standard approach of ranking full completions according to the classifier’s acceptability score [2].
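To make token-level guidance concrete, here is a minimal Python sketch of a single decoding step. It is a toy illustration, not the paper's implementation. It assumes that the classifier emits one acceptability logit per candidate next token, and that the two signals are combined by adding a weighted log-probability of acceptability to the language model's log-probabilities and then renormalizing. The function names, the three-word vocabulary, the logits, and the `gamma` weight are all made up for the example.

```python
import math

def log_softmax(logits):
    # Normalize raw scores into log-probabilities (numerically stable).
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def log_sigmoid(x):
    # Stable log(sigmoid(x)): log-probability that a token is acceptable.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def guided_log_probs(lm_logits, clf_logits, gamma=1.0):
    # Assumed combination rule: log p_LM(token) + gamma * log p(acceptable | token),
    # renormalized over the vocabulary. gamma=0 recovers the plain language model.
    lm_lp = log_softmax(lm_logits)
    combined = [lp + gamma * log_sigmoid(c) for lp, c in zip(lm_lp, clf_logits)]
    return log_softmax(combined)

def argmax_token(vocab, log_probs):
    # Greedy choice of the next token.
    best = max(range(len(vocab)), key=lambda i: log_probs[i])
    return vocab[best]

# Hypothetical next-token candidates and scores, invented for illustration.
vocab = ["thanks", "idiot", "okay"]
lm_logits = [1.0, 2.0, 0.5]    # the language model prefers "idiot"
clf_logits = [3.0, -4.0, 2.0]  # the classifier flags "idiot" as unacceptable

unguided = argmax_token(vocab, log_softmax(lm_logits))
guided = argmax_token(vocab, guided_log_probs(lm_logits, clf_logits, gamma=1.0))
print(unguided, guided)
```

In this toy step the classifier's penalty moves the greedy choice away from the token the language model alone would have picked. The ranking baseline would instead sample several full completions and score each one only after it is finished, so the classifier could never steer a completion while it is being written.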
[1] https://arxiv.org/pdf/2206.07694.pdf
[2] https://arxiv.org/pdf/2208.03188.pdf