(Technically, I suppose one could also train a forward-looking LM on a dataset with reversed strings, then feed it prompts of reversed strings, to make it predict preceding tokens. So I guess the remaining question is whether one can get the same behavior without retraining the LM.)
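The reversed-dataset trick can be sketched concretely. A standard left-to-right LM trained on reversed token sequences learns to predict *preceding* tokens; the corpus and helper below are toy stand-ins, not any real training pipeline:

```python
# Sketch of the reversed-dataset idea: reverse every training sequence so
# that an ordinary left-to-right LM learns to predict earlier tokens.
# The corpus here is a toy stand-in.

def reverse_example(tokens):
    """Reverse a token sequence for backward-LM training."""
    return list(reversed(tokens))

corpus = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]
reversed_corpus = [reverse_example(seq) for seq in corpus]
# Feeding such a model a reversed suffix as the prompt makes it emit
# the earlier tokens, in reverse order.
```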
Yes, that works. This is how models like CogView can go both ways with the reverse-caption trick, akin to projectors in GANs for reversing images to latents (see also Decision Transformer). Since it’s just a concatenated sequence of tokens, the model and the training are indifferent to whether you trained it with the text caption tokens first and then the image tokens (to generate an image from text) or the image tokens first and then the text caption (to train a captioner). Apparently it costs very little training to finetune the reverse direction: the model already has most or all of the necessary knowledge (which makes sense) and just needs to rejigger its input handling. It also gives you a way to self-critique image generation: if you train a reverse-captioning version of CogView, then instead of calling out to CLIP to gauge the quality of (text, generated-image) pairs, you can simply run the generated image’s tokens through the reverse CogView and compute the likelihood of the text caption token by token and sum; if the total likelihood of the text looks odd given the generated image, then it wasn’t a good generated image.
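The self-critique scoring step is just a summed token log-likelihood. A minimal sketch, where `toy_prob` is a hypothetical stand-in for a real reverse captioner’s conditional token probabilities:

```python
import math

# Score a generated image by the total log-likelihood a reverse
# (image -> text) model assigns to the intended caption. `token_prob`
# is any callable giving P(next caption token | image tokens, prefix);
# the toy version below just rewards caption tokens that match the image.

def caption_log_likelihood(image_tokens, caption_tokens, token_prob):
    """Sum log P(caption_tokens[i] | image_tokens, caption_tokens[:i])."""
    total = 0.0
    for i, tok in enumerate(caption_tokens):
        total += math.log(token_prob(image_tokens, caption_tokens[:i], tok))
    return total

def toy_prob(image_tokens, prefix, tok):
    return 0.9 if tok in image_tokens else 0.1  # hypothetical stand-in

good = caption_log_likelihood(["cat", "mat"], ["cat", "mat"], toy_prob)
bad = caption_log_likelihood(["dog"], ["cat", "mat"], toy_prob)
# A much lower total flags a mismatched (text, image) pair.
```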
No one’s done this with text models that I know of. Probably everyone is too busy with complicated prompt-finetuning methods of various kinds (gradient ascent on the inputs, or on the weights/biases) to bother trying reversal. I’d predict that trying to infer the necessary prompt with the reversing trick wouldn’t work for small models anyhow, and would be a waste of time compared to directly editing/controlling the model.
Also, even if one had a reversed model available, it would not be trivial to generate useful prompts with it.
The goal is (roughly) to find a prompt that maximizes P(correct_answer | prompt). But the reversed model gives you

P(prompt | correct_answer) = P(correct_answer | prompt) × P(prompt) / P(correct_answer)

The answer is a constant so we can ignore it, but P(prompt) is problematic: we don’t care about the likelihood of the prompt, since we plan to condition on it.
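A toy comparison makes the P(prompt) problem concrete: ranking prompts by the reversed model’s P(prompt | answer) can favor generic, a-priori-likely text over a rare but highly predictive prompt. All the numbers below are made up for illustration:

```python
# Two candidate prompts for a fixed answer:
#   "the"             - common filler text, barely predictive
#   "French->English" - rare string, excellent prompt
# Entries are (P(prompt), P(answer | prompt)), all invented.

candidates = {
    "the":             (0.20, 0.01),
    "French->English": (0.001, 0.60),
}

def p_prompt_given_answer(p_prompt, p_answer_given_prompt):
    # Proportional to P(answer|prompt) * P(prompt); P(answer) cancels
    # when comparing prompts for the same fixed answer.
    return p_answer_given_prompt * p_prompt

reversed_pick = max(candidates, key=lambda k: p_prompt_given_answer(*candidates[k]))
direct_pick = max(candidates, key=lambda k: candidates[k][1])
# The reversed ranking picks the generic filler; the direct objective
# P(answer | prompt) picks the informative prompt.
```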
Moreover, we need a way to communicate what the prompt is supposed to mean, and a single answer isn’t a sufficient latent for that. (Consider “...2002? Barack Obama” → “who was the Illinois State senator for the 13th district in the year...”)
Prompt-finetuning resolves the ambiguity by averaging over multiple answers, which could work here, but would require an unusual sampling technique (average likelihood over multiple prompts?).
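The averaging idea can be sketched as scoring each candidate prompt by its mean log-likelihood across several answers, so one lucky answer can’t dominate. `log_p` here is a hypothetical stand-in for a real model’s conditional log-probability, backed by an invented lookup table:

```python
import math

def score_prompt(prompt, answers, log_p):
    """Mean log P(answer | prompt) over a set of answers."""
    return sum(log_p(ans, prompt) for ans in answers) / len(answers)

# Toy conditional log-probs: prompt "B" is consistently decent,
# prompt "A" is great on one answer and terrible on the rest.
table = {
    ("a1", "A"): math.log(0.9), ("a2", "A"): math.log(0.01), ("a3", "A"): math.log(0.01),
    ("a1", "B"): math.log(0.4), ("a2", "B"): math.log(0.4),  ("a3", "B"): math.log(0.4),
}
log_p = lambda ans, prompt: table[(ans, prompt)]

best = max(["A", "B"], key=lambda p: score_prompt(p, ["a1", "a2", "a3"], log_p))
# Averaging favors the consistently useful prompt.
```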
My intuition is that it would Just Work for large smart models, which are easy to prompt and few-shot well: like the ‘French->English’ prompt. If you reversed that and kept getting ‘French->English’ as your prompt across multiple examples, well, that’s obviously a good prompt, whatever the objective likelihood of text starting with ‘French->English’ may be. And to the extent that it didn’t work, either it would be relatively ‘obvious’ what the reversed outputs have in common, so you could come up with a generic prompt (“who was the X State senator for the nth district in the year MMMM?”), or the reversed prompts would at least be a good starting point for the gradient-ascent-type procedures: in the same way that with GAN projectors you often use a hybrid (a quick projection of amortized computation into roughly the right latent z, then gradient ascent to clean it up), you get a bunch of reversed prompts as seeds for optimization, to find the best one or to maximize diversity for some other reason.
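The hybrid “projection then ascent” scheme can be sketched as seeding a greedy local search from each reversed-model output. Everything below is a toy stand-in (integers instead of token sequences, a made-up scorer), just to show the shape of the loop:

```python
# Each reversed-model output becomes a seed; greedy hill-climbing then
# refines it against a scoring function (in practice, something like
# P(answer | prompt)). Seeds, neighbors, and the scorer are all toys.

def refine(seed, neighbors, score, steps=10):
    """Greedy hill-climb from a seed until no neighbor scores higher."""
    best = seed
    for _ in range(steps):
        nxt = max([best] + neighbors(best), key=score)
        if nxt == best:
            break
        best = nxt
    return best

score = lambda x: -(x - 7) ** 2        # peak at the "ideal prompt" 7
neighbors = lambda x: [x - 1, x + 1]   # local edits to a candidate
seeds = [3, 12]                        # as if produced by the reversed model
refined = [refine(s, neighbors, score) for s in seeds]
# Both seeds converge on the optimum under this toy scorer.
```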