There is no known way to “reverse” an LM like that.
(Well, besides the brute force method, where you generate a preceding token by looping over every possible value for that token, scoring each candidate by how probable the model thinks the known text is when it follows that candidate. GPT’s vocab has ~50k tokens, so this is ~50k times slower than forward sampling.)
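For concreteness, here’s a minimal sketch of what that brute force loop could look like, using GPT-2 via Hugging Face transformers. The model choice, the batching, and the uniform prior over the candidate token are all my own assumptions for illustration: each candidate t gets scored by log p(suffix | t), and we sample from the normalized scores.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def sample_preceding_token(suffix_ids: torch.Tensor, batch_size: int = 512) -> int:
    """Sample a token t with probability proportional to p(suffix | t)."""
    vocab_size = model.config.vocab_size
    scores = torch.empty(vocab_size)
    for start in range(0, vocab_size, batch_size):
        cands = torch.arange(start, min(start + batch_size, vocab_size))
        # Each row of the batch is [candidate token] followed by the known suffix.
        rows = torch.cat([cands[:, None], suffix_ids.expand(len(cands), -1)], dim=1)
        logits = model(rows).logits                       # (batch, seq, vocab)
        logprobs = torch.log_softmax(logits[:, :-1], -1)  # predictions for positions 1..end
        targets = rows[:, 1:]
        # log p(suffix | candidate) = sum of the per-token log-probs of the suffix.
        scores[cands] = logprobs.gather(-1, targets[..., None]).squeeze(-1).sum(-1)
    # Note: sampling from softmax(scores) implicitly assumes a uniform prior over
    # the preceding token; a true posterior would also weight by p(t) itself.
    return torch.distributions.Categorical(logits=scores).sample().item()

suffix_ids = tokenizer(" lived happily ever after.", return_tensors="pt").input_ids[0]
print(repr(tokenizer.decode([sample_preceding_token(suffix_ids)])))
```

Even batched like this, it’s one forward pass per vocab entry, which is exactly where the ~50k slowdown comes from.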
There are some LMs that naturally work in both directions. Namely, masked language models (eg BERT), as opposed to causal language models (eg GPT). Rather than taking a prefix as input, a masked language model takes a complete string, but with some positions randomly masked or corrupted, and it’s trained to undo those changes.
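To see the interface difference, here’s the standard fill-mask usage via the transformers pipeline (the model choice and example sentence are mine, just for illustration):

```python
from transformers import pipeline

# Give BERT a complete string with a [MASK] in it; it predicts the missing
# token using context from *both* sides, not just the left.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The cat sat on the [MASK]."):
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```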
However, these models are mostly used for things other than text generation; it’s possible to make them write text, but the resulting text tends to be lower-quality than what you can get from a comparably sized GPT.
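As a sketch of how that generation can be done anyway, here’s Gibbs-style sampling in the spirit of Wang & Cho’s “BERT has a Mouth” (2019): start from an all-[MASK] sequence, then repeatedly re-mask and re-sample one position at a time. The length, sweep count, and model here are assumptions of mine, not anything canonical:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def gibbs_generate(length: int = 12, sweeps: int = 20) -> str:
    # Start from a fully masked sequence: [CLS] [MASK]*length [SEP].
    ids = torch.tensor([[tok.cls_token_id]
                        + [tok.mask_token_id] * length
                        + [tok.sep_token_id]])
    for _ in range(sweeps):
        for pos in range(1, length + 1):
            ids[0, pos] = tok.mask_token_id          # re-mask one position
            logits = model(ids).logits[0, pos]       # predict it from everything else
            ids[0, pos] = torch.distributions.Categorical(logits=logits).sample()
    return tok.decode(ids[0, 1:length + 1])

print(gibbs_generate())
```

Run it a few times and you’ll likely see the quality caveat firsthand: the samples tend to be locally plausible but noticeably less coherent than what a GPT of similar size produces.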