Sandi comments on larger language models may disappoint you [or, an eternally unfinished draft]

Sandi 3 May 2022 21:47 UTC
1 point
Epistemic status: I’m not familiar with the technical details of how LMs work, so this is more word association.
You can glide along almost thinking “a human wrote this,” but soon enough, you’ll hit a point where the model gives away the whole game. Not just something weird (humans can be weird) but something alien, inherently unfitted to the context, something no one ever would write, even to be weird on purpose.
What if the missing ingredient is a better sampling method, as in this paper? To my eye, the completions they show don’t seem hugely better. But I do buy their point that sampling for high probability means you get low-information completions.
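For readers who haven't seen the method, here is a minimal sketch of the filtering step in typical sampling, assuming a plain list of next-token probabilities instead of real model logits. The names `typical_filter`, `sample`, and `tau` are placeholders for illustration, not the paper's reference code. The idea is to keep the tokens whose surprisal is closest to the entropy of the distribution, until their total probability reaches `tau`, and then sample from what remains.

```python
import math
import random


def typical_filter(probs, tau=0.9):
    """Renormalize probs over a (hypothetical) locally typical set.

    probs: next-token probabilities summing to 1.
    tau:   probability mass the kept set must reach (assumed parameter name).
    """
    # Entropy of the next-token distribution, in nats.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Rank tokens by how far their surprisal (-log p) is from the entropy.
    ranked = sorted(
        (abs(-math.log(p) - entropy), i) for i, p in enumerate(probs) if p > 0
    )

    # Keep the closest tokens until their cumulative mass reaches tau.
    kept, mass = [], 0.0
    for _, i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= tau:
            break

    # Zero out everything else and renormalize the survivors.
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample(probs, rng=random):
    """Draw one token index from a probability list."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# A fairly flat distribution: the top token is *less* surprising than average.
flat = typical_filter([0.4, 0.15, 0.15, 0.15, 0.15], tau=0.5)
# A peaked distribution: the top token is the most typical one.
peaked = typical_filter([0.9, 0.05, 0.05], tau=0.5)
token = sample(flat)
```

In the flat example, the most probable token is filtered out entirely, because its surprisal is further from the entropy than that of the 0.15 tokens. In the peaked example only the top token survives. That is the sense in which the method avoids always picking the highest-probability, lowest-information continuation.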
- nostalgebraist 3 May 2022 22:24 UTC
  5 points
  Parent
  I’ve tried the method from that paper (typical sampling), and I wasn’t hugely impressed with it. In fact, it was worse than my usual sampler to a sufficient extent that users noticed the difference, and I switched back after a few days. See this post and these tweets.
  (My usual sampler is one I came up with myself, called Breakruns. It works the best in practice of any I’ve tried.)
  I’m also not sure I really buy the argument behind typical sampling. It seems to conflate “there are a lot of different ways the text could go from here” with “the text is about to get weird.” In practice, I noticed it would tend to do the latter at points where the former was true, like the start of a sample or of a new paragraph or section.
  Deciding how you sample is really important for avoiding the repetition trap, but I haven’t seen sampling tweaks yield meaningful gains outside of that area.
  - Sandi 3 May 2022 23:05 UTC
    1 point
    Parent
    Very comprehensive, thank you!