Building on this comment, I think it might be helpful for readers to make a few distinctions in their heads:
“True entropy of internet text” refers to the entropy rate (measured in bits per character, or bits per byte) of internet text: the irreducible uncertainty per symbol that remains even in the limit of perfect prediction ability.
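For concreteness, this is the standard entropy-rate definition (assuming a stationary text source; the formalization is added here, not part of the original comment):

```latex
% Entropy rate of a text source (assuming stationarity): the average
% number of irreducible bits per symbol in the limit of long contexts.
H(P) \;=\; \lim_{n \to \infty} \frac{1}{n}\, H(X_1, X_2, \ldots, X_n)
```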
Operationally, if one developed a language model whose cross entropy against internet text was minimized to the theoretical limit, that cross-entropy score would equal the “true” entropy of internet text. Extrapolating the scaling laws, training a model to actually reach this score would take infinite computation. The quantity depends on the data distribution, and is a purely hypothetical (though useful) abstraction.
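A one-line sketch of why the theoretical minimum coincides with the true entropy, using standard information theory (P denotes the distribution of internet text, Q the model):

```latex
% Cross entropy of a model Q against the true distribution P, in bits
% per symbol. Since D_KL(P || Q) >= 0, with equality iff Q = P, the
% lowest achievable cross entropy is exactly the true entropy H(P).
H(P, Q) \;=\; -\sum_{x} P(x) \log_2 Q(x)
        \;=\; H(P) + D_{\mathrm{KL}}(P \,\|\, Q)
        \;\ge\; H(P)
```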
“Human-level perplexity” refers to the perplexity achieved by humans tested on the predict-the-next-token task. Perplexity, in this context, is defined as two raised to the power of the cross entropy between internet text and a model (or a human, scored as a predictor).
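Stated in symbols (with cross entropy measured in bits per character, matching the definition above):

```latex
% Perplexity as defined above: two raised to the cross entropy
% (in bits per character) between internet text P and a predictor Q.
\mathrm{PPL}(Q) \;=\; 2^{\,H(P, Q)}
```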
“Human-level performance” refers to a level of performance at which a model is doing “about as well as a human”. This term is ambiguous, but is probably best interpreted as a perplexity somewhere between the “true perplexity” (two raised to the power of the true entropy) and the “human-level perplexity” defined above.
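To make the band concrete, a purely illustrative calculation with made-up values (0.7 and 1.2 bits per character are not measured estimates):

```latex
% Illustrative only: if the true entropy were 0.7 bits/char and human
% cross entropy were 1.2 bits/char, "human-level performance" would be
% a per-character perplexity somewhere in the band:
2^{0.7} \;\le\; \mathrm{PPL}_{\text{model}} \;\le\; 2^{1.2},
\qquad 2^{0.7} \approx 1.62, \quad 2^{1.2} \approx 2.30
```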