Sure, but that doesn’t achieve a good compression capability, and LLMs are trained as universal compressors/predictors (i.e., they are trained to predict, subject to regularization entropy constraints).
This is a reason why it makes sense for LLMs to develop world models, but it doesn’t prove that an individual LLM uses a world model to answer the questions you ask it.
How much of a ‘good compression capability’ have LLMs achieved?
I.e., how is the metric defined, and how reliable are the figures?
The equivalence between compression and prediction (compression requires a predictive model), together with the fact that LLMs are the best known general predictors, implies that they are the best known general compressors[1]. Memorization does not generalize.
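The equivalence is concrete: paired with an arithmetic coder, any probabilistic model compresses a sequence to roughly -log2 p(symbol) bits per observed symbol, so a better predictor directly yields a shorter code. A minimal sketch (the per-token probabilities below are made up for illustration, not real model outputs):

```python
import math

def ideal_code_length_bits(probs):
    """Total bits an ideal (arithmetic) coder uses, given the
    probability the model assigned to each symbol actually observed."""
    return sum(-math.log2(p) for p in probs)

# Hypothetical probabilities two models assign to the same four tokens.
sharp_model = [0.9, 0.8, 0.95, 0.85]    # confident, accurate predictor
flat_model  = [0.25, 0.25, 0.25, 0.25]  # uniform guess over 4 options

print(ideal_code_length_bits(sharp_model))  # ≈ 0.78 bits
print(ideal_code_length_bits(flat_model))   # 8.0 bits (2 bits/token)
```

The same data costs roughly 10x fewer bits under the sharper predictor, which is the sense in which prediction quality and compression ratio are the same quantity.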
One point of common confusion is the large size of trained LLMs, but that is actually irrelevant. An ideal Solomonoff inductor would have infinite size and perfect generalization: it is an ensemble distribution over entropy-constrained models, not a single entropy-constrained model, so the MDL principle applies only to each of the (infinitely many) submodels, not to the whole ensemble.
The same applies to LLMs and the brain: like all highly capable general predictors, they are some approximation of Bayesian ensembles. However, there is a good way to measure the total compression: measure it throughout the entire training process, so that the only complexity penalty is that of the initial architecture prior (which is tiny).
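Measuring compression "throughout the entire training process" is the prequential (online) code length: each symbol is priced with the model's prediction as it stood before training on that symbol, and the model then updates. A toy sketch, with a Laplace-smoothed counting learner standing in for an LLM (the learner and data stream are illustrative assumptions, not the setup from [1]):

```python
import math
from collections import Counter

def prequential_bits(stream, alphabet):
    """Online (prequential) code length: code each symbol with the model
    before it sees that symbol, then update. The total is the compressed
    size of the whole stream; the only fixed overhead is the (tiny)
    description of the learner itself, the 'architecture prior'."""
    counts = Counter()
    total_bits = 0.0
    for sym in stream:
        # Laplace-smoothed predictive probability from counts so far.
        p = (counts[sym] + 1) / (sum(counts.values()) + len(alphabet))
        total_bits += -math.log2(p)
        counts[sym] += 1  # the "training step"
    return total_bits

print(prequential_bits("aaaaabaaaa", "ab"))
# ≈ 6.8 bits, versus 10 bits for a uniform 1-bit-per-symbol code
```

The final learned model never needs to be counted against the code length, because the receiver can rerun the same training procedure; only the initial prior is charged. This is why a multi-billion-parameter model can still be a genuine compressor.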
[1] https://arxiv.org/abs/2309.10668