I think one lesson of superposition research is that neural nets are the compressed version. The world is really complicated, and NNs that try to model it are incentivized to try to squeeze as much in as they can.
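A minimal sketch of the geometric fact the superposition point leans on (my own illustration, not from the comment; the numbers are arbitrary): in d dimensions you can pack far more than d nearly-orthogonal "feature" directions, at the cost of a little interference, which is exactly the sense in which a network can squeeze more of the world in than it has neurons.

```python
# Toy illustration: many more random unit "features" than dimensions,
# yet pairwise overlaps (interference) stay small.
import numpy as np

rng = np.random.default_rng(0)
d, n = 128, 1024                      # 1024 features squeezed into 128 dimensions

features = rng.standard_normal((n, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)   # unit vectors

# Interference between features = off-diagonal cosine similarities.
overlaps = features @ features.T
np.fill_diagonal(overlaps, 0.0)

print(f"{n} features in {d} dims")
print(f"max |interference| : {np.abs(overlaps).max():.3f}")
print(f"mean |interference|: {np.abs(overlaps).mean():.3f}")
```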
It’s also worth noting that LLMs are not learning directly from the raw input stream but from a compressed form of that data (LLMs learn on compressed data): the models are fed tokenized text, and the tokenizer acts as a compressor. This benefits the models by giving them a more information-rich context.
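A quick way to see the "tokenizer as compressor" point, assuming the open-source `tiktoken` library as a stand-in for any BPE tokenizer (my example, not something mentioned above): the model sees far fewer tokens than raw characters, so a fixed context window holds more underlying text.

```python
# Compare raw character count to token count for the same text.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a BPE vocabulary used by several OpenAI models

text = ("The world is really complicated, and neural networks that try to "
        "model it are incentivized to squeeze as much in as they can. ") * 10

tokens = enc.encode(text)
print(f"characters : {len(text)}")
print(f"tokens     : {len(tokens)}")
print(f"chars/token: {len(text) / len(tokens):.2f}")   # roughly 4 chars per token for English
```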
Would you say that tokenization is part of the architecture?
And, in your wildest moments, would you say that language is also part of the architecture :)? I mean, the latent space is probably mapping either a) brain states or b) world states, right? Is everything between latent spaces architecture?
Thanks!
I guess I also wrote a hot take about this.