Meta point: this is one of those insights which is very likely to hit you over the head if you’re doing practical technical work with probabilistic models, but not if you’re just using them for small semi-intuitive problems (a use-case we often see on LW).
I remember the first time I wrote a mixture-of-Gaussians clustering model and saw it spitting out probabilities like 10^-5000; I thought it must be a bug. It wasn’t a bug. Probabilities naturally live on a log scale, and those sorts of numbers are normal once we move away from artificially-low-dimensional textbook problems and start working with more realistic high-dimensional systems. When your data channel has a capacity of kilobytes or megabytes per data point, even if 99% of that information is irrelevant, that’s still a lot of bits; the probabilities get exponentially small very quickly.
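A minimal sketch of why this happens (not the original model, just an illustrative standard multivariate normal): even at a perfectly typical point, the density of a 1000-dimensional Gaussian comes out around 10^-600, which is why the computation has to be done in log space.

```python
import math
import random

random.seed(0)

def log_density_std_normal(x):
    """Log-density of a standard multivariate normal at x.
    Working in log space avoids floating-point underflow."""
    d = len(x)
    return -0.5 * sum(xi * xi for xi in x) - 0.5 * d * math.log(2 * math.pi)

# A typical sample from a 1000-dimensional standard normal.
x = [random.gauss(0, 1) for _ in range(1000)]
logp = log_density_std_normal(x)

print(logp)                 # roughly -1400 nats
print(logp / math.log(10))  # i.e. a density on the order of 10^-600
```

Exponentiating that log-density directly would underflow to 0.0 in double precision (which bottoms out around 10^-308), so the "bug-looking" tiny numbers are just what high-dimensional densities are.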
Tying back to an example in the post: if we’re using 7-bit ASCII encoding, then the string “Mark Xu” takes up 49 bits. It’s quite compressible, but that still leaves more than enough room for 24 bits of evidence to be completely reasonable.
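The arithmetic here is quick to check, and it’s worth seeing how large a likelihood ratio 24 bits actually is (the 1-in-a-million prior below is just an illustrative number, not from the post):

```python
s = "Mark Xu"
bits_ascii = len(s) * 7   # 7 characters at 7 bits each in plain ASCII
print(bits_ascii)         # 49

# 24 bits of evidence means a likelihood ratio of 2^24.
ratio = 2 ** 24
print(ratio)              # 16777216

# Enough to take a hypothetical 1-in-a-million prior up past 90%:
prior_odds = 1 / 1_000_000
posterior_odds = prior_odds * ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_prob)     # ~0.94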
This paper suggests that spoken language is consistently ~39 bits/second.
https://advances.sciencemag.org/content/5/9/eaaw2594
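Taking that 39 bits/second figure at face value, even a single minute of speech carries enough information that the prior probability of any one specific utterance is astronomically small:

```python
import math

bits_per_second = 39          # figure from the linked paper
seconds = 60
total_bits = bits_per_second * seconds
print(total_bits)             # 2340 bits in one minute of speech

# Prior probability of any one specific minute-long utterance,
# on a maximally naive uniform model over bit strings:
log10_p = -total_bits * math.log10(2)
print(log10_p)                # ~ -704, i.e. around 10^-704
```

So the 10^-5000-style numbers above aren’t exotic; a few minutes of ordinary conversation gets you there.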