It’s interesting that the “up to a finite fudge factor” problem isn’t specific to universal priors. The same problem exists in ordinary probability theory, where you can have different priors about, for example, the propensity of a coin. Then after many observations, all reasonable priors get closer and closer to the true propensity of the coin.
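The coin case is easy to simulate. A minimal sketch (the two Beta priors here are just illustrative choices, one biased toward heads and one toward tails): after enough flips, both posterior means land near the true propensity, and the disagreement between the priors washes out.

```python
import random

random.seed(0)

# Two different Beta(a, b) priors over the coin's bias:
# an "optimist" who expects heads and a "pessimist" who expects tails.
priors = {"optimist": (8.0, 2.0), "pessimist": (2.0, 8.0)}

true_bias = 0.7
flips = [1 if random.random() < true_bias else 0 for _ in range(5000)]
heads = sum(flips)

# Conditioning a Beta(a, b) prior on the flips gives a
# Beta(a + heads, b + tails) posterior; print each posterior mean.
for name, (a, b) in priors.items():
    post_mean = (a + heads) / (a + b + len(flips))
    print(name, round(post_mean, 3))
```

Both posterior means end up within a fraction of a percent of each other, and close to 0.7.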
Then it’s natural to ask: what kinds of long-run truths do all universal priors converge on? It’s not only truths of the form “this bit string comes from this program”, because a universal prior can also look at a random string where a certain fraction of entries are zeros and predict that the same trend will hold in the future (i.e. learn the propensity of a coin). So what’s the general form of the long-run truths that universal priors can learn? I’m having trouble formulating this to myself, and can’t even decide whether the set of learnable truths is inherent in the prior or something we imagine on top of it.
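The coin-propensity case can itself be phrased as a mixture over programs. A toy sketch of that, assuming a small grid of bias hypotheses with geometric prior weights standing in for 2^−K(h) (the grid and the weights are hypothetical choices, not anything canonical): the mixture’s predictive probability for the next bit converges to the true propensity, even though no single hypothesis is “the program” that generated the string.

```python
import math
import random

random.seed(1)

# A toy "universal-ish" prior: a mixture over a grid of coin biases,
# with arbitrary geometric prior weights standing in for 2^-K(h).
biases = [i / 10 for i in range(1, 10)]
log_prior = [-(i + 1) * math.log(2) for i in range(len(biases))]

true_bias = 0.8
data = [1 if random.random() < true_bias else 0 for _ in range(2000)]
heads = sum(data)
tails = len(data) - heads

# Log-posterior (up to normalization) of each bias hypothesis,
# computed in log space to avoid underflow over 2000 bits.
log_post = [lp + heads * math.log(p) + tails * math.log(1 - p)
            for p, lp in zip(biases, log_prior)]
m = max(log_post)
post = [math.exp(x - m) for x in log_post]
total = sum(post)
post = [x / total for x in post]

# Predictive probability that the next bit is 1: close to the true bias.
pred = sum(p * q for p, q in zip(biases, post))
print(round(pred, 2))
```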
To be clear, SI can’t learn which program generated the string, just a program with the same functional behavior (depending on the setup, the same functional behavior on the prefixes of the string in question).
“long-run truths that can be learned by universal priors”
Hm. Say we have universal priors P and Q. For computable infinite strings x, P(x|n) converges to Q(x|n), because both converge to the right answer. For any x, P(x|n) and Q(x|n) are always within some fixed constant factor of each other, by the definition of universality. I conjecture, very unconfidently, that for any P there’s a universal Q and a string x such that |P(x|n) − Q(x|n)| > c for some fixed c > 0 for infinitely many n. I don’t even have an idea for how to show this and don’t know if it’s true, but the intuition is: an adversary choosing x should be able to find a machine M (or rather, an equivalence class of machines that make the same predictions; this trips me up in constructing Q) that P and Q assign different priors to, and that is consistent so far with the x|n chosen. The adversary then confirms M heavily by choosing more bits of x, so that the probability on M dominates and the difference between Q(M) and P(M) is ballooned up maximally (given the constraint that P and Q must each dominate the other by a positive constant); then the adversary picks a different M′ and repeats, and so on.
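The adversary intuition can at least be sanity-checked in a much simpler setting than genuinely universal priors: take P and Q to be finite mixtures over the same two stochastic predictors but with different weights (so each dominates the other within a constant), and let the adversary alternate long blocks that confirm one predictor and then the other. This is only a toy of the mechanism, not the conjecture itself, and it tracks the ratio P(x|n)/Q(x|n) rather than the difference; the block lengths and weights are arbitrary choices. The ratio keeps swinging between its two extreme values instead of settling down.

```python
import math

# Two stochastic "machines": A says each bit is 1 with prob 0.9,
# B says each bit is 1 with prob 0.1.  P and Q are mixtures over
# {A, B} with different weights; each dominates the other within
# a constant factor, as universality requires.
P_w = (0.5, 0.5)   # P's prior weights on (A, B)
Q_w = (0.9, 0.1)   # Q's prior weights on (A, B)

def log_mix(weights, log_la, log_lb):
    """log of w_A * L_A + w_B * L_B, computed stably in log space."""
    a = math.log(weights[0]) + log_la
    b = math.log(weights[1]) + log_lb
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Adversarial string: 200 ones, then alternating 400-bit blocks of
# zeros and ones, so A and B take turns dominating the posterior.
x = [1] * 200
for k in range(4):
    x += [0] * 400 if k % 2 == 0 else [1] * 400

log_la = log_lb = 0.0
ratios = []
for bit in x:
    log_la += math.log(0.9 if bit else 0.1)
    log_lb += math.log(0.1 if bit else 0.9)
    ratios.append(math.exp(log_mix(P_w, log_la, log_lb)
                           - log_mix(Q_w, log_la, log_lb)))

# P(x|n)/Q(x|n) oscillates between roughly 0.5/0.9 (when A dominates)
# and 0.5/0.1 (when B dominates), never converging.
print(round(min(ratios), 2), round(max(ratios), 2))
```

Whether anything like this survives the move to actual universal priors, where the adversary has to fight the domination constants across all machines at once, is exactly the part I don’t know how to do.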