I’m confused. Isn’t one of the standard justifications for the Solomonoff prior that you can get it without talking about K-complexity, just by assuming a uniform prior over programs of length l on a universal monotone Turing machine and letting l tend to infinity?
What you describe is not the Solomonoff prior on hypotheses, but the Solomonoff a priori distribution on sequences/histories! This is the distribution I call M in my post. It can then be written as a mixture of LSCSMs, with the weights given either by the Solomonoff prior P_sol (involving Kolmogorov complexity) or the a priori prior P_ap in my work. Those priors are not the same.
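For concreteness, here is the usual way that a priori distribution is written in terms of a universal monotone machine U fed uniformly random bits (a sketch of the standard construction; the post's exact conventions, e.g. minimality of programs, may differ):

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

where the sum ranges over (minimal) programs p whose output begins with x and ℓ(p) is the program length. The choice of prior only enters once this same M is rewritten as a mixture Σ_ν w(ν) ν(x) over LSCSMs, and that rewriting is not unique.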
If they have the same prior on sequences/histories, then in what relevant sense are they not the same prior on hypotheses? If they both sum to M(x), how can their predictions come to differ?
Well, their induced mixture distributions are the same up to a constant, but the priors on hypotheses are different. I’m not sure if you consider the difference “relevant”, perhaps you only care about the induced mixture distribution?
To make a simple example: assume there were only three Turing machines T, T0, and T1, with T(0p) = T0(p) and T(1p) = T1(p). Let ν, ν0, and ν1 be the LSCSMs induced by T, T0, and T1. Notice that ν is a mixture of ν0 and ν1: ν = 1/2 ν0 + 1/2 ν1.
Let M be the mixture distribution given as M = 1/3 ν + 1/3 ν0 + 1/3 ν1. Then clearly, M can also be represented as M = 1/2 ν0 + 1/2 ν1. My viewpoint is that the prior distribution giving weight 1/3 to each of the three hypotheses is different from the one giving weight 1/2 to each of ν0 and ν1, even if their mixture distributions are exactly the same.
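Spelling out the arithmetic behind "clearly", using only the decomposition ν = 1/2 ν0 + 1/2 ν1:

$$\tfrac{1}{3}\nu + \tfrac{1}{3}\nu_0 + \tfrac{1}{3}\nu_1 \;=\; \tfrac{1}{3}\big(\tfrac{1}{2}\nu_0 + \tfrac{1}{2}\nu_1\big) + \tfrac{1}{3}\nu_0 + \tfrac{1}{3}\nu_1 \;=\; \big(\tfrac{1}{6}+\tfrac{1}{3}\big)\nu_0 + \big(\tfrac{1}{6}+\tfrac{1}{3}\big)\nu_1 \;=\; \tfrac{1}{2}\nu_0 + \tfrac{1}{2}\nu_1.$$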
And this is exactly the situation we’re in with the true mixture distribution M from the post. Some of the LSCSMs ν in the mixture are of the form ν = νT for a separate universal monotone Turing machine, which means that νT is itself a mixture of all LSCSMs. Any such mixtures among the LSCSMs make it possible to redistribute prior weight from that LSCSM to all the others without affecting the mixture M in any way.
This is also related to what makes a prior based on Kolmogorov complexity ultimately so arbitrary: we could have chosen just about anything and it would still essentially sum to M. That said, a posteriori, Kolmogorov complexity does have some mathematical advantages, as outlined in the post.
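For reference, the Kolmogorov-complexity-based weight has roughly the shape

$$P_{\mathrm{sol}}(\nu) \;\propto\; 2^{-K(\nu)}$$

(the post's precise definition may differ in details, e.g. which complexity variant is used), and the arbitrariness point is that swapping K for many other codelength functions still yields a mixture that equals M up to a multiplicative constant.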
> My viewpoint is that the prior distribution giving weight 1/3 to each of the three hypotheses is different from the one giving weight 1/2 to each of ν0 and ν1, even if their mixture distributions are exactly the same.
That’s pretty unintuitive to me. What does it matter whether we happen to write out our belief state one way or the other? So long as the predictions come out the same, what we do and don’t choose to call our ‘hypotheses’ doesn’t seem particularly relevant for anything?
We made our choice when we settled on M as the prior. Everything past that point just seems like different choices of notation to me? If our induction procedure turned out to be wrong or suboptimal, it’d be because M was a bad prior to pick, not because we happened to write M down in a weird way, right?
I answered in the parallel thread, which is probably getting down to the crux now. To add a few more points:
The prior matters for the Solomonoff bound, see Theorem 5. (To be clear, the true value of the prediction error is the same irrespective of the prior, but the bound we can prove differs; see the sketch after this list.)
I think different priors have different aesthetics. Choosing a prior because it gives you a nice result (i.e., the Solomonoff prior) feels different from choosing it because it’s a priori correct (like the a priori prior in this post). To me, aesthetics matter.
It’s also useful to emphasize why, even if the mixtures are the same, having different priors can make a ~~practical~~ difference. E.g., imagine that in the example above we had one prior giving 100% weight to ν, and another prior giving 50% weight to each of ν0 and ν1. They give the same mixture, but the first prior can’t update, and the second prior can!
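Regarding the first point, here is a sketch of the standard form such a bound takes (a generic dominance argument, not necessarily the exact statement of Theorem 5): if μ appears in the mixture M = Σ_ν w(ν) ν with weight w(μ) > 0, then M(x) ≥ w(μ) μ(x), so the summed expected KL divergence between μ's and M's next-symbol predictions satisfies

$$\sum_{t=1}^{n} \mathbb{E}_{\mu}\Big[\mathrm{KL}\big(\mu(\cdot \mid x_{<t}) \,\big\|\, M(\cdot \mid x_{<t})\big)\Big] \;=\; \mathbb{E}_{\mu}\!\left[\ln \frac{\mu(x_{1:n})}{M(x_{1:n})}\right] \;\le\; \ln \frac{1}{w(\mu)}.$$

The left-hand side does not depend on how M is decomposed; only the right-hand side, and hence the provable guarantee, changes with the prior.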
Wait, are you saying we’re not propagating updates into ν to change the mass it puts on inputs 0 vs. 1?

Okay, I think I overstated the extent to which the difference in priors matters in the previous comments and crossed out “practical”.
Basically, I was right that the prior that gives 100% to ν cannot update: it gives all its weight to ν no matter how much data comes in. However, ν itself can update with more data and shift between ν0 and ν1.
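A minimal numeric sketch of exactly this point (the Bernoulli components, weights, and names below are my own illustration, not anything from the post): let ν0 and ν1 be i.i.d. coins and ν their 50/50 mixture, and compare the prior putting 100% on ν with the prior putting 50% on each of ν0 and ν1.

```python
# Toy check: two priors with the same induced mixture make identical predictions,
# even though their posterior weights over "hypotheses" evolve differently.
# Illustrative choices: nu0 = i.i.d. Bernoulli(0.1), nu1 = i.i.d. Bernoulli(0.9),
# nu = the 50/50 mixture of nu0 and nu1.

def bernoulli_seq(theta, bits):
    """Probability of a bit string under i.i.d. Bernoulli(theta)."""
    p = 1.0
    for b in bits:
        p *= theta if b == 1 else 1.0 - theta
    return p

def nu0(bits): return bernoulli_seq(0.1, bits)
def nu1(bits): return bernoulli_seq(0.9, bits)
def nu(bits):  return 0.5 * nu0(bits) + 0.5 * nu1(bits)

def posterior(prior, data):
    """Bayesian posterior weights over the hypotheses listed in `prior`."""
    joint = {name: w * fn(data) for name, (w, fn) in prior.items()}
    z = sum(joint.values())
    return {name: j / z for name, j in joint.items()}

def predictive_one(prior, data):
    """Posterior predictive probability that the next bit is 1."""
    num = sum(w * fn(data + [1]) for w, fn in prior.values())
    den = sum(w * fn(data) for w, fn in prior.values())
    return num / den

prior_all_on_nu = {"nu": (1.0, nu)}
prior_half_half = {"nu0": (0.5, nu0), "nu1": (0.5, nu1)}

data = [1, 1, 1, 1, 1]  # evidence strongly favouring the Bernoulli(0.9) world

print(posterior(prior_all_on_nu, data))    # {'nu': 1.0}: the weight never moves
print(posterior(prior_half_half, data))    # nearly all weight shifts to 'nu1'
print(predictive_one(prior_all_on_nu, data),
      predictive_one(prior_half_half, data))  # the two predictions are identical
```

The first prior’s weight on ν is pinned at 1, but ν’s own conditional predictions drift toward the Bernoulli(0.9) component, which is the sense in which “ν itself can update”; the posterior predictive distributions of the two priors agree exactly.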
I can see that this feels perhaps very syntactic, but in my mind the two priors still feel different. One of them is saying “The world first samples a bit indicating whether the world will continue with world 0 or world 1”, and the other one is saying “I am uncertain about whether we live in world 0 or world 1”.
The difference is not a “practical” one as long as you only use the posterior predictive distribution, but in some AIXI variants (KSA, certain safety proposals) the posterior weights themselves are accessed and the form may matter. Arguably this is a defect of those variants.
Might be worth more explicitly noting in the post that P_sol and P_ap in fact define the same semimeasure over strings (up to a multiplicative factor). From a skim I was confused about this point: “Wait, is he saying that not only are alt-complexity and K-complexity different, but that they even define different probability distributions? That seems to contradict the universality of P_sol, doesn’t it...?”
Good idea, I now added the following to the opening paragraphs of the section doing the comparisons:
> Importantly, due to Theorem 4, this means that the Solomonoff prior P_sol and the a priori prior P_ap lead, up to a constant, to the same predictions on sequences. The advantages of the priors that we analyze are thus not statements about their induced predictive distributions.