Might be worth noting more explicitly in the post that P_sol and P_ap in fact define the same semimeasure over strings (up to a multiplicative factor). From a skim, I was confused about this point: “Wait, is he saying that not only are alt-complexity and K-complexity different, but that they even define different probability distributions? That seems to contradict the universality of P_sol, doesn’t it?”
Good idea. I have now added the following to the opening paragraphs of the section doing the comparisons:
Importantly, due to Theorem 4, this means that the Solomonoff prior P_sol and the a priori prior P_ap lead, up to a multiplicative constant, to the same predictions on sequences. The advantages of the priors that we analyze are thus not statements about their induced predictive distributions.
It’s also useful to emphasize why even if the mixtures are the same, having different priors can make a practical difference. E.g., imagine that in the example above we had one prior giving 100% weight to ν, and another prior giving 50% weight to each of ν0 and ν1. They give the same mixture, but the first prior can’t update, and the second prior can!
… Wait, are you saying we’re not propagating updates into ν to change the mass it puts on inputs 0 vs. 1?
Okay, I think I overstated the extent to which the difference in priors matters in the previous comments, so I have crossed out “practical”.
Basically, I was right that the prior that gives 100% on ν cannot update: it gives all its weight to ν no matter how much data comes in. However, ν itself can update with more data and shift between ν0 and ν1.
I can see that this feels perhaps very syntactic, but in my mind the two priors still feel different. One of them is saying “The world first samples a bit indicating whether the world will continue with world 0 or world 1”, and the other one is saying “I am uncertain on whether we live in world 0 or world 1”.
The difference is not a “practical” one as long as you only use the posterior predictive distribution, but in some AIXI variants (KSA, certain safety proposals) the posterior weights themselves are accessed and the form may matter. Arguably this is a defect of those variants.
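To make the point about weights versus predictions concrete, here is a minimal sketch. It is only an illustration under assumptions that are not from the post: ν0 and ν1 are modeled as hypothetical i.i.d. bit sources with made-up biases, ν is their 50/50 mixture, and all function names are invented for this example. It compares the prior that puts 50% on each of ν0 and ν1 with the prior that puts 100% on ν.

```python
from fractions import Fraction

# Assumption for illustration only: each "world" is an i.i.d. bit source,
# described by its probability of emitting a 1. The biases are made up.
P_ONE = {"nu0": Fraction(1, 5), "nu1": Fraction(4, 5)}


def likelihood(name, bits):
    """Probability that the source `name` emits the bit sequence `bits`."""
    p = P_ONE[name]
    result = Fraction(1)
    for b in bits:
        result *= p if b == 1 else 1 - p
    return result


def nu(bits):
    """The mixture nu = 1/2 * nu0 + 1/2 * nu1, viewed as a single hypothesis."""
    return (likelihood("nu0", bits) + likelihood("nu1", bits)) / 2


def posterior_split(bits):
    """Posterior weights under the prior with 1/2 on each of nu0 and nu1."""
    joint = {name: Fraction(1, 2) * likelihood(name, bits) for name in P_ONE}
    total = sum(joint.values())
    return {name: w / total for name, w in joint.items()}


def predict_one_split(bits):
    """P(next bit = 1 | bits) using the posterior weights on nu0 and nu1."""
    weights = posterior_split(bits)
    return sum(weights[name] * P_ONE[name] for name in P_ONE)


def predict_one_single(bits):
    """P(next bit = 1 | bits) under the prior with weight 1 on nu.

    The posterior weight on nu stays at 1; all the "updating" happens
    inside nu through ordinary conditioning.
    """
    return nu(list(bits) + [1]) / nu(bits)


data = [1, 1, 1, 0, 1]
print(posterior_split(data))     # weights have moved away from 1/2 each
print(predict_one_split(data))   # predictive probability of a 1
print(predict_one_single(data))  # identical predictive probability
```

In this toy setup the two priors always produce the same prediction for the next bit. The only thing that differs is whether explicit posterior weights on ν0 and ν1 exist to be read off, which is the case where the form of the prior may matter.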