The standard Solomonoff prior weights a hypothesis by 2^(-l), where l is the number of bits required to describe it. However, we can easily imagine a whole class of priors, each with a different discount rate. For instance, one could weight by (1/Z)·2^(-2l), where Z is a normalizing constant chosen so that the probabilities sum to one. Why do we put special emphasis on this particular rate of discounting rather than any other?
I think we can justify this discount rate with the principle of maximum entropy: distributions with steeper asymptotic discount rates have lower entropy than distributions with shallower ones, while any distribution with a rate shallower than 2^(-l) would (probably) have a divergent sum and therefore fail to normalize into a valid probability distribution. So 2^(-l) sits at the maximum-entropy boundary of the valid rates.
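The divergence claim can be checked numerically under a simplifying assumption: treat every bit string of length l as a description, so there are 2^l descriptions of each length, each weighted 2^(-cl) for some rate c. Then the mass contributed by length l is 2^l · 2^(-cl) = 2^((1-c)l): for c > 1 the total converges, for c = 1 each length contributes mass 1 (so the sum grows linearly, and the actual Solomonoff construction needs a prefix-free machine plus Kraft's inequality to keep it bounded), and for c < 1 it blows up. A minimal sketch of this counting argument (the uniform count of 2^l programs per length is my assumption, not the real program distribution):

```python
# Prior mass contributed by descriptions of length l, assuming
# 2**l descriptions of that length, each weighted 2**(-c*l).
def mass_per_length(c, l):
    return 2**l * 2**(-c * l)

# Partial sum of the prior mass over lengths 1..max_len.
def total_mass(c, max_len):
    return sum(mass_per_length(c, l) for l in range(1, max_len + 1))

# c = 2.0 converges (bounded by 1), c = 1.0 grows linearly,
# c = 0.5 diverges exponentially.
for c in (2.0, 1.0, 0.5):
    print(c, total_mass(c, 30))
```

Running this for larger `max_len` makes the trichotomy vivid: only rates at least as steep as 2^(-l) (and, for c = 1, only with a prefix-free restriction on the descriptions) can be normalized.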
Are there arguments, or situations, that justify steeper discount rates?