Yeah, I agree, there’s this inconvenient competition between shorter programs that encode a probability distribution and longer ones that “hardcode” the random bits.
Here’s a random idea, I’m not sure if it works. Formulating SI as a mixture of semimeasures doesn’t require any particular weighting of these semimeasures, you just need all weights to be nonzero. We can try making the weights of longer programs decrease faster than 2^-L, so the shorter programs outcompete the longer ones.
You’ll hit the opposite problem, I think: the winning program can itself implement some approximation to SI. Would be bad predictions wise, too—the dominating hypotheses would tend to have a doomsday (because when you approximate SI you have to set a limit on number of steps). And if you can only learn that the doomsday hypothesis was wrong after the doomsday did not happen… it’s not good.
I don’t think the distinction between ‘data’ and ‘program’ is fundamental in any way. In any universe where a Turing machine (or a bounded TM) can exist, the data is a program too.
In general, I agree that reasoning about approximations to SI is trickier than reasoning about SI itself. But I don’t see how we can confidently say things like “any method of approximating SI will have such-and-such problem”. Such statements seem to be caused by limitations of our imagination, not limitations of SI. I propose to wrap up this discussion and agree to disagree, at least until we come up with some hard math about the limitations of SI.
I were originally speaking of vanilla SI, not of some not yet defined methods.
And I don’t think it’s even a limitation of SI. It just seems to me that the code vs data distinction is, in the case of machines that can run interpreters * , entirely arbitrary, subjective, and a matter of personal taste.
and not only can run interpreters, but ultimately should, if we are to represent a world that contains computers.
Yeah, I agree, there’s this inconvenient competition between shorter programs that encode a probability distribution and longer ones that “hardcode” the random bits.
Here’s a random idea, I’m not sure if it works. Formulating SI as a mixture of semimeasures doesn’t require any particular weighting of these semimeasures, you just need all weights to be nonzero. We can try making the weights of longer programs decrease faster than 2^-L, so the shorter programs outcompete the longer ones.
You’ll hit the opposite problem, I think: the winning program can itself implement some approximation to SI. Would be bad predictions wise, too—the dominating hypotheses would tend to have a doomsday (because when you approximate SI you have to set a limit on number of steps). And if you can only learn that the doomsday hypothesis was wrong after the doomsday did not happen… it’s not good.
I don’t think the distinction between ‘data’ and ‘program’ is fundamental in any way. In any universe where a Turing machine (or a bounded TM) can exist, the data is a program too.
In general, I agree that reasoning about approximations to SI is trickier than reasoning about SI itself. But I don’t see how we can confidently say things like “any method of approximating SI will have such-and-such problem”. Such statements seem to be caused by limitations of our imagination, not limitations of SI. I propose to wrap up this discussion and agree to disagree, at least until we come up with some hard math about the limitations of SI.
I were originally speaking of vanilla SI, not of some not yet defined methods.
And I don’t think it’s even a limitation of SI. It just seems to me that the code vs data distinction is, in the case of machines that can run interpreters * , entirely arbitrary, subjective, and a matter of personal taste.
and not only can run interpreters, but ultimately should, if we are to represent a world that contains computers.