Is there any evidence that outcomes in the universe actually occur with probabilities in proportion to their information-complexity?
Yes, and the first piece of evidence is rather trivial. For any given law of physics, chemistry, etc., or really any model of anything in the universe, I can conjure up an arbitrary number of ever more complicated hypotheses that match the current data, but all or nearly all of which will fail utterly against new data obtained later.
For a very trivial thought experiment / example, consider an alternate hypothesis that consists of all of the current data plus instructions telling the Turing machine simply to print that data. Then consider another which includes all the current data twice, but tells the Turing machine to print only one copy. Necessarily, both of these will fail against new data, because they will only print the old data and then halt. A minimal sketch of that failure mode is below.
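Here is a toy illustration of the "print the memorized data and halt" hypothesis; the function name and the data are made up for the example, and a hypothesis is represented as a Python generator rather than an actual Turing machine program:

    # Toy sketch: a "hypothesis" is a program that emits a sequence of bits.
    # The names and data here are invented for illustration.

    def replay_hypothesis(memorized):
        """A program that just prints the memorized data, then halts."""
        def run():
            for bit in memorized:
                yield bit
            # Halts here: it says nothing about future observations.
        return run

    old_data = [1, 0, 1, 1, 0]           # observations collected so far
    new_data = [1, 0, 1, 1, 0, 0, 1, 1]  # old data plus later observations

    hypothesis = replay_hypothesis(old_data)
    predicted = list(hypothesis())

    # It matches everything seen so far...
    assert predicted == old_data
    # ...but cannot match the longer sequence: it halted too early.
    assert predicted != new_data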
We could conjure infinitely many variants like this which also contain arbitrary amounts of gibberish appended right after the old data, gibberish which is unlikely to match the new data (it matches with probability 1/2^n, where n is the length in bits of the new data, assuming the gibberish is perfectly random).
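To put numbers on that claim (a minimal sketch, assuming the appended gibberish is a uniformly random bit string), the chance that n random bits coincide with the n bits actually observed next is 1/2^n, so almost every such padded hypothesis is falsified almost immediately:

    from fractions import Fraction

    def match_probability(n_new_bits):
        """Probability that n uniformly random bits equal the next n observed bits."""
        return Fraction(1, 2 ** n_new_bits)

    # Even a handful of new observations makes a random continuation hopeless.
    print(match_probability(1))    # 1/2
    print(match_probability(10))   # 1/1024
    print(match_probability(100))  # roughly 8e-31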
This seems reasonable: it basically makes use of the fact that most statements are wrong, so adding a statement whose truth-value is as yet unknown is likely to make the hypothesis wrong.
However, that’s vague. It supports Occam’s Razor pretty well, but does it also offer good evidence that those likelihoods will manifest in real-world probabilities IN EXACT PROPORTION to the bit-lengths of their inputs? That is a much more precise claim! (For convenience I am ignoring the problem of multiple algorithms where hypotheses have different bit-lengths.)
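For concreteness, here is a minimal sketch of what that stronger claim amounts to: under a Solomonoff-style prior, a program of length L bits gets weight 2^-L, so each extra bit of description length halves the prior probability. The hypotheses and their encoded lengths below are invented for illustration; real Solomonoff induction sums over all programs for a fixed universal Turing machine and is uncomputable.

    # Made-up hypotheses with made-up description lengths, in bits.
    hypothesis_lengths_in_bits = {
        "simple_law": 20,         # short program
        "law_plus_epicycle": 35,  # longer program, same predictions so far
        "replay_old_data": 120,   # the memorized-data program from above
    }

    # Weight each hypothesis by 2^(-length), then normalize to get a prior.
    weights = {h: 2.0 ** -length for h, length in hypothesis_lengths_in_bits.items()}
    total = sum(weights.values())
    prior = {h: w / total for h, w in weights.items()}

    # Prior probabilities fall off in exact proportion to 2^(-bit length).
    for h, p in prior.items():
        print(h, p)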
Nope, and we have no idea where we’d even start on evaluating this precisely, because of the various problems relating to the choice of description language. I think this is an active area of research.
It does seem, though, by observation and inference (heh, use whatever tools you have), that more efficient languages tend to produce shorter hypotheses, which hints in this direction.
There have also been some demonstrations of how well SI works for learning and inferring about a completely unknown environment. I think this was what AIXI was about, though I can’t recall the specifics.