I’ve just reread Eliezer’s post on Occam’s Razor and it seems to have clarified my thinking a little.
I originally said:
If it is also true that hypotheses which are easier to locate make more predictions… then we are perfectly justified in assigning a probability to a hypothesis based on its locate-ability.
But I would now say:
If it is also true that hypotheses with a shorter minimum message length make more predictions relative to that minimum message length than do hypotheses with longer MMLs… then we are perfectly justified in assigning a probability to a hypothesis based on MML.
This solves the problem your counterexample presents: Hypothesis 1 describes only one possible world, but Hypothesis 2 requires, say, ~30 more bits of information (for those particular strings of results, plus a disjunction) to describe only two possible worlds, making it 2^30 / 2 times less likely.
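A minimal sketch of the arithmetic above, assuming the comment's own accounting: each extra bit of message length halves a hypothesis's weight, and each additional possible world it describes adds to that weight. The function name `hypothesis_weight` is hypothetical and only for illustration.

```python
from fractions import Fraction

def hypothesis_weight(extra_bits: int, worlds_described: int) -> Fraction:
    """Hypothetical helper: weight relative to a 1-world, 0-extra-bit baseline.
    Each extra bit halves the weight; each described world adds to it."""
    return Fraction(worlds_described, 2 ** extra_bits)

h1 = hypothesis_weight(extra_bits=0, worlds_described=1)   # Hypothesis 1
h2 = hypothesis_weight(extra_bits=30, worlds_described=2)  # Hypothesis 2 (~30 extra bits)
ratio = h1 / h2
print(ratio == 2 ** 29)  # True: 2^30 / 2 = 2^29
```

Exact fractions are used so the 2^30 / 2 factor comes out exactly rather than as a rounded float.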
Then let’s try this. Hypothesis 1 says the sequence will consist of only H repeated forever. Hypothesis 2 says the sequence will be HTTTHHTHTHTTTT repeated forever, where the can take different values on each repetition. The second hypothesis is harder to locate but describes an infinite number of possible worlds :-)
The problem with this counterexample is that you can’t actually repeat something forever.
Even taking the case where we repeat each sequence 1000 times, which seems like it should be similar, you’ll end up with 1000 coin flips and 15000 coin flips for Hypothesis 1 and Hypothesis 2, respectively. So the odds of being in a world where Hypothesis 1 is true are 1 in 2^1000, but the odds of being in a world where Hypothesis 2 is true are 1 in 2^15000.
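The figures in the previous paragraph can be reproduced with a short sketch, assuming (as the comment does) that each possible world is one specific sequence of fair coin flips, so a sequence of n flips has probability 2^-n. The helper name is hypothetical.

```python
from fractions import Fraction

def prob_of_specific_sequence(n_flips: int) -> Fraction:
    """Hypothetical helper: probability that n fair flips give one particular sequence."""
    return Fraction(1, 2 ** n_flips)

p1 = prob_of_specific_sequence(1 * 1000)   # "H" repeated 1000 times
p2 = prob_of_specific_sequence(15 * 1000)  # a 15-flip block repeated 1000 times, per the comment's count
print(p1 / p2 == 2 ** 14000)  # True
```

The comparison is driven entirely by the different sequence lengths (1000 vs 15000 flips), which is the "apples to balloons" point being made.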
It’s an apples to balloons comparison, basically.
(I spent about twenty minutes staring at an empty comment box and sweating blood before I figured this out, for the record.)
I think this is still wrong. Take the finite case where both hypotheses are used to explain sequences of a billion throws. Then the first hypothesis describes one world, and the second one describes an exponentially huge number of worlds. You seem to think that the length of the sequence should depend on the length of the hypothesis, and I don’t understand why.
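To make the fixed-length objection concrete, here is a rough sketch that counts how many worlds each hypothesis allows for a billion throws. It assumes a 15-character repeating block with exactly one free position marked `?`. The original free character did not survive in the text above, so the `?` and its placement at the end of the block are hypothetical, as is the helper name.

```python
def free_positions(pattern: str, n: int, wildcard: str = "?") -> int:
    """Hypothetical helper: count the positions among the first n flips that a
    repeating pattern leaves unconstrained. The hypothesis then describes
    2 ** result possible worlds of length n."""
    full, rest = divmod(n, len(pattern))
    return full * pattern.count(wildcard) + pattern[:rest].count(wildcard)

N = 1_000_000_000
h1_bits = free_positions("H", N)                # no free positions: exactly one world
h2_bits = free_positions("HTTTHHTHTHTTTT?", N)  # wildcard placement is an assumption
print(h1_bits, h2_bits)  # 0 66666666
```

With the sequence length held fixed, Hypothesis 1 describes 2^0 = 1 world while Hypothesis 2 describes 2^66666666 worlds, which is the "exponentially huge number" referred to above.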
If at first you don’t succeed, try, try again!