I was just thinking that there is actually a way to justify using Occam's razor: by using it, you will always converge on the true hypothesis in the limit of accumulating evidence. Not sure if I've seen this somewhere else before, or if I gigabrained myself into some nonsense:
Let's say the true world is some finite state machine M′ ∈ M (the set of all such machines) with input alphabet {1} and output alphabet {0,1}. Now I feed it an infinite sequence of 1s. If I use a uniform prior over all possible finite state automata, then at any step of observing the output there will be a countably infinite number of machines that explain my observations, so my prior and posterior will always be flat and never converge (a uniform prior over a countably infinite set isn't even a proper distribution). Instead I use as my prior f: M → ℝ, f(m) = 2^(−(|m|+1)), where |m| is the number of states after which m repeats (I view different automata that always produce the same output as the same machine). With this prior, M′ will be my top hypothesis after observing |M′| + 1 bits and will just rise in confidence after that. Since we used finite state automata, we avoid the whole computability business; my intuition is that you could make the argument go through with Turing machines, but it would have to become way more subtle.
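A minimal sketch of this update, with the simplifying assumption that each machine's output is purely periodic (no transient prefix), so a machine is identified with its minimal repeating pattern and |m| is that pattern's length. The prior weight 2^(−(k+1)) and the period cutoff `max_period` are just illustrative choices:

```python
from itertools import product

def is_minimal(pattern):
    # Machines with the same output are identified, so skip any pattern
    # that is just a shorter pattern repeated (non-minimal period).
    k = len(pattern)
    return not any(k % d == 0 and pattern == pattern[:d] * (k // d)
                   for d in range(1, k))

def hypotheses(max_period):
    # Prior weight 2^-(k+1) for minimal period k: simpler machines get more mass.
    return {"".join(bits): 2.0 ** -(k + 1)
            for k in range(1, max_period + 1)
            for bits in product("01", repeat=k)
            if is_minimal("".join(bits))}

def predict(pattern, t):
    # Output bit of the machine at step t: it just cycles through its pattern.
    return pattern[t % len(pattern)]

def posterior_after(observed, max_period=6):
    post = hypotheses(max_period)
    for t, bit in enumerate(observed):
        for p in list(post):
            if predict(p, t) != bit:
                del post[p]  # Bayes with a 0/1 likelihood: drop inconsistent machines
    z = sum(post.values())
    return {p: w / z for p, w in post.items()}

# True world: the 3-state machine repeating "110" forever.
truth = "110"
obs = [predict(truth, t) for t in range(8)]
post = posterior_after(obs)
top = max(post, key=post.get)  # -> "110"
```

After 8 observed bits, every machine with period up to 6 other than "110" has either contradicted an observation or been merged into "110" as a non-minimal duplicate, so the posterior concentrates on the true machine.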
I rechecked Hutter on induction (https://arxiv.org/pdf/1105.5721.pdf) and the convergence stuff seems to be already known. Going to recheck logical induction. I think maybe Occam's razor is actually hard to justify. What is easier to justify is using a prior that will actually converge, given that there is any explanation at all (i.e. your observations aren't random noise).
Ok yeah. Logical induction then just works because you don't expect any adversaries among mathematical truths.
All of this is just me getting annoyed at the no-free-lunch (NFL) theorem while trying to be objective, but one thing I'd find interesting is what happens if you start out with very different priors.