I am struggling to understand part of how Solomonoff induction is justified. Once we have all hypotheses that satisfy the data represented as a series of 0s and 1s, how can we say that some hypotheses are more likely than others? I understand that some strings have a higher probability of appearing within certain sets (e.g. 101 is twice as likely in the set of binary strings of length 3 as 1111 is in the set of binary strings of length 4), but how can we say that we are more likely to encounter 101 versus 1111 in the set of strings of length n (where n > 4)? Won’t both strings occur exactly once? Or is it the case that, say, we are deciding between one hypothesis that looks like “101” + “either 0 or 1 or nothing” versus 1111? In that case it is clear that the first is more likely than the second, but this would only be valid if 1010 and 1011 were both also valid algorithms, which doesn’t seem like it would necessarily be the case. Can someone help me understand this?
Think of it as each bit being generated from scratch. The more bits a hypothesis requires, the more you have to explain about why they are the way they are.
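One way to make the “generated from scratch” picture concrete (this is a sketch of the standard length-based prior, not anything specific to the argument above): treat each program as a finite bit string produced by fair coin flips, so a program of length L receives prior weight 2^-L. A short Python illustration:

```python
def prior(program: str) -> float:
    """Prior weight of a bit-string program when each bit is a fair coin flip.

    A program of length L has probability 2**-L of being generated.
    """
    return 2.0 ** -len(program)

# A 3-bit program is twice as likely as any 4-bit program:
print(prior("101"))   # 0.125
print(prior("1111"))  # 0.0625

# The two one-bit extensions of "101" together carry exactly the
# same mass as "101" itself, so the extra bit never adds probability;
# it only spreads the same mass over more alternatives:
print(prior("1010") + prior("1011"))  # 0.125
```

On this view, longer hypotheses aren’t penalized by fiat; each extra bit halves the chance that blind coin-flipping would have produced that exact program, which is one formal reading of “having to explain more.”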
But why does “having to explain it more” make it less likely? It seems like your argument would only appeal to an intuition which has already accepted and internalized the very simplicity bias which Solomonoff induction is trying to justify.