Is there a proof anywhere that Occam's razor is correct? More specifically, that Occam priors are the correct priors. Going from the conjunction rule to P(A) >= P(B & C) when A and B & C are equally favored by the evidence seems simple enough (where A, B, and C are atomic propositions), but I don't (immediately) see how to get from there to an actual number that you can plug into Baye's rule. Is this just something that is buried in a textbook on information theory?
On that note, assuming someone had a strong background in statistics (PhD level) and little to no background in computer science outside of a stat computing course or two, how much computer science (or material from other fields) would they have to learn before they could learn information theory?
Thanks to anyone who bites
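(For context, the conjunction-rule step being alluded to is presumably just P(B & C) = P(B) * P(C | B) <= P(B), i.e. a conjunction can never be more probable than either conjunct on its own; the open part of the question is how to turn "simpler" into an actual prior number.)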
I found Rob Zhara’s comment helpful.
Thanks. I suppose a mathematical proof doesn't exist, then.
Yes, there is a proof.
http://lesswrong.com/lw/s0/where_recursive_justification_hits_bottom/ljr
Try Solomonoff Induction
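To sketch how that route gets you an actual number for Bayes' rule: a Solomonoff-style prior weights each hypothesis by 2^(-L(H)), where L(H) is the length in bits of its shortest description in some fixed language. The toy example below only illustrates that idea; the hypotheses, their description lengths, and the likelihoods are invented, and a real Solomonoff prior (shortest programs for a universal machine) is uncomputable.

```python
# Toy illustration of a complexity-weighted ("Occam") prior in Bayes' rule.
# The hypotheses, description lengths, and likelihoods are invented for
# illustration; a real Solomonoff prior uses shortest programs for a
# universal Turing machine and is uncomputable.

hypotheses = {
    "H1": {"length_bits": 3, "likelihood": 0.5},
    "H2": {"length_bits": 7, "likelihood": 0.5},  # fits the data equally well, longer description
    "H3": {"length_bits": 5, "likelihood": 0.1},
}

# Occam prior: weight each hypothesis by 2^(-description length),
# then normalize over this finite set.
raw_prior = {h: 2.0 ** -v["length_bits"] for h, v in hypotheses.items()}
z = sum(raw_prior.values())
prior = {h: w / z for h, w in raw_prior.items()}

# Bayes' rule: posterior is proportional to likelihood * prior.
unnormalized = {h: hypotheses[h]["likelihood"] * prior[h] for h in hypotheses}
evidence = sum(unnormalized.values())
posterior = {h: p / evidence for h, p in unnormalized.items()}

for h in hypotheses:
    print(f"{h}: prior={prior[h]:.3f}, posterior={posterior[h]:.3f}")
```

In this sketch H1 and H2 fit the data equally well, but H1 ends up with the higher posterior purely because its assumed description is shorter; that is all the razor is doing here.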
Not only is there no proof, there isn't even any evidence for it. Any effort to collect evidence for it leaves you assuming what you're trying to prove. This is the "problem of induction" and there is no solution; however, you are built to be incapable of not applying induction, and you couldn't possibly make any decisions without it.
Occam’s razor is dependent on a descriptive language / complexity metric (so there are multiple flavours of the razor).
Unless a complexity metric is specified, the first question seems rather vague.
I think you might be making this sound easier than it is. If there are an infinite number of possible descriptive languages (or of ways of measuring complexity) aren’t there an infinite number of “flavours of the razor”?
Yes, but not all languages are equal—and some are much better than others—so people use the “good” ones on applications which are sensitive to this issue.
There's a proof that any two (Turing-complete) metrics can differ by at most a constant amount, which is the message length it takes to encode one metric in the other.
Of course, the constant can be arbitrarily large.
However, there are a number of domains for which this issue is no big deal.
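(For reference, this is roughly the invariance theorem from Kolmogorov complexity: for any two universal description languages U and V there is a constant c_{U,V}, independent of the string x, such that K_U(x) <= K_V(x) + c_{U,V}, where c_{U,V} is essentially the length of an interpreter for V written in U. So the two razors agree up to that, possibly huge, additive constant.)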
As far as I can tell, this is exactly zero comfort if you have finitely many hypotheses.
This is little comfort if you have finitely many hypotheses — you can still find some encoding to order them in any way you want.
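A minimal sketch of that point, with made-up hypotheses and made-up description lengths: over a finite hypothesis set you can pick a language that assigns whatever lengths you like, and the induced 2^(-length) prior then ranks the hypotheses in any order you choose.

```python
# Illustration that, with finitely many hypotheses, choosing the encoding
# lets you impose any prior ordering you want. Hypotheses and description
# lengths are made up.

hypotheses = ["simple-looking", "middling", "baroque"]

# An adversarially chosen language: give the "baroque" hypothesis the
# shortest description and the "simple-looking" one the longest.
chosen_lengths = {"baroque": 1, "middling": 2, "simple-looking": 10}

raw = {h: 2.0 ** -chosen_lengths[h] for h in hypotheses}
z = sum(raw.values())
prior = {h: w / z for h, w in raw.items()}

# The razor now "favors" the baroque hypothesis, purely because of the
# language we picked.
print(sorted(prior.items(), key=lambda kv: -kv[1]))
```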
Bayes’ rule.
Unless several people named Baye collectively own the rule, it’s Bayes’s rule. :)