The claim “the simplest theory that fits the data is also most likely to be correct” (or a variation of that claim regarding compression performance) is a factual claim about the world, and it may not be true: aggregation methods such as boosting and bagging can yield better predictors than predicting with the single simplest theory.
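A minimal sketch of the aggregation point, assuming scikit-learn (the dataset and all parameter choices here are illustrative, not from the original discussion): a bagged ensemble of unpruned decision trees, a far more complex hypothesis than any single tree, will typically out-predict the simplest small tree that fits the data.

```python
# Toy comparison: simplest-model prediction vs. a bagged ensemble.
# Assumes scikit-learn; the data and hyperparameters are arbitrary choices.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification task with some label noise.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=10, flip_y=0.1,
                           random_state=0)

# A deliberately simple hypothesis: a depth-2 decision tree.
simple = DecisionTreeClassifier(max_depth=2, random_state=0)

# An aggregate of 100 unpruned trees trained on bootstrap resamples.
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100, random_state=0)

print("simple tree :", cross_val_score(simple, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```

On data like this the ensemble usually scores noticeably higher under cross-validation, which is the sense in which aggregation can beat the single simplest theory.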
I think the majority of research in machine learning indicates that this claim IS true. Certainly all methods of preventing overfitting that I am aware of involve some form of capacity control, regularization, or model complexity penalty. If you can cite a generalization theorem that does not depend on some such scheme, I would be very interested to hear about it.
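To make the capacity-control point concrete, here is a toy sketch, again assuming scikit-learn, of an explicit complexity penalty at work (the polynomial degree, noise level, and alpha values are my own illustrative choices): ridge regression shrinks the coefficients of an overparameterized polynomial fit, trading training error for generalization.

```python
# Toy illustration of regularization as capacity control.
# Assumes scikit-learn and NumPy; exact scores depend on the seed and noise.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))          # 60 noisy samples
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, size=60)

# Degree-15 polynomial: far more capacity than the data supports.
for alpha in (1e-9, 1e-2, 1.0):               # 1e-9 ~ unregularized
    model = make_pipeline(PolynomialFeatures(degree=15),
                          Ridge(alpha=alpha))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"alpha={alpha:g}: CV R^2 = {score:.3f}")
```

The nearly unregularized fit tends to score worst under cross-validation, while a moderate penalty generalizes better, which is the pattern the capacity-control argument predicts.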