I think Kevin T. Kelly has a slight adjustment to this “the scientific method is compression” paradigm: http://www.andrew.cmu.edu/user/kk3n/ockham/Ockham.htm
As far as I understand, the basic idea is: in order to eventually converge on the correct theory, you must switch among theories as you acquire more evidence, moving from simpler to more complex. The reverse order is impossible: theories can be Gödel-coded as natural numbers, and there is no infinite strictly descending sequence of natural numbers, so you cannot enumerate theories from more complex to simpler.
The claim “the simplest theory that fits the data is also most likely to be correct” (or a variation of that claim regarding compression performance) is a factual claim about the world, and one that may not be true (aggregate methods such as boosting and bagging can yield better predictors than predicting with the simplest theory).
Kevin Kelly is providing an alternative reason why we should follow simplicity in the scientific method, one not based on these dubious factual claims.
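The switching dynamic above can be sketched in a few lines. This uses a counting-effects setup in the spirit of Kelly’s examples (the scenario and all names here are my own illustrative choices, not code from the linked page): each hypothesis says “exactly n distinct effects will ever appear,” smaller n being simpler. The learner always conjectures the simplest hypothesis consistent with the data, so new evidence can only force it upward to more complex theories, and the number of retractions is bounded:

```python
def ockham_learner(stream):
    """Yield (observation, conjecture) pairs.

    The conjecture is always the simplest hypothesis consistent with the
    data so far: "exactly n effects exist", where n = distinct effects seen.
    """
    seen = set()
    for effect in stream:
        seen.add(effect)
        yield effect, len(seen)

def run(stream):
    conjectures = [c for _, c in ockham_learner(stream)]
    # A retraction is any point where the learner abandons its conjecture.
    retractions = sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)
    return conjectures, retractions

# Two distinct effects: the learner conjectures 1, is forced up to 2, and
# then stays there — it converges, and it never needs to move back down.
conjectures, retractions = run(["a", "a", "b", "a", "b"])
# conjectures == [1, 1, 2, 2, 2]; retractions == 1
```

Starting with the most complex theory would not work: there is no “most complex” consistent hypothesis to start from, which is the descending-sequence point above.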
I think the majority of research in machine learning indicates that this claim IS true. Certainly all methods of preventing overfitting that I am aware of involve some form of capacity control, regularization, or model complexity penalty. If you can cite a generalization theorem that does not depend on some such scheme, I would be very interested to hear about it.
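As a minimal sketch of what such a complexity penalty looks like, here is ridge regression as the representative scheme (my choice of example; the function names are mine): adding an L2 penalty lam * ||w||² to least squares provably yields a weight vector of smaller (or equal) norm than the unpenalized fit, which is the capacity control at work:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for the square system A w = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_poly(xs, ys, degree, lam=0.0):
    """Polynomial least squares via normal equations, with optional ridge penalty lam."""
    X = [[x ** d for d in range(degree + 1)] for x in xs]
    p = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)

# Samples of an underlying line plus alternating "noise", fit by an
# over-complex degree-5 polynomial with and without the penalty.
xs = [i / 19 for i in range(20)]
ys = [2 * x + 0.1 * ((-1) ** i) for i, x in enumerate(xs)]
w_ols = fit_poly(xs, ys, degree=5, lam=0.0)
w_ridge = fit_poly(xs, ys, degree=5, lam=1.0)
norm = lambda w: sum(wi * wi for wi in w) ** 0.5
# norm(w_ridge) < norm(w_ols): the penalty shrinks the over-complex fit.
```

The shrinkage guarantee is not itself the contested claim; whether the lower-capacity fit then generalizes better is exactly the factual question at issue above.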