Can you elaborate on your sentence that begins “It’s not a challenge..”?
My understanding is that if our real justification for “Why do we use Occam’s razor?” were “Because that way we avoid overfitting”, then if a future statistical technique came along that outperformed Occam by proposing weirdly complex hypotheses, we would embrace it wholeheartedly.
Boosting and bagging are merely illustrative of the idea that there might be a statistical technique that achieves good performance in a statistical sense, though nobody believes that their outputs (large ensembles of rules) are “really out there”, the way we might believe Schroedinger’s equation is “really out there”.
We only use boosting if our set of low-complexity hypotheses does not contain the solution we need. And instead of switching to a larger set of still-low-complexity hypotheses, we do something much cheaper, a second-best thing: we look for a good hypothesis in the convex hull of the original hypothesis space, that is, a weighted vote over the simple hypotheses we already have.
In short, boosting “outperforms Occam” only in man-hours saved: boosting requires less thinking and less work than properly applying Occam’s razor. That really is a good thing, of course.
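
To make the “convex hull” remark concrete, here is a rough sketch of AdaBoost over decision stumps on a toy problem. Everything in it is an assumption made for illustration: the 1-D data set, the stump family, the choice of three rounds, and the names `STUMPS`, `adaboost`, and `convex_weights`. No single stump can label an interval correctly, but a weighted vote of three stumps can, and the normalized vote weights are non-negative and sum to one, so the result is a convex combination of the original hypotheses.

```python
import math

# Toy 1-D data (illustrative): the label is +1 on an interval,
# which no single threshold rule can express.
xs = list(range(10))
ys = [1 if 3 <= x <= 6 else -1 for x in xs]

# Base hypothesis space: stumps h(x) = s if x > thr else -s.
# Thresholds outside the data range give the two constant rules.
STUMPS = [(t - 0.5, s) for t in range(11) for s in (1, -1)]


def stump_predict(stump, x):
    thr, s = stump
    return s if x > thr else -s


def weighted_error(stump, weights):
    """Total weight of the points this stump misclassifies."""
    return sum(w for x, y, w in zip(xs, ys, weights)
               if stump_predict(stump, x) != y)


def adaboost(rounds):
    """Plain AdaBoost; returns a list of (alpha, stump) pairs."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        stump = min(STUMPS, key=lambda st: weighted_error(st, weights))
        eps = weighted_error(stump, weights)
        if eps == 0 or eps >= 0.5:
            break  # perfect or useless weak learner: stop early
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Up-weight the points this stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def training_error(predict):
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)


ensemble = adaboost(rounds=3)

# Dividing the vote weights by their sum does not change the sign of the
# vote, so the boosted classifier is a convex combination of stumps.
alpha_sum = sum(alpha for alpha, _ in ensemble)
convex_weights = [alpha / alpha_sum for alpha, _ in ensemble]


def ensemble_predict(x):
    vote = sum(w * stump_predict(st, x)
               for w, (_, st) in zip(convex_weights, ensemble))
    return 1 if vote > 0 else -1


best_single_error = min(
    training_error(lambda x, st=st: stump_predict(st, x)) for st in STUMPS)
ensemble_error = training_error(ensemble_predict)

print("best single stump error:", best_single_error)
print("boosted ensemble error: ", ensemble_error)
print("convex weights:         ", convex_weights)
```

Note that the sketch only measures training error on ten points, so it says nothing about generalization; it just shows the mechanics of staying inside the convex hull of a simple hypothesis class rather than moving to a richer one.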