Unfortunately I can’t easily find a link to the presentation: it was a talk on Mondrian random forests by Yee Whye Teh back in 2014. I don’t think there was necessarily anything special about the presentation; it’s more that I hadn’t put much thought into them before then.
The very short version is that it would be nice if classifiers had fuzzy boundaries. If you look at the optimization underlying something like logistic regression, it turns out that when the underlying data is linearly separable, the fit will make the boundary as sharp as possible and put it in a basically arbitrary spot: the loss keeps improving as the weights grow, so the predicted probabilities get pushed all the way to 0 and 1. A random forest, by averaging many weak classifiers, creates one ‘fuzzy’ classifier that gets the probabilities mostly right in a computationally cheap fashion.
(This comment is way more opaque than I’d like, but most of the ways I’d want to elaborate on it require a chalkboard.)
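In lieu of a chalkboard, here’s a minimal sketch of the effect, assuming numpy and scikit-learn (and using the plain RandomForestClassifier rather than the Mondrian variant from the talk): on perfectly separable 1-D data, a nearly-unregularized logistic regression snaps its predicted probabilities from ~0 to ~1 across the gap, while the forest’s averaged trees ramp up more gradually.

```python
# A rough sketch, assuming numpy and scikit-learn are installed. Note this
# uses the standard RandomForestClassifier, not the Mondrian variant.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Two 1-D clusters with a gap between them: perfectly linearly separable.
X = np.concatenate([rng.uniform(0.0, 1.0, 200),
                    rng.uniform(2.0, 3.0, 200)]).reshape(-1, 1)
y = np.concatenate([np.zeros(200), np.ones(200)])

# With almost no regularization (large C), the logistic loss keeps rewarding
# a steeper sigmoid, so the boundary lands somewhere in the gap and the
# predicted probabilities snap almost instantly from ~0 to ~1.
logit = LogisticRegression(C=1e6, max_iter=10_000).fit(X, y)

# Each tree picks its own split point, and bootstrap resampling scatters
# those points across the gap, so the averaged probability ramps up gradually.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

grid = np.linspace(0.5, 2.5, 9).reshape(-1, 1)
for x, p_log, p_rf in zip(grid.ravel(),
                          logit.predict_proba(grid)[:, 1],
                          forest.predict_proba(grid)[:, 1]):
    print(f"x={x:4.2f}  logistic P(y=1)={p_log:.3f}  forest P(y=1)={p_rf:.3f}")
```

Neither answer is ‘right’ here, since the data says nothing about what happens inside the gap, but the forest at least spreads its probability across the gap instead of committing to one arbitrary razor-sharp cut.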