As usual, being a bayesian makes everything extraordinarily clear. The mean-squared-error loss is just the negative logarithm of your data likelihood P(y_1, …, y_n | α, β) ∝ ∏_i exp(−(y_i − α·x_i − β)² / (2σ²)) under the assumption of gaussian-distributed data, so “minimizing the mean-squared loss” is completely equivalent to an MLE with gaussian errors. Any other loss you might want to compute directly implies an assumption about the data distribution, and vice versa. If you have reason to believe that your data might not be normally distributed around an x-dependent mean… then don’t use a mean-squared loss.
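(A minimal numeric sketch of that equivalence, not from the original comment: the data, the known σ, and the use of scipy’s generic optimizer are all made up for illustration. The point is just that the two objectives have the same argmin.)

```python
# Check numerically that minimizing MSE and maximizing the gaussian
# likelihood recover the same (alpha, beta) for a linear model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)  # true alpha=2, beta=1
sigma = 0.5  # assumed known, for illustration

def mse(params):
    alpha, beta = params
    return np.mean((y - alpha * x - beta) ** 2)

def neg_log_likelihood(params):
    # -log prod_i exp(-(y_i - alpha*x_i - beta)^2 / (2 sigma^2)),
    # dropping the constant normalisation term
    alpha, beta = params
    return np.sum((y - alpha * x - beta) ** 2) / (2 * sigma ** 2)

fit_mse = minimize(mse, x0=[0.0, 0.0]).x
fit_mle = minimize(neg_log_likelihood, x0=[0.0, 0.0]).x
print(fit_mse, fit_mle)  # agree up to optimizer tolerance
```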
This approach also makes lots of regularisation techniques transparent. Typically regularisation corresponds to applying some prior over the weights/parameters of the model you’re fitting. E.g. L2 norm (aka ridge, aka weight decay) regularisation corresponds exactly to taking a Gaussian prior on the weights and finding the Maximum A Posteriori (MAP) estimate rather than the Maximum Likelihood one.
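(Again a sketch rather than anything from the original comment: hypothetical data, plus the standard correspondence λ = σ²/τ², i.e. the ridge penalty is the ratio of noise variance to prior variance, so a tighter prior on the weights means heavier shrinkage.)

```python
# Ridge's closed-form solution vs. the MAP estimate under a
# N(0, tau^2 I) prior on the weights and gaussian noise of variance sigma^2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
sigma, tau = 0.3, 1.0
y = X @ w_true + rng.normal(scale=sigma, size=100)

lam = sigma**2 / tau**2  # the ridge penalty implied by the prior

# Ridge: argmin ||y - Xw||^2 + lam * ||w||^2, in closed form
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

def neg_log_posterior(w):
    # -log [ likelihood * prior ], constants dropped
    return (np.sum((y - X @ w) ** 2) / (2 * sigma**2)
            + np.sum(w**2) / (2 * tau**2))

w_map = minimize(neg_log_posterior, x0=np.zeros(3)).x
print(w_ridge, w_map)  # identical up to optimizer tolerance
```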
See your β there? You assume that people remember to “control for bias” before they apply tools that assume Gaussian error.
That is indeed what I should have remembered about the implications of “we can often assume an approximately normal distribution” from my statistics course ~15 years ago. But then I saw people complaining about sensitivity to outliers in one direction, and I failed to make the connection until I dug deeper into my reasoning.