This approach also makes lots of regularisation techniques transparent. Typically, regularisation corresponds to placing a prior over the weights/parameters of the model you're fitting. For example, L2 regularisation (aka ridge, aka weight decay) corresponds exactly to taking a Gaussian prior on the weights and finding the Maximum A Posteriori (MAP) estimate rather than the Maximum Likelihood estimate.
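To make the equivalence concrete, here is a minimal sketch in NumPy (the data, noise scale `sigma`, and prior scale `tau` are all illustrative assumptions): the ridge solution with penalty λ = σ²/τ² coincides with the MAP estimate under a Gaussian likelihood and a zero-mean Gaussian prior on the weights, because both minimise the same quadratic objective.

```python
import numpy as np

# Synthetic linear-regression data (illustrative values, not from the text)
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1  # observation-noise std (assumed)
y = X @ w_true + sigma * rng.normal(size=n)

tau = 1.0               # std of the Gaussian prior w ~ N(0, tau^2 I) (assumed)
lam = sigma**2 / tau**2 # ridge penalty implied by that prior

# Ridge / L2-regularised least squares:
#   argmin_w ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# MAP estimate: maximise log p(y|w) + log p(w) with
#   y | w ~ N(Xw, sigma^2 I)  and  w ~ N(0, tau^2 I).
# Setting the gradient of the log-posterior to zero gives the
# same normal equations, with the prior supplying the lam * I term.
w_map = np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)

print(np.allclose(w_ridge, w_map))  # the two estimates agree
```

The choice of λ is not arbitrary here: it is pinned down by the ratio of noise variance to prior variance, which is one reason the Bayesian reading of regularisation is illuminating.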