ZankerH comments on A Simple Introduction to Neural Networks

ZankerH 11 Feb 2020 13:22 UTC
Squared error has been used instead of absolute error in many diverse optimization problems, in part because its derivative is proportional to the magnitude of the error (for an error e, the derivative of e² is 2e), whereas the derivative of the absolute error has constant magnitude (the derivative of |e| is ±1, and is undefined at e = 0). When you're trying to solve a smooth optimization problem with gradient methods, you generally benefit from a loss function whose gradient is smooth and tends towards zero along with the error.
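To make the gradient argument concrete, here is a minimal sketch of plain fixed-step gradient descent on a single scalar parameter x, where the error is x - target. The target of 1.05, the learning rate of 0.1, the step count, and the function names are all illustrative assumptions, not anything from the original post. With squared error the update shrinks as the error shrinks. With absolute error every update has the same size (the learning rate), so the iterate ends up bouncing around the target instead of settling on it.

```python
def grad_squared(e):
    # Derivative of e**2 is 2e: the gradient shrinks along with the error.
    return 2.0 * e


def grad_absolute(e):
    # Derivative of |e| is sign(e): magnitude 1 no matter how small the error is.
    # It is undefined at e == 0; we use 0 there by convention (a subgradient choice).
    if e > 0:
        return 1.0
    if e < 0:
        return -1.0
    return 0.0


def descend(grad, x0, target, lr, steps):
    """Fixed-step gradient descent on one parameter x, with error = x - target."""
    x = x0
    history = [x]
    for _ in range(steps):
        x -= lr * grad(x - target)
        history.append(x)
    return history


# Illustrative settings (assumptions, chosen only to show the effect).
TARGET = 1.05
LR = 0.1
STEPS = 50

squared_path = descend(grad_squared, 0.0, TARGET, LR, STEPS)
absolute_path = descend(grad_absolute, 0.0, TARGET, LR, STEPS)

print("final |error| with squared loss: ", abs(squared_path[-1] - TARGET))
print("final |error| with absolute loss:", abs(absolute_path[-1] - TARGET))
```

With squared loss each step multiplies the error by (1 - 2·lr) = 0.8, so it decays geometrically toward zero. With absolute loss the parameter keeps overshooting by a fixed step of lr and never closes the last gap. Shrinking the learning rate over time (or switching to a smooth loss) is the usual way around this.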