neverix comments on Hessian and Basin volume

neverix 6 Jun 2023 15:56 UTC
> There are also somewhat principled reasons for using a “fuzzy ellipsoid”, which I won’t explain here.
If you view $T$ as twice the learning rate, the ellipsoid contains the parameters that will jump straight into the basin under the quadratic approximation, and we assume that for points outside the basin the approximation breaks down entirely. If you account for gradient noise ~~in the form of a Gaussian with sigma equal to gradient, the PDF of the resulting point at the basin is equal to the probability a Gaussian parametrized by the ellipsoid at the preceding point.~~ This is wrong; however, the noise can be interpreted as a Gaussian whose variance increases with distance from the basin origin.