Lucius Bushnaq comments on Lucius Bushnaq’s Shortform

Lucius Bushnaq 2 Jan 2025 20:20 UTC
4 points
Minor detail, but this is false in practice because we are doing gradient descent with a non-zero learning rate, so there will be some diffusion between different hyperbolas in weight space as we take gradient steps of finite size.
See footnote 1.
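
To make the finite-step point concrete, here is a minimal sketch under assumptions that are mine rather than anything stated in the thread: a toy loss L = 0.5 * (w1 * w2 - y)^2 on two scalar weights, where continuous-time gradient flow conserves w1^2 - w2^2, so each trajectory stays on one hyperbola. The function names and starting values are purely illustrative. With a discrete step of size lr, that quantity is no longer conserved.

```python
# Toy model (an assumption for illustration): loss = 0.5 * (w1 * w2 - y) ** 2.
# Under gradient flow, w1**2 - w2**2 is constant, so trajectories follow hyperbolas.

def invariant(w1, w2):
    """Quantity conserved by gradient flow for this toy loss."""
    return w1 * w1 - w2 * w2

def gd_step(w1, w2, y, lr):
    """One full-batch gradient descent step with finite learning rate lr."""
    residual = w1 * w2 - y
    # dL/dw1 = residual * w2, dL/dw2 = residual * w1
    return w1 - lr * residual * w2, w2 - lr * residual * w1

def run(w1, w2, y, lr, steps):
    """Apply gd_step repeatedly and return the final weights."""
    for _ in range(steps):
        w1, w2 = gd_step(w1, w2, y, lr)
    return w1, w2

def invariant_drift(lr, total_time=1.0, w1=2.0, w2=0.5, y=3.0):
    """Change in w1**2 - w2**2 after training for a fixed total time lr * steps."""
    steps = round(total_time / lr)
    w1_end, w2_end = run(w1, w2, y, lr, steps)
    return invariant(w1_end, w2_end) - invariant(w1, w2)

if __name__ == "__main__":
    for lr in (0.1, 0.01, 0.001):
        print(f"lr={lr}: drift in w1^2 - w2^2 = {invariant_drift(lr):.6f}")
```

Expanding one update by hand shows that, in this toy case, each step multiplies w1^2 - w2^2 by exactly (1 - lr^2 * residual^2), so the movement between hyperbolas is second order in the learning rate and vanishes only in the lr -> 0 limit. Full-batch descent here gives a deterministic drift; with minibatch noise the same effect would look more like diffusion.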