This is interesting, as I’ve (preliminarily) found the opposite with my methods. In my MNIST model, the first and last layers can’t really be optimized any more than they are for sparsity, but the middle layer undergoes a drastic change.