This is interesting, as I’ve (preliminarily) found the opposite with my methods. In my MNIST model, the first and last layers can’t really be optimized any more than they are for sparsity, but the middle layer undergoes a drastic change.