Am8ryllis comments on Sparse trinary weighted RNNs as a path to better language model interpretability

Am8ryllis 17 Sep 2022 23:22 UTC
2 points
0
Discretized weights/activation are very much not amenable to the usual gradient descent. :) Hence the usual practice is to train in floating point, and then quantize afterwords. Doing this naively tends to cause a big drop in accuracy, but there are tricks involving gradually quantizing during training, or quantizing layer by layer.