how is a discretized weight/​activation set amenable to the usual gradient descent optimizers?
Discretized weights/activations are very much not amenable to the usual gradient descent. :) Hence the usual practice is to train in floating point and then quantize afterwards. Doing this naively tends to cause a big drop in accuracy, but there are tricks involving gradually quantizing during training, or quantizing layer by layer.
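For concreteness, here's a minimal PyTorch sketch of one standard trick for quantizing during training, the straight-through estimator: the forward pass rounds values to a discrete grid, but the backward pass pretends the rounding was the identity, so gradients still flow to the underlying float weights. (The `FakeQuantize` name and the 8-bit grid are just illustrative choices, not anything from the comment above.)

```python
import torch

class FakeQuantize(torch.autograd.Function):
    """Round to a uniform 8-bit grid in the forward pass, but pass
    gradients through unchanged (straight-through estimator)."""

    @staticmethod
    def forward(ctx, x, scale):
        # Quantize: snap each value to the nearest int8 grid point.
        q = torch.clamp(torch.round(x / scale), -128, 127)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend round() was the identity; no gradient w.r.t. scale.
        return grad_output, None

# Usage in a training step: the stored weights stay float, but the
# forward pass only ever sees their quantized values.
w = torch.randn(4, 4, requires_grad=True)
scale = w.detach().abs().max() / 127
loss = FakeQuantize.apply(w, scale).sum()
loss.backward()
print(w.grad)  # gradients exist despite the non-differentiable round
```

The rounding itself has zero gradient almost everywhere, which is exactly why plain gradient descent fails on discretized weights; the estimator sidesteps that by keeping a float "shadow" copy that the optimizer updates as usual.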