I agree that alternative, more interpretable architectures are a plausible path to alignment. I think there may be some tradeoff between alignment tax (e.g. reduced ease of training, diversion from the mainstream path) and increased interpretability. I, myself, am working on an experiment with unusually sparse nets, with an architecture much closer to (and hopefully interoperable with) a GPT-like transformer.
I am hopeful that we can get both interpretability and easy training. But you may well be right.
After skimming some of your progress reports, I am very excited about your sparse nets work!
Thanks! And I’m excited to hear more about your work. It sounds like if it did work, the results would be quite interesting.