gwern comments on [AN #116]: How to make explanations of neurons compositional

gwern 9 Sep 2020 20:49 UTC
LW: 10 AF: 7
The composition paper seems to exemplify what I talk about as my intuition for how NNs work. Both models are very small and trained on little data, but image classification seems to be much easier than NLP (which is why the DL revolution came to image classification many years before NLP), so that is enough for the CNN to learn fairly meaningful disentangled representations of the kind we expect. Their RNN model, however, continues to grope through relatively superficial associations and tricks, as the text dataset is relatively tiny. I’d predict that if they analyzed much larger networks, like BiT or GPT-3, they’d find much more composition, much less reliance on polysemanticity, and less vulnerability to easy ‘copy-paste’ adversarial examples.
- Rohin Shah 10 Sep 2020 0:30 UTC
  LW: 8 AF: 6
  Yup, I generally agree (both with the three predictions, and the general story of how NNs work).