Ofer comments on Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

Ofer 4 Sep 2020 16:45 UTC
10 points
0
Both the general idea of trying to train competitive NNs with modular architectures and the idea of trying to use language models to get descriptions of NNs (or parts thereof) seem extremely interesting! I hope a lot of work will be done on these research directions.

> We’re assuming humans can interpret small NN’s, given enough time. A “Modular” NN is just a collection of small NN’s connected by sparse weights. If humans could interpret each module in theory, then GPT-N could too.

I’m not sure about that. Notice that we need GPT-N to learn a distribution over strings that assigns more probability to [a modular NN specification followed by a natural language description of its ~~models~~ modules] than to [a modular NN specification followed by an arbitrary string]. Learning such a distribution may be unlikely if the training corpus doesn’t contain anything as challenging to produce as the former (regardless of what humans can do in theory).
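
To make the requirement concrete, here is a toy sketch of the comparison. It is not a claim about how GPT-N works: the "model" is an add-one-smoothed word-unigram scorer, and the names `make_unigram_log_prob` and `prefers_description` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def make_unigram_log_prob(corpus: Iterable[str]) -> Callable[[str], float]:
    """Toy stand-in for a language model: add-one-smoothed word unigrams."""
    counts = Counter(word for text in corpus for word in text.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words

    def log_prob(text: str) -> float:
        # Log-probability of the string as a sum over its words.
        return sum(
            math.log((counts[word] + 1) / (total + vocab_size))
            for word in text.split()
        )

    return log_prob


def prefers_description(
    log_prob: Callable[[str], float],
    spec: str,
    description: str,
    arbitrary: str,
) -> bool:
    """True if [spec + description] gets more probability than [spec + arbitrary]."""
    return log_prob(spec + " " + description) > log_prob(spec + " " + arbitrary)
```

With a scorer built from a corpus that contains description-like text, `prefers_description` can come out true. With a scorer built from an empty corpus, the two strings tie and it comes out false. The toy says nothing about whether a real training corpus contains enough of the relevant text, which is the open question.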
- lifelonglearner 4 Sep 2020 18:23 UTC
  3 points
  0
  Parent
  This is a good point, and this is where I think a good amount of the difficulty lies, especially as the cited example of human interpretable NNs (i.e. Microscope AI) doesn’t seem easily applicable to things outside of image recognition.
  - Ofer 4 Sep 2020 18:58 UTC
    3 points
    0
    Parent
    I just want to flag that, like Evan, I don’t understand the usage of the term “microscope AI” in the OP. My understanding is that the term (as described here) describes a certain way to use a NN that implements a world model, namely, looking inside the NN and learning useful things about the world. It’s an idea about how to use transparency, not how to achieve transparency.
- Logan Riggs 5 Sep 2020 20:44 UTC
  1 point
  0
  Parent
  I’m expecting either (1) a future GPT’s meta-learning will be able to learn the correct distribution, and better prompt engineering will be able to find it, or (2) curating enough examples will be good enough (though I’m not sure GPT-3 could do it even then).
  - Ofer 6 Sep 2020 13:59 UTC
    1 point
    0
    Parent
    When I said “we need GPT-N to learn a distribution over strings...” I was referring to the implicit distribution that the model learns during training. We need that distribution to assign more probability to the string [a modular NN specification followed by a prompt followed by a natural language description of the modules] than to [a modular NN specification followed by a prompt followed by an arbitrary string]. My concern is that there may be no prompt for which this requirement is fulfilled.
    
    Re “curating enough examples”, this assumes humans are already able* to describe the modules of a sufficiently powerful language model (powerful enough to yield such descriptions).
    
    *Able in practice, not just in theory.
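
    The requirement with a prompt can be written out as follows. The symbols are assumptions introduced here, not notation from the thread: $s$ is the modular NN specification, $p$ a prompt, $d$ a natural language description of the modules, $a$ an arbitrary string, and $P_\theta$ the distribution the model learned in training.

    ```latex
    % Requirement: some prompt makes the description more probable
    % than an arbitrary continuation.
    \exists\, p : \quad P_\theta(s \, p \, d) \;>\; P_\theta(s \, p \, a)

    % Both strings share the prefix (s, p), so by the chain rule
    % (assuming P_\theta(s \, p) > 0) this is equivalent to
    \exists\, p : \quad P_\theta(d \mid s, p) \;>\; P_\theta(a \mid s, p)
    ```

    The concern above is that the set of prompts $p$ satisfying this inequality might be empty.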