The reason the Othello result is surprising to the NTK is that neurons implementing an “Othello board state detector” would be vanishingly rare in the initial distribution, and the NTK thinks that the neuron function distribution does not change during training.
Yeah, that’s probably the best way to explain why this is surprising from the NTK perspective. I was trying to include mean-field and tensor programs as well (where that explanation doesn’t work anymore).
As an example, imagine that our input space consisted of five pixels, and at initialization each neuron was randomly sensitive to one of the pixels. You would easily be able to construct linear probes sensitive to individual pixels, even though the distribution over neurons is invariant under permutations of the pixels.
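The five-pixel setup can be sketched in a few lines of numpy (a toy illustration; the neuron count, seed, and choice of pixel are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_neurons = 5, 1000

# Each neuron is randomly sensitive to exactly one pixel, so the
# distribution over neurons treats all pixels symmetrically.
assignment = rng.integers(0, n_pixels, size=n_neurons)
W = np.eye(n_pixels)[assignment]      # (n_neurons, n_pixels): one-hot readout per neuron

x = rng.normal(size=n_pixels)         # a random input
activations = W @ x                   # neuron i reports pixel assignment[i]

# A linear probe for pixel 2: average the neurons assigned to it.
probe = (assignment == 2).astype(float)
probe /= probe.sum()
print(np.allclose(probe @ activations, x[2]))  # True: pixel 2 recovered exactly
```

The probe is pixel-specific even though nothing about the neuron distribution singles out pixel 2.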
Yeah, this is a good point. What I meant to specify wasn’t that you can’t recover any permutation-sensitive data at all (trivially, you can recover data about the input), but that any learned structures must be invariant to neuron permutation. (Though I’m feeling sketchy about the details of this claim). For the case of NTK, this is sort of trivial, since (as you pointed out) it doesn’t really learn features anyway.
By the way, there are actually two separate problems that come from the IID assumption: the “independent” part and the “identically-distributed” part. For space I only really mentioned the second one. But even if you deal with the identically-distributed part, the independence assumption still causes problems. It prevents a lot of structure from being representable—for example, a layer where “at most two neurons are activated on any input from some set” can’t be represented with independently distributed neurons. More generally, a lot of circuit-style constructions require this kind of joint structure. IMO this is actually the more fundamental limitation, though it takes longer to dig into.
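The “at most two neurons active” example can be made concrete with a small simulation (a toy sketch; the width, firing probability, and top-2 construction are illustrative assumptions, not from the original discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
width, p, trials = 1000, 0.1, 10_000

# Independent neurons, each firing with probability p on a given input:
indep = rng.random((trials, width)) < p
frac_ok = np.mean(indep.sum(axis=1) <= 2)
print(frac_ok)  # essentially 0: "at most two active" almost never holds

# A jointly structured layer where exactly the top-2 pre-activations fire:
pre = rng.normal(size=(trials, width))
top2 = pre >= np.sort(pre, axis=1)[:, [-2]]
print(bool(np.all(top2.sum(axis=1) == 2)))  # True by construction
```

The top-2 layer satisfies the constraint on every input, but only because the neurons' activations are strongly coupled—exactly the joint structure that an independence assumption rules out.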
I was trying to include mean-field and tensor programs as well
but that any learned structures must be invariant to neuron permutation. (Though I’m feeling sketchy about the details of this claim)
The same argument applies—if the distribution of intermediate neurons shifts so that Othello-board-state-detectors have a reasonably high probability of being instantiated, it will be possible to construct a linear probe detecting this, regardless of the permutation-invariance of the distribution.
the independence assumption still causes problems
This is a more reasonable objection (although actually, I’m not sure whether independence holds in the tensor programs framework—probably?)
if the distribution of intermediate neurons shifts so that Othello-board-state-detectors have a reasonably high probability of being instantiated
Yeah, this “if” was the part I was claiming permutation invariance causes problems for—that identically distributed neurons probably couldn’t express something as complicated as a board-state-detector. As soon as that’s true (plus assuming the board-state-detector is implemented linearly), agreed, you can recover it with a linear probe regardless of permutation-invariance.
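The closing point—that linear-probe recovery doesn’t care about neuron permutations—can be sanity-checked numerically (a toy setup where some feature f is, by assumption, linearly represented in the activations):

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_samples = 64, 200

# Hypothetical layer: activations = noise + f * direction,
# i.e. the feature f is linearly readable from the layer.
f = rng.normal(size=n_samples)
direction = rng.normal(size=n_neurons)
acts = rng.normal(size=(n_samples, n_neurons)) + np.outer(f, direction)

# A least-squares linear probe for f...
w, *_ = np.linalg.lstsq(acts, f, rcond=None)

# ...and the same probe fit after permuting the neurons: the weights
# just get permuted along with them, so probe quality is unchanged.
perm = rng.permutation(n_neurons)
w_perm, *_ = np.linalg.lstsq(acts[:, perm], f, rcond=None)
print(np.allclose(w[perm], w_perm))  # True: recovery is permutation-invariant
```

So the real question is the “if”: whether the distribution can express the detector at all, not whether a probe could find it afterwards.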
This is a more reasonable objection (although actually, I’m not sure whether independence holds in the tensor programs framework—probably?)
I probably should’ve just gone with that one, since the independence barrier is the one I usually think about, and the harder one to get around (related to non-free-field theories, perturbation theory, etc.).
My impression from reading through one of the tensor program papers a while back was that it still makes the IID assumption, but there could be some subtlety about that I missed.