We have pretty robust measurements of complexity of algorithms from SLT
This seems overstated. What’s the best evidence so far that the LLC positively correlates with the complexity of the algorithm implemented by a model? In fact, do we even have any models whose circuitry we understand well enough to assign them a “complexity”?
… and it seems like similar methods can lead to pretty good ways of separating parallel circuits (Apollo also has some interesting work here that I think constitutes real progress)
In some sense this is the definition of the complexity of an ML algorithm; more precisely, the free energy is the direct analog of the information-theoretic complexity measures, i.e. "entropy" or "Solomonoff complexity" (I'm writing a distillation on this, but it is a standard result). The relevant question then becomes whether the "SGLD" sampling techniques used in SLT for measuring the free energy (or technically its derivative) actually converge to reasonable values in polynomial time. This is checked pretty extensively in this paper for example.
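To spell out the standard result being referenced (my gloss, not a quote from the distillation mentioned above): the Bayesian free energy at sample size $n$ admits the asymptotic expansion

$$ F_n \;=\; -\log \int_W e^{-n L_n(w)}\,\varphi(w)\,dw \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O(\log\log n), $$

so the learning coefficient $\lambda$ is precisely the $\log n$-order "complexity" term, and the SGLD-based procedure targets its localized estimator

$$ \hat\lambda(w^*) \;=\; n\beta^*\Big(\mathbb{E}^{\beta^*}_{w \mid w^*}\big[L_n(w)\big] - L_n(w^*)\Big), \qquad \beta^* = \tfrac{1}{\log n}. $$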
A possibly more interesting question is whether notions of complexity in interpretations of programs agree with the inherent complexity as measured by free energy. The one place I'm aware of where this has been operationalized and checked is our project with Nina on modular addition: there we do have a clear understanding of the platonic complexity, and the local learning coefficient captures it asymptotically with very good precision (both for memorizing and generalizing algorithms, where the complexity difference is very significant).
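For concreteness, here is a minimal sketch of what the SGLD-based estimator above looks like in practice. This is illustrative code written from the definition, not code from the modular addition project or the linked paper; the localization strength `gamma`, step size, and burn-in choices are placeholders.

```python
import math
import torch

def estimate_llc(model, loss_fn, loader, n, n_steps=2000, step_size=1e-5, gamma=100.0):
    """Sketch of hat_lambda = n * beta * (E_beta[L_n(w)] - L_n(w*)).
    All hyperparameters here are illustrative placeholders."""
    beta = 1.0 / math.log(n)                        # inverse temperature beta* = 1/log n
    w_star = [p.detach().clone() for p in model.parameters()]

    def mean_loss():                                # L_n at the current parameters
        with torch.no_grad():
            total, count = 0.0, 0
            for x, y in loader:
                total += loss_fn(model(x), y).item() * len(x)  # assumes a mean-reduced loss
                count += len(x)
        return total / count

    loss_at_w_star = mean_loss()

    # SGLD chain, confined near w* by a quadratic "localization" term of strength gamma.
    sampled_losses = []
    data_iter = iter(loader)
    for step in range(n_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        model.zero_grad()
        loss = loss_fn(model(x), y)                 # mini-batch loss as a cheap estimate of L_n(w)
        (n * beta * loss).backward()                # gradient of the tempered posterior term
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                drift = p.grad + gamma * (p - p0)   # pull back toward w*
                noise = math.sqrt(step_size) * torch.randn_like(p)
                p.add_(-0.5 * step_size * drift + noise)
        if step >= n_steps // 2:                    # crude burn-in: keep the second half
            sampled_losses.append(loss.item())

    expected_loss = sum(sampled_losses) / len(sampled_losses)
    return n * beta * (expected_loss - loss_at_w_star)
```

The question raised in this thread is essentially whether a chain like this mixes well enough, in polynomial time, for the returned value to be meaningful on large non-linear models.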
Citation? [for Apollo]
Look at this paper (note that I haven't read it yet). I think their LIB work is also promising (at least it separates the circuits of small algorithms).
The relevant question then becomes whether the “SGLD” sampling techniques used in SLT for measuring the free energy (or technically its derivative) actually converge to reasonable values in polynomial time. This is checked pretty extensively in this paper for example.
The linked paper considers only large models that are DLNs (deep linear networks). I don't find this very compelling as evidence for large models with non-linearities. Other measurements I have seen for bigger/deeper non-linear models seem promising, but I wouldn't call them robust yet (though it is not clear to me whether this is due to an SGLD implementation/hyperparameter issue or a more fundamental problem).
As long as I don't have a clearer picture of the relationship between free energy and training dynamics under SGD, I agree with OP that the claim is too strong.