> So Parameter Decomposition in theory suggests solutions to the anomalies of Second-Wave Mech Interp. But a theory doesn’t make a paradigm.
Nitpicking a little here: I think this is a different use of the word “theory” than the one in the phrase “scientific theory”. A reader could take your second usage here to mean the latter, but the claim seems to be more like “these things could make progress explaining some of these anomalies, if the experiments go well”.
> The requirement that the parameter components sum to the original parameters also means that there can be no ‘missing mechanisms’. At worst, there can only be ‘parameter components which aren’t optimally minimal or simple’.
Echoing a part of Adam Shai’s comment, I don’t see how this is different from the feature-based case. Won’t there be a problem if you extract a bunch of parameter components you “can explain”, and then you’re left with a big one you “can’t explain”, which “isn’t optimally minimal or simple”?
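To make the worry concrete, here's a minimal sketch (the function and the toy numbers are mine, not from the post): the sum-to-θ constraint can always be satisfied by lumping whatever is left over into one extra component, so "no missing mechanisms" looks like it just relabels missing mechanisms as a residual component that isn't minimal or simple.

```python
import torch

def decompose_with_residual(theta, explained):
    """Append a residual component so the components sum exactly to
    the original parameters `theta` (the completeness constraint)."""
    residual = theta - torch.stack(explained).sum(dim=0)
    return explained + [residual]

# Toy "network" with 4 parameters and two components we can explain.
theta = torch.tensor([1.0, 2.0, 3.0, 4.0])
explained = [
    torch.tensor([1.0, 0.0, 0.0, 0.0]),
    torch.tensor([0.0, 2.0, 0.0, 0.0]),
]

components = decompose_with_residual(theta, explained)
assert torch.allclose(torch.stack(components).sum(dim=0), theta)

# The residual [0., 0., 3., 4.] satisfies the sum constraint, but it is
# exactly a "component which isn't optimally minimal or simple".
print(components[-1])
```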
> Another attractive property of Parameter Decomposition is that it identifies Minimum Description Length as the optimization criterion for our explanations of neural networks
Why is this an attractive property? (Serious question.)
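For clarity about what I'm asking: I read "Minimum Description Length as the optimization criterion" as the standard two-part MDL objective; the mapping to decompositions below is my gloss, not something stated in the post.

```latex
% Standard two-part MDL (Rissanen). My gloss for this setting:
%   H = the candidate explanation (the set of parameter components \{P_c\}),
%   D = the thing being explained (the network's parameters/behavior).
\min_{H} \; \big[\, L(H) + L(D \mid H) \,\big]
```

If that reading is right, my question is what makes minimizing this quantity the right target for explanations of neural networks.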