There are ~two broad ways of thinking about RG. The ‘HEP-like’ way is structural: it helps roughly map out the space of theories at different scales/couplings, even if it does not capture all of the physics at all times (for example, the Standard Model is a good description up to some energy scale, but each sector is often treated independently). Different aspects of training and inference seem to take the shape of renormalization, although, as we pointed out, there is a lot of work to be done to understand the various scales and couplings inherent to NNs. A goal of this opportunity space is not to make RG ‘go backwards’, but to correctly map this renormalization picture onto a ‘space of representations’ of a NN theory. I don’t expect this to be easy or simple, but I am hopeful that it will shed more theoretical light on different training and inference regimes and point to conceptual gaps worth probing.
In contrast, interpreting NNs to a certain degree of (human-specified) abstraction is more CMT-like; I suspect this is the perspective you are focusing on overall. Again, as we pointed out, this formulation of RG is not invertible: there are likely many ways to coarse-grain a representation (just as there are many RG schemes in condensed matter), but that doesn’t mean you can’t reframe something like a sparse autoencoder as doing RG.
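To make that reframing concrete, here is a minimal sketch (PyTorch; all names and hyperparameters are hypothetical, not a specific published setup) of reading a sparse autoencoder as a coarse-graining map: the encoder projects a dense activation vector onto a sparse dictionary, discarding fine detail, and the decoder is only a lossy inverse. Which details get thrown away is set by the sparsity penalty, i.e. by our choice of ‘scheme’.

```python
# Illustrative sketch only: an SAE read as a (non-invertible) coarse-graining map.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse "coarse-grained" code
        x_hat = self.decoder(f)           # approximate inverse (many-to-one, so not exact)
        return x_hat, f

# One gradient step; the L1 coefficient is the knob deciding how much
# microscopic detail survives the coarse-graining.
sae = SparseAutoencoder(d_model=512, d_dict=2048)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(64, 512)               # stand-in for real NN activations
x_hat, f = sae(acts)
loss = (x_hat - acts).pow(2).mean() + 1e-3 * f.abs().mean()
loss.backward()
opt.step()
```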
Sure, there are ‘lots of phase transitions’, but that doesn’t mean it’s pointless to try to classify or describe the interesting ones (like memorization → generalization). Similarly, just because lots of interesting physics happens far from phase transitions doesn’t mean transitions aren’t useful ‘tethers’ in the space of theories (φ⁴ theory has been pretty useful).
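As a toy illustration of what ‘describing’ such a transition could even mean operationally, here is a minimal numpy sketch (purely illustrative, not a claim about any particular NN) treating the train–test gap as an order parameter swept against a control parameter, here training-set size: at small n the high-degree fit memorizes (large gap), at large n it generalizes (the gap collapses).

```python
# Toy sketch: train-test gap as a crude order parameter across a control parameter.
import numpy as np

rng = np.random.default_rng(0)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def gap(n_train: int, degree: int = 9, n_test: int = 500) -> float:
    x_tr = rng.uniform(-1, 1, n_train)
    y_tr = np.sin(3 * x_tr) + 0.1 * rng.standard_normal(n_train)
    x_te = rng.uniform(-1, 1, n_test)
    y_te = np.sin(3 * x_te)
    coeffs = np.polyfit(x_tr, y_tr, degree)   # overparameterized fit
    return mse(y_te, np.polyval(coeffs, x_te)) - mse(y_tr, np.polyval(coeffs, x_tr))

for n in [12, 25, 50, 100, 400]:              # sweep the control parameter
    print(n, round(gap(n), 3))
```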
Regarding attention, I agree that a theory of representations here would be non-local with respect to inputs. That’s fine. Long-range dependencies just change the flavor of the RG. A lot depends on how we measure locality, since there’s no a priori natural way to do this in NNs (although there are several candidate metrics that likely pick up on different relationships between features).
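For instance, one of those candidate metrics (a minimal numpy sketch; the setup is hypothetical) is to impose a notion of locality after the fact by defining the ‘distance’ between two activation coordinates from their correlation, so strongly coupled features are ‘near’ each other regardless of input position. Other choices (mutual information, attention mass, causal ablations) would pick out different neighborhood structures.

```python
# Sketch: a correlation-induced "distance" between features, one of several options.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 64))        # stand-in activations: (samples, features)
acts[:, 1] = acts[:, 0] + 0.1 * rng.standard_normal(1000)  # plant one strong coupling

corr = np.corrcoef(acts, rowvar=False)        # feature-feature correlation matrix
dist = 1.0 - np.abs(corr)                     # strongly correlated pairs -> small "distance"
np.fill_diagonal(dist, np.inf)                # ignore self-distances

i, j = np.unravel_index(np.argmin(dist), dist.shape)
print(f"closest feature pair under this metric: ({i}, {j})")  # expect (0, 1)
```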