This post is great to see; I think renormalization is a very exciting direction for AI safety research!
First, as one possible way to represent the real world, we can think of representation_0 as a low-energy description of the dataset: IR_0.
If the NN is capable of learning a meaningful generalization of the data, representation_0 flows to representation_1 (now UV_1) via an implicit RG flow to higher energies. Instead of throwing information away, flowing to UV_1 adds structure that allows it to more reliably adapt to unseen information.
Shouldn’t this go the other way, with representation_0 being UV and representation_1 being IR? An NN compresses the input representation (the data) to obtain a coarse-grained output representation (the label). The ability to throw away information, i.e., the noise that is irrelevant w.r.t. the target function, is what enables generalization to unseen inputs that differ only in fine-grained details.
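To make that intuition concrete, here is a purely illustrative numpy sketch (not from the post or the paper it discusses): repeated block-averaging acts as a toy coarse-graining step, integrating out fine-grained detail so that two inputs which share the same coarse, label-determining structure but differ in fine-grained noise become nearly indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_grain(x, block=2):
    """One toy 'RG step': average neighbouring features, halving the resolution.
    Fine-grained structure within each block is integrated out."""
    return x.reshape(-1, block).mean(axis=1)

# Two inputs with the same coarse (task-relevant) signal but different fine-grained noise.
signal = np.repeat([1.0, -1.0, 1.0, 1.0], 16)           # slow, label-determining structure
x_a = signal + 0.5 * rng.standard_normal(signal.size)   # fine-grained 'irrelevant' detail
x_b = signal + 0.5 * rng.standard_normal(signal.size)   # different noise, same label

for step in range(4):
    print(f"step {step}: dim={x_a.size:3d}  ||x_a - x_b|| = {np.linalg.norm(x_a - x_b):.3f}")
    x_a, x_b = coarse_grain(x_a), coarse_grain(x_b)
```

Under this reading the raw input plays the role of the UV description, and successive coarse-grainings move toward the IR.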
Thanks for the comment! The way I see it, there are several ways of setting up the RG analogy in terms of different ‘energy’ analogues in NNs, and each is likely tied to a different framing of ‘UV’ and ‘IR’.
For example, during training, you start with a simplified (IR-like) description of the dataset that flows to a richer representation, adding finer-grained structure capable of generalizing (UV).
During inference, I agree that you can describe the process as UV → IR: each layer is a progressively coarser representation, as the features that are irrelevant for a given task (like classification) are ‘integrated out’ to yield a usefully abstract simplification. However, you can also think of inference in terms of ‘feature refinement’, where each layer becomes progressively more structured, able to pick up on finer or more abstract details. Which reading applies ultimately depends on how you define ‘scale’ along the RG flow.
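One crude way to make ‘scale’ operational (my own illustration, not something either comment commits to) is to track an effective dimensionality of each layer’s activations, e.g. the participation ratio of the activation covariance; whether it shrinks (coarse-graining) or grows (feature refinement) through the network is then an empirical question. A minimal numpy sketch, with a hypothetical random-weight MLP standing in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def participation_ratio(acts):
    """Effective dimensionality of a batch of activations:
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues)."""
    cov = np.cov(acts.T)
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()

# Hypothetical 3-layer ReLU MLP with random weights (a stand-in for a trained network).
widths = [64, 64, 64, 64]
weights = [rng.standard_normal((m, n)) / np.sqrt(m) for m, n in zip(widths[:-1], widths[1:])]

x = rng.standard_normal((512, widths[0]))   # a batch of inputs
for i, w in enumerate(weights):
    x = np.maximum(x @ w, 0.0)               # ReLU layer
    print(f"layer {i}: participation ratio = {participation_ratio(x):.1f}")
```

Whichever direction the number moves in a real trained network, the point stands that the UV/IR labels follow from the choice of scale proxy rather than being fixed by the architecture itself.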