This is where the question of “scale” comes in. I want to add that (at least morally/intuitively) we are also thinking about discrete systems like lattices, where instead of a regulator you have a coarse-graining or “blocking transformation,” which you have a lot of freedom in choosing. For example, in PDLT the object that plays the role of the coarse-graining is the operation that takes a probability distribution on neurons and applies a single-layer NN to it.
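To make that blocking step concrete, here is a minimal sketch (my own toy construction, not code from PDLT): the distribution on neurons is represented by the empirical distribution over the components of a wide preactivation vector, and one blocking step pushes that vector through a freshly drawn single layer. The names (`block_step`, `c_w`) and the tanh nonlinearity are illustrative assumptions.

```python
# Toy sketch (mine, not from PDLT) of the blocking transformation described
# above: represent the "probability distribution on neurons" by the empirical
# distribution over the components of a wide preactivation vector, and let one
# blocking step push that vector through a freshly drawn single layer.
import numpy as np

rng = np.random.default_rng(0)

def block_step(z, c_w=2.0):
    """One coarse-graining step: draw a random layer and apply it to z.

    z: current-layer preactivations, shape (width,); the empirical
    distribution of its components stands in for the neuron distribution.
    c_w: weight-variance hyperparameter (an illustrative choice).
    """
    width = z.shape[0]
    W = rng.normal(0.0, np.sqrt(c_w / width), size=(width, width))
    return W @ np.tanh(z)

width, depth = 4096, 10
z = rng.normal(size=width)  # toy stand-in for first-layer preactivations
for layer in range(1, depth + 1):
    z = block_step(z)
    # The per-neuron variance is the simplest observable of the flowing
    # distribution; under tanh it relaxes toward a c_w-dependent fixed point.
    print(f"layer {layer}: neuron variance = {z.var():.4f}")
```

The point is just that the step itself, not any continuum limit, is the object you get to choose, which is the freedom alluded to above.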
For me, this question of the relevant scale(s) is the main point of introducing this work. The depth-to-width ratio d/w is one example of a cutoff, and one associated with the data distribution is another, but more work needs to be done to understand how to relate potentially different theoretical descriptions (for example, how these two cutoffs work together). We also mention the ‘lattice as regulator’ as a natural cutoff for physical systems, and hope to find similarly natural scales in real-world AI systems.
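Relatedly, a quick way to see d/w acting as an expansion parameter is to measure a leading finite-width effect while scaling depth and width together. The sketch below is again my own construction, using a standard deep linear toy model rather than anything from the paper: it estimates the excess kurtosis of one output neuron over the ensemble of random networks, which grows roughly like 6·d/w, so doubling the depth doubles it and doubling the width halves it.

```python
# Monte Carlo sketch (my construction): in random deep *linear* networks at
# critical initialization (c_w = 1), the excess kurtosis of a single output
# neuron over the ensemble (the leading non-Gaussianity) scales like d/w.
# Analytically for this toy model it is 3*((1 + 2/width)**(depth - 1) - 1),
# i.e. roughly 6*(depth - 1)/width when d/w is small.
import numpy as np

rng = np.random.default_rng(1)

def output_excess_kurtosis(width, depth, n_nets=4000):
    """Estimate excess kurtosis of one output component over random nets."""
    x = np.ones(width) / np.sqrt(width)  # fixed unit-norm input
    outs = np.empty(n_nets)
    for i in range(n_nets):
        z = x
        for _ in range(depth):
            W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
            z = W @ z
        outs[i] = z[0]
    m2, m4 = np.mean(outs**2), np.mean(outs**4)
    return m4 / m2**2 - 3.0  # zero for an exactly Gaussian output

# Doubling depth roughly doubles the effect; doubling width roughly halves it.
for width, depth in [(64, 4), (64, 8), (128, 8)]:
    k = output_excess_kurtosis(width, depth)
    print(f"width={width:3d} depth={depth:2d} d/w={depth/width:.3f} "
          f"excess kurtosis ~ {k:+.3f}")
```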
I consider the lattice to be a regulator as well, but, semantics aside, thank you for the example.