In my experience, the main issue with this kind of thing is finding really central examples of symmetries in the input that are emulatable. There are a couple of easy ones, like low-rank[1] structure, but I never really managed to get a good argument for why generic symmetries in the data would often be emulatable[2] in real life.[3]
You might want to chat with Owen Lewis about this. He’s been thinking about connections between input symmetries and mechanistic structure for a while, and was interested in figuring out some kind of general correspondence between input symmetries and parameter symmetries.
If q(x) only depends on a low-rank subspace of the inputs x, there will usually[4] be degrees of freedom in the weights that connect to those inputs. The same is true of the hidden activations: if they're low rank, we get a corresponding number of free weights. See e.g. section 3.1.2 here.
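A minimal numpy sketch of that claim (the toy two-layer architecture, dimensions, and variable names are my own illustrative assumptions, not taken from the linked section): if the inputs only vary within a rank-r subspace, any perturbation of the first-layer weights that annihilates that subspace leaves the outputs on the data unchanged, giving roughly h * (d - r) free weight directions.

```python
# Sketch: weights "downstream" of a low-rank input distribution have free directions.
import numpy as np

rng = np.random.default_rng(0)
d, r, h = 10, 3, 16                        # input dim, data rank, hidden width

P = rng.standard_normal((d, r))            # columns span the rank-r data subspace
X = rng.standard_normal((200, r)) @ P.T    # inputs x = P z, so they lie in col(P)

W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((1, h))

def net(W1, X):
    return np.tanh(X @ W1.T) @ W2.T

# Perturb W1 only in directions that kill the data subspace: Delta @ P = 0.
Q, _ = np.linalg.qr(P)                     # orthonormal basis of col(P)
proj_perp = np.eye(d) - Q @ Q.T            # projector onto the orthogonal complement
Delta = rng.standard_normal((h, d)) @ proj_perp

# Outputs agree to machine precision despite a finite change in the weights.
print(np.max(np.abs(net(W1, X) - net(W1 + Delta, X))))
```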
Good name for this concept by the way, thanks.
For a while I was hoping that almost any kind of input symmetry would tend to correspond to low-rank structure in the hidden representations of p(x|Θ∗), if p(.) has the sort of architecture used by modern neural networks. Then, almost any kind of symmetry would be reducible to the low-rank structure case[2], and hence almost any symmetry would be emulatable.
But I never managed to show this, and I no longer think it is true.
There are a couple of necessary conditions for this, of course. E.g. the architecture p(.) needs to actually use weight matrices, like neural networks do.
Right, I expect emulatability to be a specific condition enabled by a particular class of algorithms that an NN might implement, rather than a generic one that is satisfied by almost all weights of a given NN architecture[1]. Glad to hear that you’ve thought about this before; I’ve also been trying to find a more general setting in which to formalize this argument beyond the toy exponential model.
Other related thoughts[2]:
Maybe this can help decompose the LLC into finer quantities based on where the degeneracy arises from: e.g., a given critical point’s LLC might come solely from the degeneracy in the parameter-function map, some from one of the multiple groups that the true distribution is invariant under at order r, others from an interaction of several groups, etc. (a sort of Möbius-like inversion; see the schematic after this list).
And perhaps it’s possible to distinguish/measure these LLC components experimentally, by measuring how the LLC changes as you perturb the true distribution q(x) to introduce new symmetries or destroy existing ones (susceptibilities-style).
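For concreteness, here is a heavily hedged schematic of what the two bullets above might amount to. This is just my restatement of the idea, not an established result, and the symbols (lambda_pf, lambda_{G_i}, q_epsilon, chi_{G_i}) are placeholders I'm introducing here:

```latex
% Schematic only: an inclusion-exclusion-style split of a critical point's LLC
% into a parameter-function-map term, per-group terms for the symmetry groups
% G_i of the true distribution, and interaction corrections.
\[
  \lambda(w^*) \;\approx\; \lambda_{\mathrm{pf}}(w^*)
    + \sum_i \lambda_{G_i}(w^*)
    + \sum_{i<j} \lambda_{G_i G_j}(w^*) + \cdots
\]

% Susceptibility-style probe: deform the true distribution along a family
% q_\epsilon(x) that introduces or destroys the symmetry G_i, and read that
% component off the response of the (estimated) LLC.
\[
  \chi_{G_i} \;=\; \left. \frac{\partial \hat{\lambda}(w^*_\epsilon)}{\partial \epsilon} \right|_{\epsilon = 0}
\]
```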
This is more about how I conceptually expect them to behave (since my motivation is to use their non-genericity to argue why certain algorithms should be favored over others), and there are probably interesting exceptions: symmetries that are generically emulatable due to properties of the NN architecture (e.g. depth).
Some of these ideas were motivated following a conversation with Fernando Rosas.