There are a couple of easy ones, like low-rank structure, but I never really managed to get a good argument for why generic symmetries in the data would often be emulatable in real life.
Right, I expect emulability to be a specific condition enabled by a particular class of algorithms that an NN might implement, rather than a generic one satisfied by almost all weights of a given NN architecture[1]. Glad to hear that you’ve thought about this before; I’ve also been trying to find a more general setting in which to formalize this argument beyond the toy exponential model.
Other related thoughts[2]:
Maybe this can help decompose the LLC into finer quantities based on where the degeneracy arises from: e.g. a given critical point’s LLC might come solely from the degeneracy in the parameter-function map, some of it from one of the multiple groups that the true distribution is invariant under at order r, other parts from an interaction of several groups, etc. (a sort of Möbius-like inversion; see the sketch after this list).
And perhaps it’s possible to distinguish and measure these LLC components experimentally, by tracking how the LLC changes as you perturb the true distribution q(x) to introduce new symmetries or destroy existing ones (susceptibilities-style; a rough code sketch follows below).
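To gesture at what the Möbius-like inversion could look like (purely schematic; the quantities $\lambda_S$ are hypothetical and I don’t yet know how to define them rigorously): suppose the true distribution is invariant under groups $G_1,\dots,G_k$, and for each subset $S \subseteq \{G_1,\dots,G_k\}$ write $\lambda_S$ for the LLC of the critical point when only the symmetries in $S$ are imposed on the data. The “pure” contribution of a subset $S$ would then be the inclusion–exclusion residue

$$\Delta\lambda_S = \sum_{T \subseteq S} (-1)^{|S \setminus T|}\, \lambda_T,$$

so that the full LLC decomposes as $\lambda_{\{G_1,\dots,G_k\}} = \sum_{S \subseteq \{G_1,\dots,G_k\}} \Delta\lambda_S$: a no-symmetry term $\Delta\lambda_\emptyset$ (the parameter-function-map part), single-group terms, and genuine interaction terms.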
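And here is a very rough sketch of the susceptibility-style measurement I have in mind, assuming the usual SGLD-based LLC estimator $\hat\lambda = n\beta\,(\mathbb{E}[L_n(w)] - L_n(w^*))$ with samples from the localized tempered posterior. Everything below (the tiny tanh network, the odd teacher, the symmetry-breaking perturbation, all hyperparameters) is an illustrative placeholder rather than a tested recipe:

```python
# Hypothetical susceptibility-style experiment: estimate the LLC of the same
# weights w* under a symmetric data distribution and under a perturbed,
# symmetry-broken one, and compare. Model, teacher, and hyperparameters are
# placeholders; the estimator is n*beta*(E[L_n(w)] - L_n(w*)) with SGLD
# samples from the localized tempered posterior.
import math
import torch

H = 8  # hidden width of the toy network (flattened parameter has 2*H entries)

def avg_loss(w, X, Y):
    """Mean squared error of a tiny tanh net, as a stand-in for the average NLL."""
    W1 = w[:H].view(H, 1)        # input -> hidden weights
    W2 = w[H:2 * H].view(1, H)   # hidden -> output weights
    pred = torch.tanh(X @ W1.T) @ W2.T
    return ((pred - Y) ** 2).mean()

def estimate_llc(w_star, X, Y, steps=3000, eps=1e-5, gamma=100.0):
    """Crude SGLD estimate of the local learning coefficient at w_star."""
    n = X.shape[0]
    beta = 1.0 / math.log(n)                  # standard inverse-temperature choice
    L_star = avg_loss(w_star, X, Y).item()
    w, losses = w_star.clone(), []
    for _ in range(steps):
        w = w.detach().requires_grad_(True)
        loss = avg_loss(w, X, Y)
        (grad,) = torch.autograd.grad(loss, w)
        with torch.no_grad():
            # Langevin step for the posterior exp(-n*beta*L_n(w) - gamma/2 * ||w - w*||^2)
            drift = -0.5 * eps * (n * beta * grad + gamma * (w - w_star))
            w = w + drift + math.sqrt(eps) * torch.randn_like(w)
        losses.append(loss.item())
    tail = losses[steps // 2:]                # discard burn-in
    return n * beta * (sum(tail) / len(tail) - L_star)

torch.manual_seed(0)
X = torch.randn(512, 1)
Y_sym = torch.sin(X)                   # teacher invariant under (x, y) -> (-x, -y)
Y_broken = torch.sin(X) + 0.1 * X**2   # small symmetry-breaking perturbation
w_star = 0.1 * torch.randn(2 * H)      # placeholder for weights found by training
print(estimate_llc(w_star, X, Y_sym), estimate_llc(w_star, X, Y_broken))
```

The quantity of interest would be the gap between the two estimates (and how it scales as you dial the perturbation strength), though in practice w* should be an actual trained critical point for each distribution and the SGLD hyperparameters would need real tuning.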
[1] This is more a statement of how I conceptually think they should behave (since my motivation is to use their non-genericity to argue why certain algorithms should be favored over others), and there are probably interesting exceptions: symmetries that are generically emulatable due to properties of the NN architecture (e.g. depth).
[2] Some of these ideas were motivated by a conversation with Fernando Rosas.