We can use the $L_2$-spectral radius similarity to measure more complicated notions of similarity between data sets.
Suppose that $A_1,\dots,A_r$ are $m\times m$ real matrices and $B_1,\dots,B_r$ are $n\times n$ real matrices. Let $\rho(A)$ denote the spectral radius of $A$, and let $A\otimes B$ denote the tensor product of $A$ with $B$. Define the $L_2$-spectral radius by setting
$$\rho_2(A_1,\dots,A_r)=\rho(A_1\otimes A_1+\dots+A_r\otimes A_r)^{1/2},$$
and define the $L_2$-spectral radius similarity between $A_1,\dots,A_r$ and $B_1,\dots,B_r$ as
$$\|(A_1,\dots,A_r)\simeq(B_1,\dots,B_r)\|_2=\frac{\rho(A_1\otimes B_1+\dots+A_r\otimes B_r)}{\rho_2(A_1,\dots,A_r)\,\rho_2(B_1,\dots,B_r)}.$$
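For concreteness, here is a minimal NumPy sketch of these two definitions (the function names are my own; the code forms the Kronecker sums explicitly, so it is only practical for small matrices):

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def l2_spectral_radius(mats):
    """rho_2(A_1,...,A_r) = rho(A_1 (x) A_1 + ... + A_r (x) A_r)^(1/2)."""
    return spectral_radius(sum(np.kron(A, A) for A in mats)) ** 0.5

def l2_similarity(As, Bs):
    """L_2-spectral radius similarity between (A_1,...,A_r) and (B_1,...,B_r)."""
    numerator = spectral_radius(sum(np.kron(A, B) for A, B in zip(As, Bs)))
    return numerator / (l2_spectral_radius(As) * l2_spectral_radius(Bs))
```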
We observe that if $C$ is invertible and $\lambda$ is a nonzero constant, then
$$\|(A_1,\dots,A_r)\simeq(\lambda CA_1C^{-1},\dots,\lambda CA_rC^{-1})\|_2=1.$$
Therefore, the $L_2$-spectral radius similarity is able to detect and measure symmetry that is normally hidden.
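A quick numerical check of this invariance, reusing the helpers above (the dimensions, seed, and value of $\lambda$ are arbitrary choices for illustration):

```python
# Check the invariance numerically: a tuple compared with a scaled, conjugated
# copy of itself should have similarity 1 (up to floating-point error).
rng = np.random.default_rng(0)
r, n = 3, 4
Bs = [rng.standard_normal((n, n)) for _ in range(r)]
C = rng.standard_normal((n, n))       # invertible with probability 1
lam = 2.5
As = [lam * C @ B @ np.linalg.inv(C) for B in Bs]
print(l2_similarity(As, Bs))          # prints a value very close to 1.0
```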
Example: Suppose that $u_1,\dots,u_r$ and $v_1,\dots,v_r$ are vectors, possibly of different dimensions. Suppose that we would like to determine how close we are to obtaining an affine transformation $T$ with $T(u_j)=v_j$ for all $j$ (or a slightly different notion of similarity). We should first normalize these vectors to obtain vectors $x_1,\dots,x_r$ and $y_1,\dots,y_r$ with mean zero and identity covariance matrix (we may not need to do this, depending on our notion of similarity). Then $\|(x_1x_1^*,\dots,x_rx_r^*)\simeq(y_1y_1^*,\dots,y_ry_r^*)\|_2$ is a measure of how close we are to obtaining such an affine transformation $T$. We may be able to apply this notion to determining the distance between machine learning models. For example, suppose that $M$ and $N$ are each the first few layers of a (typically different) neural network, and suppose that $a_1,\dots,a_r$ is a set of data points. If $u_j=M(a_j)$ and $v_j=N(a_j)$, then $\|(x_1x_1^*,\dots,x_rx_r^*)\simeq(y_1y_1^*,\dots,y_ry_r^*)\|_2$ is a measure of the similarity between $M$ and $N$.
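Here is a sketch of this construction (whiten the vectors, form the rank-one matrices $x_jx_j^*$, and compare them). The helper names are hypothetical, and the whitening step assumes the sample covariance matrices are nonsingular:

```python
def whiten(vectors):
    """Center the vectors and transform them so the sample covariance is the
    identity (assumes the sample covariance matrix is nonsingular)."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)
    cov = X.T @ X / len(X)
    w, V = np.linalg.eigh(cov)
    return X @ V @ np.diag(w ** -0.5) @ V.T

def affine_fit_similarity(us, vs):
    """Measure how close the correspondence u_j -> v_j is to an affine map,
    using the L_2-spectral radius similarity of the rank-one matrices."""
    xs, ys = whiten(us), whiten(vs)
    As = [np.outer(x, x) for x in xs]
    Bs = [np.outer(y, y) for y in ys]
    return l2_similarity(As, Bs)
```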
I have actually used this example to see whether there is any similarity between two different neural networks trained on the same data set. For my experiment, I chose a random collection $S\subseteq\{0,1\}^{32}\times\{0,1\}^{32}$ of ordered pairs, and I trained the neural networks $M,N$ to minimize the expected losses $E(\|M(a)-b\|^2:(a,b)\in S)$ and $E(\|N(a)-b\|^2:(a,b)\in S)$, respectively. In my experiment, each $a_j$ was a random vector of length 32 whose entries were 0s and 1s. The resulting similarity $\|(x_1x_1^*,\dots,x_rx_r^*)\simeq(y_1y_1^*,\dots,y_ry_r^*)\|_2$ was worse than it would have been if $x_1,\dots,x_r,y_1,\dots,y_r$ were just random vectors.
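A baseline of the kind mentioned above (unrelated random vectors) can be computed along the following lines; the sample count and dimension here are illustrative, not those of the experiment:

```python
# Baseline: the similarity score obtained from unrelated random vectors.
rng = np.random.default_rng(1)
r, d = 200, 16
us = rng.standard_normal((r, d))
vs = rng.standard_normal((r, d))
print(affine_fit_similarity(us, vs))
```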
This simple experiment suggests that trained neural networks retain too much random or pseudorandom data and are far too messy for anyone to develop a good understanding or interpretation of these networks. In my personal opinion, neural networks should be avoided in favor of other AI systems, but we need to develop these alternative AI systems so that they eventually outperform neural networks. I have personally used the $L_2$-spectral radius similarity to develop such non-messy AI systems, including LSRDRs, but these non-neural, non-messy AI systems currently do not perform as well as neural networks on most tasks. For example, I currently cannot train LSRDR-like structures to do any more NLP than just a word embedding, but I can train LSRDRs to do tasks that I have not seen neural networks perform (such as a tensor dimensionality reduction).