In this post, I will share some observations I have made about the octonions which demonstrate that the machine learning algorithms I have been looking at recently behave mathematically, and that such machine learning algorithms seem to be highly interpretable. The good behavior of these algorithms is due in part to the mathematical nature of the octonions and in part to the compatibility between the octonions and the machine learning algorithm. To be specific, one should think of the octonions as encoding a mixed unitary quantum channel that is very close to the completely depolarizing channel, and my machine learning algorithms work well with those sorts of quantum channels and similar objects.
Suppose that $K$ is the field of real numbers, the field of complex numbers, or the division ring of quaternions.
If $A_1,\dots,A_r \in M_m(K)$ and $B_1,\dots,B_r \in M_n(K)$ are matrices, then define a superoperator $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r): M_{m,n}(K) \to M_{m,n}(K)$ by setting
$$\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)(X) = A_1 X B_1^* + \dots + A_r X B_r^*,$$
and define $\Phi(A_1,\dots,A_r) = \Gamma(A_1,\dots,A_r;A_1,\dots,A_r)$. Define the $L_2$-spectral radius similarity $\|(A_1,\dots,A_r) \simeq (B_1,\dots,B_r)\|_2$ by setting
$$\|(A_1,\dots,A_r) \simeq (B_1,\dots,B_r)\|_2 = \frac{\rho(\Gamma(A_1,\dots,A_r;B_1,\dots,B_r))}{\rho(\Phi(A_1,\dots,A_r))^{1/2}\,\rho(\Phi(B_1,\dots,B_r))^{1/2}},$$
where $\rho$ denotes the spectral radius.
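To make these definitions concrete, here is a minimal numpy sketch (the function names are my own, not from any library) that represents $\Gamma$ as an ordinary matrix acting on $\mathrm{vec}(X)$ and computes the similarity from spectral radii:

```python
import numpy as np

def gamma_matrix(As, Bs):
    """Matrix of the superoperator X -> sum_i A_i X B_i^* acting on vec(X),
    using the identity vec(A X B^*) = (conj(B) kron A) vec(X)."""
    return sum(np.kron(B.conj(), A) for A, B in zip(As, Bs))

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def l2_similarity(As, Bs):
    """The L_2-spectral radius similarity ||(A_1,...,A_r) ~= (B_1,...,B_r)||_2."""
    numerator = spectral_radius(gamma_matrix(As, Bs))
    denominator = np.sqrt(spectral_radius(gamma_matrix(As, As))
                          * spectral_radius(gamma_matrix(Bs, Bs)))
    return numerator / denominator
```

Here `gamma_matrix(As, As)` is the matrix of $\Phi(A_1,\dots,A_r)$, and for $A_i \in M_m(K)$, $B_i \in M_n(K)$ the matrix of $\Gamma$ has size $mn \times mn$.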
Recall that the octonions are the unique (up to isomorphism) 8-dimensional real inner product space $V$ together with a bilinear binary operation $*$ and a unit element $1 \in V$ such that $\|x*y\| = \|x\| \cdot \|y\|$ and $1*x = x*1 = x$ for all $x,y \in V$.
Suppose that $e_1,\dots,e_8$ is an orthonormal basis for $V$. Define operators $(A_1,\dots,A_8)$ by setting $A_i v = e_i * v$. Now, define operators $(B_1,\dots,B_{64})$, up to reordering, by setting $\{B_1,\dots,B_{64}\} = \{A_i \otimes A_j : i,j \in \{1,\dots,8\}\}$.
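As a concrete illustration of these operators, the following numpy sketch builds an octonion multiplication on $\mathbb{R}^8$ via the Cayley-Dickson construction, takes $e_1,\dots,e_8$ to be the standard basis, and forms the matrices $A_i$ and the 64 tensor products; the helper names are my own, and other multiplication conventions would work equally well:

```python
import numpy as np
from itertools import product

def cd_conj(x):
    """Cayley-Dickson conjugation: negate every coordinate except the first."""
    y = -x
    y[0] = x[0]
    return y

def cd_mult(x, y):
    """Cayley-Dickson product on R^(2^k):
    (a, b)(c, d) = (ac - conj(d) b, da + b conj(c))."""
    if len(x) == 1:
        return x * y
    m = len(x) // 2
    a, b, c, d = x[:m], x[m:], y[:m], y[m:]
    return np.concatenate([cd_mult(a, c) - cd_mult(cd_conj(d), b),
                           cd_mult(d, a) + cd_mult(b, cd_conj(c))])

# Sanity check of the composition identity ||x * y|| = ||x|| * ||y|| on R^8.
x, y = np.random.randn(8), np.random.randn(8)
assert np.isclose(np.linalg.norm(cd_mult(x, y)),
                  np.linalg.norm(x) * np.linalg.norm(y))

# Left-multiplication operators A_i v = e_i * v, as 8 x 8 matrices.
E = np.eye(8)
A = [np.column_stack([cd_mult(E[i], E[j]) for j in range(8)]) for i in range(8)]

# The 64 tensor-product operators {A_i (x) A_j : i, j = 1, ..., 8}.
B = [np.kron(A[i], A[j]) for i, j in product(range(8), repeat=2)]
```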
Let $d$ be a positive integer. The goal is to find complex symmetric $d \times d$ matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized. We achieve this goal through gradient ascent. Since we are using gradient ascent, I consider this to be a machine learning algorithm, but the function mapping $B_j$ to $X_j$ is a linear transformation, so we are training linear models here (this fitness function can be generalized to one where we train non-linear models, but it takes a lot of work for the generalized fitness functions to still behave mathematically).
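The following PyTorch sketch shows the kind of gradient-ascent loop I have in mind; it is my own rough illustration rather than the exact training code. It reuses the list `B` of 64 matrices from the sketch above, parametrizes each $X_k$ as a complex symmetric $d \times d$ matrix, and ascends the similarity. Differentiating through `torch.linalg.eigvals` and Adam's handling of complex parameters are assumptions of this sketch; other optimizers or spectral-radius estimates would also work.

```python
import torch

def gamma_matrix(As, Bs):
    # Matrix of X -> sum_k A_k X B_k^* acting on vec(X): sum_k conj(B_k) kron A_k.
    return sum(torch.kron(Bk.conj(), Ak) for Ak, Bk in zip(As, Bs))

def spectral_radius(M):
    return torch.linalg.eigvals(M).abs().max()

d = 4
Bs = [torch.tensor(Bk, dtype=torch.complex128) for Bk in B]
# One-time constant rho(Phi(B_1,...,B_64))^(1/2); this is a 4096 x 4096
# eigenvalue computation, so it is slow but only runs once.
rho_B = spectral_radius(gamma_matrix(Bs, Bs)).sqrt()

params = torch.randn(64, d, d, dtype=torch.complex128, requires_grad=True)
opt = torch.optim.Adam([params], lr=1e-2)

for step in range(2000):
    Xs = [(P + P.T) / 2 for P in params]             # complex symmetric X_k
    num = spectral_radius(gamma_matrix(Bs, Xs))
    den = rho_B * spectral_radius(gamma_matrix(Xs, Xs)).sqrt()
    loss = -num / den                                 # ascend the similarity
    opt.zero_grad()
    loss.backward()
    opt.step()

print((-loss).item() ** 2, (2 * d + 6) / 64)  # squared similarity vs. (2d+6)/64
```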
Experimental Observation: If $1 \le d \le 8$, then we can easily find complex symmetric matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized and where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2^2 = (2d+6)/64 = (d+3)/32$.
If $7 \le d \le 16$, then we can easily find complex symmetric matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized and where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2^2 = (2d+4)/64 = (d+2)/32$.