In this post, I will share some observations I have made about the octonions which demonstrate that the machine learning algorithms I have been looking at recently behave mathematically, and that such machine learning algorithms seem to be highly interpretable. The good behavior of these algorithms is due in part to the mathematical nature of the octonions and in part to the compatibility between the octonions and the machine learning algorithm. To be specific, one should think of the octonions as encoding a mixed unitary quantum channel that is very close to the completely depolarizing channel, and my machine learning algorithms work well with those sorts of quantum channels and similar objects.
Suppose that $K$ is the field of real numbers, the field of complex numbers, or the division ring of quaternions.
If $A_1,\dots,A_r \in M_m(K)$ and $B_1,\dots,B_r \in M_n(K)$ are matrices, then define a superoperator $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r): M_{m,n}(K) \to M_{m,n}(K)$ by setting
$$\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)(X) = A_1 X B_1^* + \dots + A_r X B_r^*,$$
and define $\Phi(A_1,\dots,A_r) = \Gamma(A_1,\dots,A_r;A_1,\dots,A_r)$. Define the $L_2$-spectral radius similarity $\|(A_1,\dots,A_r) \simeq (B_1,\dots,B_r)\|_2$ by setting
$$\|(A_1,\dots,A_r) \simeq (B_1,\dots,B_r)\|_2 = \frac{\rho(\Gamma(A_1,\dots,A_r;B_1,\dots,B_r))}{\rho(\Phi(A_1,\dots,A_r))^{1/2}\,\rho(\Phi(B_1,\dots,B_r))^{1/2}},$$
where $\rho$ denotes the spectral radius.
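To make these definitions concrete, here is a minimal numpy sketch (the function names are my own, not from any library) that represents $\Gamma$ as an ordinary matrix acting on $\mathrm{vec}(X)$ and computes the similarity from spectral radii:

```python
import numpy as np

def gamma_matrix(As, Bs):
    """Matrix of the superoperator X -> sum_i A_i X B_i^* acting on vec(X),
    using the identity vec(A X B^*) = (conj(B) kron A) vec(X)."""
    return sum(np.kron(B.conj(), A) for A, B in zip(As, Bs))

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def l2_similarity(As, Bs):
    """The L_2-spectral radius similarity ||(A_1,...,A_r) ~= (B_1,...,B_r)||_2."""
    numerator = spectral_radius(gamma_matrix(As, Bs))
    denominator = np.sqrt(spectral_radius(gamma_matrix(As, As))
                          * spectral_radius(gamma_matrix(Bs, Bs)))
    return numerator / denominator
```

Here `gamma_matrix(As, As)` is the matrix of $\Phi(A_1,\dots,A_r)$, and for $A_i \in M_m(K)$, $B_i \in M_n(K)$ the matrix of $\Gamma$ has size $mn \times mn$.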
Recall that the octonions are the unique (up to isomorphism) 8-dimensional real inner product space $V$ together with a bilinear binary operation $*$ and a unit element $1 \in V$ such that $\|x*y\| = \|x\| \cdot \|y\|$ and $1*x = x*1 = x$ for all $x,y \in V$.
Suppose that $e_1,\dots,e_8$ is an orthonormal basis for $V$. Define operators $(A_1,\dots,A_8)$ by setting $A_i v = e_i * v$. Now, define operators $(B_1,\dots,B_{64})$, up to reordering, by setting $\{B_1,\dots,B_{64}\} = \{A_i \otimes A_j : i,j \in \{1,\dots,8\}\}$.
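As a concrete illustration of these operators, the following numpy sketch builds an octonion multiplication on $\mathbb{R}^8$ via the Cayley-Dickson construction, takes $e_1,\dots,e_8$ to be the standard basis, and forms the matrices $A_i$ and the 64 tensor products; the helper names are my own, and other multiplication conventions would work equally well:

```python
import numpy as np
from itertools import product

def cd_conj(x):
    """Cayley-Dickson conjugation: negate every coordinate except the first."""
    y = -x
    y[0] = x[0]
    return y

def cd_mult(x, y):
    """Cayley-Dickson product on R^(2^k):
    (a, b)(c, d) = (ac - conj(d) b, da + b conj(c))."""
    if len(x) == 1:
        return x * y
    m = len(x) // 2
    a, b, c, d = x[:m], x[m:], y[:m], y[m:]
    return np.concatenate([cd_mult(a, c) - cd_mult(cd_conj(d), b),
                           cd_mult(d, a) + cd_mult(b, cd_conj(c))])

# Sanity check of the composition identity ||x * y|| = ||x|| * ||y|| on R^8.
x, y = np.random.randn(8), np.random.randn(8)
assert np.isclose(np.linalg.norm(cd_mult(x, y)),
                  np.linalg.norm(x) * np.linalg.norm(y))

# Left-multiplication operators A_i v = e_i * v, as 8 x 8 matrices.
E = np.eye(8)
A = [np.column_stack([cd_mult(E[i], E[j]) for j in range(8)]) for i in range(8)]

# The 64 tensor-product operators {A_i (x) A_j : i, j = 1, ..., 8}.
B = [np.kron(A[i], A[j]) for i, j in product(range(8), repeat=2)]
```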
Let $d$ be a positive integer. The goal is to find complex symmetric $d \times d$ matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized. We achieve this goal through gradient ascent. Since we are using gradient ascent, I consider this to be a machine learning algorithm, but the function mapping $B_j$ to $X_j$ is a linear transformation, so we are training linear models here (this fitness function can be generalized to one where we train non-linear models, but it takes a lot of work for the generalized fitness functions to still behave mathematically).
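The following PyTorch sketch shows the kind of gradient-ascent loop I have in mind; it is my own rough illustration rather than the exact training code. It reuses the list `B` of 64 matrices from the sketch above, parametrizes each $X_k$ as a complex symmetric $d \times d$ matrix, and ascends the similarity. Differentiating through `torch.linalg.eigvals` and Adam's handling of complex parameters are assumptions of this sketch; other optimizers or spectral-radius estimates would also work.

```python
import torch

def gamma_matrix(As, Bs):
    # Matrix of X -> sum_k A_k X B_k^* acting on vec(X): sum_k conj(B_k) kron A_k.
    return sum(torch.kron(Bk.conj(), Ak) for Ak, Bk in zip(As, Bs))

def spectral_radius(M):
    return torch.linalg.eigvals(M).abs().max()

d = 4
Bs = [torch.tensor(Bk, dtype=torch.complex128) for Bk in B]
# One-time constant rho(Phi(B_1,...,B_64))^(1/2); this is a 4096 x 4096
# eigenvalue computation, so it is slow but only runs once.
rho_B = spectral_radius(gamma_matrix(Bs, Bs)).sqrt()

params = torch.randn(64, d, d, dtype=torch.complex128, requires_grad=True)
opt = torch.optim.Adam([params], lr=1e-2)

for step in range(2000):
    Xs = [(P + P.T) / 2 for P in params]             # complex symmetric X_k
    num = spectral_radius(gamma_matrix(Bs, Xs))
    den = rho_B * spectral_radius(gamma_matrix(Xs, Xs)).sqrt()
    loss = -num / den                                 # ascend the similarity
    opt.zero_grad()
    loss.backward()
    opt.step()

print((-loss).item() ** 2, (2 * d + 6) / 64)  # squared similarity vs. (2d+6)/64
```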
Experimental Observation: If $1 \le d \le 8$, then we can easily find complex symmetric matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized and where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2^2 = (2d+6)/64 = (d+3)/32$.
If $7 \le d \le 16$, then we can easily find complex symmetric matrices $(X_1,\dots,X_{64})$ where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2$ is locally maximized and where $\|(B_1,\dots,B_{64}) \simeq (X_1,\dots,X_{64})\|_2^2 = (2d+4)/64 = (d+2)/32$.