Spectral radii dimensionality reduction computed without gradient calculations

In this post, I shall describe a fitness function that can be locally maximized without gradient computations. This fitness function is my own construction. I initially developed it in order to evaluate block ciphers for cryptocurrency technologies, but I later found that it can also be used to solve other problems, such as the clique problem (which is NP-complete) in the average case, as well as some natural language processing tasks. After describing algorithms for locally maximizing this fitness function, I conclude that such a fitness function is inherently interpretable and mathematical, which is what we need for AI safety.

Let $K$ denote either the field of real numbers, the field of complex numbers, or the division ring of quaternions. Given $A_1,\dots,A_r\in M_n(K)$ and $d<n$, the goal is to find a tuple $(X_1,\dots,X_r)\in M_d(K)^r$ that is most similar to $(A_1,\dots,A_r)$. In other words, we want to approximate the $n\times n$-matrices $A_1,\dots,A_r$ with $d\times d$-matrices $X_1,\dots,X_r$.

Suppose that $A_1,\dots,A_r\in M_n(K)$ and $B_1,\dots,B_r\in M_m(K)$. Define a function $\Gamma$ by setting

$$\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)=\sum_{j=1}^{r}A_j\otimes\overline{B_j},$$

and define $\Phi(A_1,\dots,A_r)=\Gamma(A_1,\dots,A_r;A_1,\dots,A_r)$.

If $X$ is a square matrix, then let $\rho(X)$ denote the spectral radius of $X$.

Define the $L_2$-spectral radius similarity between $(A_1,\dots,A_r)$ and $(B_1,\dots,B_r)$ by setting

$$\|(A_1,\dots,A_r)\simeq(B_1,\dots,B_r)\|=\frac{\rho(\Gamma(A_1,\dots,A_r;B_1,\dots,B_r))}{\rho(\Phi(A_1,\dots,A_r))^{1/2}\,\rho(\Phi(B_1,\dots,B_r))^{1/2}}.$$

The quantity $\|(A_1,\dots,A_r)\simeq(B_1,\dots,B_r)\|$ is always a real number in the interval $[0,1]$ (the proof is a generalization of the Cauchy-Schwarz inequality).
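To make the definition concrete, here is a minimal numpy sketch (for the real and complex cases) that computes the $L_2$-spectral radius similarity directly from the definition above; the helper names `gamma`, `spectral_radius`, and `similarity` are for illustration only.

```python
import numpy as np

def gamma(As, Bs):
    """Gamma(A_1,...,A_r; B_1,...,B_r) = sum_j A_j (x) conj(B_j)."""
    return sum(np.kron(A, B.conj()) for A, B in zip(As, Bs))

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def similarity(As, Bs):
    """L_2-spectral radius similarity between (A_1,...,A_r) and (B_1,...,B_r)."""
    num = spectral_radius(gamma(As, Bs))
    den = np.sqrt(spectral_radius(gamma(As, As)) * spectral_radius(gamma(Bs, Bs)))
    return num / den

rng = np.random.default_rng(0)
As = [rng.standard_normal((4, 4)) for _ in range(3)]
Xs = [rng.standard_normal((2, 2)) for _ in range(3)]
print(similarity(As, As))  # 1.0: a tuple is perfectly similar to itself
print(similarity(As, Xs))  # some value in [0, 1]
```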

If $A_1,\dots,A_r\in M_n(K)$ and $1\leq d<n$, then we say that $(X_1,\dots,X_r)\in M_d(K)^r$ is an $L_{2,d}$-spectral radius dimensionality reduction (LSRDR) of $(A_1,\dots,A_r)$ if the similarity $\|(A_1,\dots,A_r)\simeq(X_1,\dots,X_r)\|$ is locally maximized.

One can produce an LSRDR of $(A_1,\dots,A_r)$ simply by locally maximizing the similarity $\|(A_1,\dots,A_r)\simeq(X_1,\dots,X_r)\|$ using gradient ascent. But there is another way to obtain LSRDRs since they behave mathematically.
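For instance, one could run a crude gradient ascent on the entries of $(X_1,\dots,X_r)$ as in the following sketch, where the gradient is approximated by finite differences purely for illustration (a practical implementation would use automatic differentiation); none of the function names here come from the original method.

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def similarity(As, Bs):
    gamma = lambda Ps, Qs: sum(np.kron(P, Q.conj()) for P, Q in zip(Ps, Qs))
    return spectral_radius(gamma(As, Bs)) / np.sqrt(
        spectral_radius(gamma(As, As)) * spectral_radius(gamma(Bs, Bs)))

def lsrdr_gradient_ascent(As, d, steps=200, lr=0.05, h=1e-6, seed=0):
    """Ascend the similarity with a finite-difference gradient (illustration only)."""
    rng = np.random.default_rng(seed)
    Xs = [rng.standard_normal((d, d)) for _ in As]
    for _ in range(steps):
        base = similarity(As, Xs)
        grads = []
        for k in range(len(Xs)):
            G = np.zeros((d, d))
            for i in range(d):
                for j in range(d):
                    trial = [Y.copy() for Y in Xs]
                    trial[k][i, j] += h
                    G[i, j] = (similarity(As, trial) - base) / h
            grads.append(G)
        Xs = [X + lr * G for X, G in zip(Xs, grads)]
    return Xs

rng = np.random.default_rng(1)
As = [rng.standard_normal((4, 4)) for _ in range(3)]
Xs = lsrdr_gradient_ascent(As, d=2)
print(similarity(As, Xs))  # the similarity after a few hundred ascent steps
```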

Let $Z(K)$ denote the center of the algebra $K$. In particular, $Z(\mathbb{R})=\mathbb{R}$, $Z(\mathbb{C})=\mathbb{C}$, and $Z(\mathbb{H})=\mathbb{R}$.

Empirical observation: If $(X_1,\dots,X_r)$ is an LSRDR of $(A_1,\dots,A_r)$, then typically there exists some $\lambda\in Z(K)$ along with matrices $R,S$ with $X_j=\lambda RA_jS$ for all $j$, and where if $P=SR$, then $P$ is a (typically non-orthogonal) projection matrix. The projection matrix $P$ is typically unique in the sense that if we train an LSRDR of $(A_1,\dots,A_r)$ with rank $d$ multiple times, then we generally end up with the same projection matrix $P$. If the projection matrix $P$ is unique, then we shall call $P$ the canonical LSRDR projection matrix of rank $d$. Let denote the dominant eigenvector of with (in the quaternionic case, we should define the trace as the real part of the sum of diagonal entries in the matrix). Let denote the dominant eigenvector of with . Then are positive semidefinite matrices with and .

Said differently, along with the eigenvalue is a solution to the following equations:

  1. .

  2. .

  3. .

  4. .

  5. .

  6. .

One may often solve an equation or system of equations using iteration. For example, if $(X,\delta)$ is a complete metric space, and $f:X\rightarrow X$ is a function with $\delta(f(x),f(y))\leq c\cdot\delta(x,y)$ for all $x,y\in X$ and some constant $c<1$, and $f(x_0)=x_0$, then $\lim_{n\rightarrow\infty}f^{n}(x)=x_0$ for all $x\in X$ by the contraction mapping theorem. We shall also apply this idea to produce the matrices in an LSRDR since these matrices satisfy a system of equations.
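As a toy illustration of solving an equation by iteration (unrelated to LSRDRs themselves), iterating the map $f(x)=\cos(x)$ converges to the unique solution of $\cos(x)=x$:

```python
import math

# cos maps the reals into [-1, 1], and on [-1, 1] it is a contraction
# (|cos'(x)| = |sin(x)| <= sin(1) < 1), so the iterates f^n(x) converge
# to the unique solution of cos(x) = x.
x = 1.0
for _ in range(100):
    x = math.cos(x)
print(x)  # ~0.739085
```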

We need a function $f$ that can be easily applied to matrices but where each projection matrix is an attractive fixed point for $f$. Consider the function $f(x)=3x^{2}-2x^{3}$. We observe that $f(0)=0$ and $f(1)=1$. Furthermore, if $x$ is near $0$, then the sequence $(f^{n}(x))_{n}$ converges to $0$ very quickly, and if $x$ is near $1$, then $(f^{n}(x))_{n}$ converges to $1$ very quickly. Actually, there are neighborhoods $U_0$ of $0$ and $U_1$ of $1$ in the complex plane where if $z\in U_0$, then $(f^{n}(z))_{n}$ converges to $0$ quickly, and if $z\in U_1$, then $(f^{n}(z))_{n}$ converges to $1$ quickly. As a consequence, if $P$ is a projection matrix, then there is a neighborhood $\mathcal{U}$ of $P$ where if $X\in\mathcal{U}$, then $(f^{n}(X))_{n}$ converges to some projection matrix very quickly. Define a partial function $\Pi$ by setting $\Pi(X)=\lim_{n\rightarrow\infty}f^{n}(X)$ whenever this limit exists.
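The following numpy sketch illustrates this behavior for the cubic $f(x)=3x^{2}-2x^{3}$ described above: starting from a small perturbation of a (non-orthogonal) projection matrix, iterating $f$ quickly returns a projection matrix of the same rank. The setup is only an illustration.

```python
import numpy as np

def f(X):
    """f(x) = 3x^2 - 2x^3 applied to a square matrix; 0 and 1 are
    superattracting fixed points of the scalar map."""
    X2 = X @ X
    return 3 * X2 - 2 * X2 @ X

rng = np.random.default_rng(0)
B = np.eye(5) + 0.3 * rng.standard_normal((5, 5))
P = B @ np.diag([1.0, 1.0, 0.0, 0.0, 0.0]) @ np.linalg.inv(B)  # non-orthogonal rank-2 projection
X = P + 0.01 * rng.standard_normal((5, 5))                     # nearby matrix

for _ in range(20):
    X = f(X)

print(np.linalg.norm(X @ X - X))  # ~0: the limit is again a projection matrix
print(np.linalg.matrix_rank(X))   # 2
```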

Iterative algorithm for computing LSRDRs: Let but set to a sufficiently small value. Let . Let be an random orthonormal projection of rank . Let be random matrices in . Define for all recursively by setting

  1. ,

  2. ,

  3. ,

  4. , and

  5. .

Then typically converges to a matrix of rank at an exponential rate. Furthermore, the matrix typically does not depend on the initialization , but does depend on and . Therefore set whenever does not depend on the initialization (just assume that the limit converges). If , then is typically the canonical rank LSRDR projection operator.

The iterative algorithm for computing LSRDRs is a proper extension of the notion of an LSRDR in some cases. For example, if we simply count the number of parameters, the matrices have parameters while the matrices have parameters. Therefore, if , then the matrices (and the projection matrix ) have more parameters than . This means that we cannot obtain a unique canonical LSRDR projection of dimension when . On the other hand, even when , the matrix typically exists, and if we set , there are where and where is an LSRDR of . This means that the iterative algorithm for computing LSRDRs gives more information than simply an LSRDR. The iterative algorithm for computing LSRDRs gives a projection operator.

Interpretability: Since there are several ways of computing LSRDRs, LSRDRs behave mathematically. A machine learning algorithm that typically produces the same output is more interpretable than one that does not, for a few reasons; in particular, when a machine learning algorithm has only one possible output, that output depends only on the input and has no other source of random information contributing to it. Since we typically (but not always) attain the same local maximum when training LSRDRs multiple times, LSRDRs are both interpretable and mathematical. This sort of mathematical behavior is what we need to make sense of the inner workings of LSRDRs and other machine learning algorithms. There are several ways to generalize the notion of an LSRDR, and these generalized machine learning algorithms still tend to behave mathematically; they still tend to produce the exact same trained model after training multiple times.

Capabilities: Trained LSRDRs can solve NP-complete problems such as the clique problem in the average case. I have also trained LSRDRs to produce word embeddings for natural language processing and to analyze the octonions. I can generalize LSRDRs so that they behave more like deep neural networks, but we still have a way to go until these generalized LSRDRs perform as well as our modern AI algorithms, and I do not know how far we can push the performance of generalized LSRDRs while retaining their inherent interpretability and mathematical behavior.

If mathematical generalized LSRDRs can compete in performance with neural networks, then we are closer to solving the problem of general AI interpretability. But if not, then mathematical generalized LSRDRs could possibly be used as a narrow AI tool or even as a component of general AI; in this case, generalized LSRDRs could still improve general AI interpretability a little bit. An increased usage of narrow AI will (slightly) decrease the need for general AI.

Added 5/28/2025

When $d\ll n$, the iterative algorithm for computing LSRDRs produces a low rank projection matrix $P$, but a matrix of rank $d$ can be factored as $P=UV$ where $U\in M_{n,d}(K)$ and $V\in M_{d,n}(K)$. To save computational resources, we may just work with the pair $(U,V)$ during the training.

Suppose that $P\in M_n(K)$ with $\operatorname{rank}(P)=d$. Then there are several ways to easily factor $P$ as the product of an $n\times d$-matrix with a $d\times n$-matrix, including the following factorizations:

.

Therefore, define

.

In particular, if , then .
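For concreteness, here is one standard way to produce such a factorization, via a thin singular value decomposition (this is only an illustration and is not necessarily one of the factorizations referred to above):

```python
import numpy as np

def low_rank_factor(P, d):
    """Factor a rank-d matrix P as P = U @ V with U of shape (n, d)
    and V of shape (d, n), via a truncated SVD."""
    W, s, Vt = np.linalg.svd(P)
    U = W[:, :d] * s[:d]
    V = Vt[:d, :]
    return U, V

rng = np.random.default_rng(0)
B = np.eye(6) + 0.3 * rng.standard_normal((6, 6))
P = B @ np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]) @ np.linalg.inv(B)  # rank-3 projection
U, V = low_rank_factor(P, 3)
print(np.linalg.norm(U @ V - P))  # ~0
```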

Let for all , and define .

In the following algorithm, we will sometimes need to replace a pair $(U,V)$ with a new pair $(U',V')$ such that $U'V'=UV$ but where $(U',V')$ has smaller norm than $(U,V)$, because if we do not do this, then after much training the norm of $(U,V)$ will grow very large while $UV$ is just a projection matrix. To do this, we decrease the norm of $(U,V)$ using gradient descent: we want to move $(U,V)$ in a direction that decreases the sum $\|U\|^{2}+\|V\|^{2}$ while leaving the product $UV$ unchanged.
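Below is a minimal sketch of one way to carry out such a rebalancing step (for the real case; this illustrates the idea and is not the exact update used in the post): flow the pair $(U,V)$ along the group action $U\mapsto Ue^{tH}$, $V\mapsto e^{-tH}V$, which leaves the product $UV$ exactly unchanged, in a direction $H$ that decreases $\|U\|_{F}^{2}+\|V\|_{F}^{2}$.

```python
import numpy as np
from scipy.linalg import expm

def rebalance(U, V, step=0.1, iters=100):
    """Shrink ||U||_F^2 + ||V||_F^2 while keeping the product U @ V fixed,
    by moving along the group action U -> U e^{tH}, V -> e^{-tH} V."""
    for _ in range(iters):
        # d/dt (||U e^{tH}||^2 + ||e^{-tH} V||^2) at t = 0 equals
        # 2 tr((U^T U - V V^T) H), so H = V V^T - U^T U is a descent direction.
        H = V @ V.T - U.T @ U
        G = expm(step * H / (1.0 + np.linalg.norm(H)))  # damped step size
        U, V = U @ G, np.linalg.inv(G) @ V
    return U, V

rng = np.random.default_rng(0)
U = 10.0 * rng.standard_normal((6, 2))    # deliberately unbalanced pair
V = 0.01 * rng.standard_normal((2, 6))
P = U @ V
U2, V2 = rebalance(U, V)
print(np.linalg.norm(U2 @ V2 - P))                        # ~0: the product is unchanged
print(np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)    # large
print(np.linalg.norm(U2) ** 2 + np.linalg.norm(V2) ** 2)  # much smaller
```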

Iterative algorithm for computing LSRDRs with low rank factorization:

Let but needs to be sufficiently small. Let . Set . Let be random rank matrices. Define recursively for all by setting

  1. ,

  2. ,

  3. ,

  4. ,

  5. ,

  6. ,

  7. , and

  8. .

Then if everything goes right, would converge to at an exponential rate.

Added 5/31/2025

The iterative algorithm for computing LSRDRs can be written in terms of completely positive superoperators. We define a completely positive superoperator as an operator of the form $X\mapsto\sum_{j=1}^{r}A_jXA_j^{*}$ where $A_1,\dots,A_r$ are square matrices over $K$. Completely positive superoperators are typically defined when $K=\mathbb{C}$ for quantum information theory, but we are using a more general context here. If , then

and

.
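As a concrete illustration (for the complex case; the snippet is not taken from the post), the following numpy code checks two basic facts about a completely positive superoperator $X\mapsto\sum_{j}A_{j}XA_{j}^{*}$: it sends positive semidefinite matrices to positive semidefinite matrices, and under column-stacking vectorization its matrix is $\sum_{j}\overline{A_{j}}\otimes A_{j}$, which has the same spectrum as $\sum_{j}A_{j}\otimes\overline{A_{j}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
As = [rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) for _ in range(3)]

def cp_map(X, As):
    """The completely positive superoperator X -> sum_j A_j X A_j^*."""
    return sum(A @ X @ A.conj().T for A in As)

# Completely positive superoperators send positive semidefinite matrices
# to positive semidefinite matrices.
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
X = B @ B.conj().T                                             # positive semidefinite
print(np.min(np.linalg.eigvalsh(cp_map(X, As))) >= -1e-9)      # True

# With column-stacking vectorization, vec(A X A^*) = (conj(A) kron A) vec(X),
# so the matrix of the superoperator is sum_j conj(A_j) kron A_j.
M = sum(np.kron(A.conj(), A) for A in As)
print(np.allclose(M @ X.flatten(order="F"),
                  cp_map(X, As).flatten(order="F")))           # True
```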

Added 6/7/2025

The iterative algorithm for computing LSRDRs applies not just to completely positive superoperators; it also applies to positive operators that are not completely positive, as long as the superoperators do not stray too far away from complete positivity. A linear superoperator $\mathcal{E}$ is said to be positive if $\mathcal{E}(X)$ is positive semidefinite whenever $X$ is positive semidefinite. Clearly, every completely positive superoperator is positive, but not every positive operator is completely positive. For example, the transpose map $T:M_n(K)\rightarrow M_n(K)$ defined by $T(X)=X^{T}$ is always positive but not completely positive whenever $n\geq 2$. The difference of two completely positive superoperators from $M_n(K)$ to $M_m(K)$ is known as a Hermitian preserving map. Every positive map is Hermitian preserving. If $\mathcal{E}$ is linear but not too far from positivity, then when we use the iterative algorithm for computing LSRDRs, we typically get the same results every time we train, and if $\mathcal{E}$ is Hermitian preserving, then the operators will be positive semidefinite after training.
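To see concretely why the transpose map is positive but not completely positive, here is a standard check in numpy (an illustration, not part of the algorithm): the transpose of a positive semidefinite matrix is positive semidefinite, while transposing only one tensor factor of a positive semidefinite matrix built from a maximally entangled vector produces a negative eigenvalue.

```python
import numpy as np

n = 2
rng = np.random.default_rng(0)

# The transpose map preserves positive semidefiniteness...
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = B @ B.conj().T                                  # positive semidefinite
print(np.min(np.linalg.eigvalsh(X.T)) >= -1e-12)    # True

# ...but it is not completely positive: applying the transpose to only the
# second tensor factor of the rank-one positive semidefinite matrix built
# from a maximally entangled vector produces a negative eigenvalue.
phi = np.zeros(n * n, dtype=complex)
for i in range(n):
    phi[i * n + i] = 1.0 / np.sqrt(n)
Rho = np.outer(phi, phi.conj())                     # positive semidefinite, rank one

PT = Rho.reshape(n, n, n, n).transpose(0, 3, 2, 1).reshape(n * n, n * n)
print(np.min(np.linalg.eigvalsh(PT)))               # -0.5 < 0
```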