I am going to share an algorithm that I came up with that tends to produce the same result when we run it multiple times with different initializations. The iteration is not even guaranteed to converge, since we are not using gradient ascent, but it typically converges as long as the algorithm is given a reasonable input. This suggests that the algorithm is mathematically well behaved and may be useful for things such as quantum error correction. After analyzing the algorithm, I shall use it to solve a computational problem.
We say that an algorithm is pseudodeterministic if it tends to return the same output even when the computation leading to that output is non-deterministic (for example, due to a random initialization). I believe that we should focus much more on pseudodeterministic machine learning algorithms for AI safety and interpretability, since pseudodeterministic algorithms are inherently interpretable.
Define $f(z)=3z^2-2z^3$ for all complex numbers $z$. Then $f(0)=0$, $f(1)=1$, $f'(0)=f'(1)=0$, and there are neighborhoods $U,V$ of $0,1$ respectively where if $x\in U$, then $f^N(x)\to 0$ quickly and if $y\in V$, then $f^N(y)\to 1$ quickly (here $f^N$ denotes the $N$-fold iterate of $f$). Set $f^\infty=\lim_{N\to\infty}f^N$. The function $f^\infty$ serves as error correction for projection matrices: if $Q$ is nearly a projection matrix, then $f^\infty(Q)$ will be a projection matrix.
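Here is a minimal numerical sketch of this error correction in NumPy; the iteration cap, tolerance, and size of the perturbation are illustrative choices of mine rather than anything prescribed above.

```python
import numpy as np

def f(Z):
    """One step of f(Z) = 3Z^2 - 2Z^3, applied to a square matrix."""
    Z2 = Z @ Z
    return 3 * Z2 - 2 * (Z2 @ Z)

def f_inf(Q, steps=100, tol=1e-12):
    """Iterate f until Q is numerically idempotent, i.e. a projection."""
    for _ in range(steps):
        Q_next = f(Q)
        if np.linalg.norm(Q_next - Q) < tol:
            return Q_next
        Q = Q_next
    return Q

# Perturb a rank-2 orthogonal projection and recover a projection from it.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((5, 5)))[0][:, :2]
P = U @ U.T                                   # rank-2 orthogonal projection
Q = P + 0.05 * rng.standard_normal((5, 5))    # nearly a projection
R = f_inf(Q)
print(np.linalg.norm(R @ R - R))              # ~0: R is (numerically) idempotent
```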
Suppose that $K$ is the field of real numbers, the field of complex numbers, or the division ring of quaternions (the quaternions are not commutative, hence not a field). Let $Z(K)$ denote the center of $K$. In particular, $Z(\mathbb{R})=\mathbb{R}$, $Z(\mathbb{C})=\mathbb{C}$, and $Z(\mathbb{H})=\mathbb{R}$.
If $A_1,\dots,A_r$ are $m\times n$ matrices over $K$, then define $\Phi(A_1,\dots,A_r):M_n(K)\to M_m(K)$ by setting $\Phi(A_1,\dots,A_r)(X)=\sum_{k=1}^r A_k X A_k^*$. We then say that an operator of the form $\Phi(A_1,\dots,A_r)$ is completely positive. We say that a $Z(K)$-linear operator $E:M_n(K)\to M_m(K)$ is Hermitian preserving if $E(X)$ is Hermitian whenever $X$ is Hermitian. Every completely positive operator is Hermitian preserving.
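As a quick sanity check, here is a sketch of $\Phi$ together with the completely-positive-implies-Hermitian-preserving claim; the helper name completely_positive is mine.

```python
import numpy as np

def completely_positive(As):
    """Return the map X -> sum_k A_k X A_k^* for a list of m x n matrices."""
    def Phi(X):
        return sum(A @ X @ A.conj().T for A in As)
    return Phi

# Completely positive implies Hermitian preserving:
rng = np.random.default_rng(1)
As = [rng.standard_normal((3, 4)) for _ in range(2)]
E = completely_positive(As)
X = rng.standard_normal((4, 4))
X = X + X.T                                   # Hermitian (symmetric) input
print(np.linalg.norm(E(X) - E(X).conj().T))   # ~0: the output is Hermitian too
```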
Suppose that $E:M_n(K)\to M_n(K)$ is $Z(K)$-linear. Let $t>0$, and let $P_0\in M_n(K)$ be a random orthogonal projection matrix of rank $d$. Set $P_{N+1}=f^\infty(P_N+t\cdot E(P_N))$ for all $N$. If everything goes well, the sequence $(P_N)_N$ will converge to a projection matrix $P$ of rank $d$, and $P$ will typically be unique in the sense that if we run the experiment again, we will typically obtain the exact same projection matrix $P$. If $E$ is Hermitian preserving, then $P$ will typically be an orthogonal projection. This experiment performs especially well when $E$ is completely positive, or at least Hermitian preserving or nearly so. The projection matrix $P$ will satisfy the equation $P\cdot E(P)=E(P)\cdot P=P\cdot E(P)\cdot P$.
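The following sketch runs this iteration, reusing f_inf from the earlier snippet; the step size $t$, iteration cap, and tolerance are illustrative assumptions of mine.

```python
import numpy as np  # f_inf is the error-correction helper sketched earlier

def random_projection(n, d, rng):
    """A random rank-d orthogonal projection from a random orthonormal frame."""
    U = np.linalg.qr(rng.standard_normal((n, d)))[0]
    return U @ U.T

def find_projection(E, n, d, t=0.01, steps=2000, tol=1e-10, seed=None):
    """Iterate P_{N+1} = f_inf(P_N + t * E(P_N)) from a random rank-d start."""
    rng = np.random.default_rng(seed)
    P = random_projection(n, d, rng)
    for _ in range(steps):
        P_next = f_inf(P + t * E(P))
        if np.linalg.norm(P_next - P) < tol:
            break
        P = P_next
    return P
```

Running find_projection twice with different seeds and comparing the two outputs gives a direct numerical test of the pseudodeterminism claim.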
In the case when $E$ is a quantum channel, we can easily explain what the projection $P$ does. The operator $P$ is a projection onto a subspace of complex Euclidean space that is particularly well preserved by the channel $E$. In particular, the image $\mathrm{Im}(P)$ is spanned by the top $d$ eigenvectors of $E(P)$. This means that if we send the completely mixed state $P/d$ through the quantum channel $E$ and we measure the state $E(P/d)$ with respect to the projective measurement $(P,I-P)$, then there is an unusually high probability that this measurement will land on $P$ instead of $I-P$.
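To illustrate, one can build a random quantum channel by drawing a random isometry and slicing it into Kraus operators (a standard construction, not something from the discussion above), and then compare the probability of landing on $P$ with the value $d/n$ that the maximally mixed state $I/n$ would give:

```python
import numpy as np  # reuses completely_positive and find_projection from above

# A random quantum channel: a random isometry V sliced into Kraus operators
# A_1, ..., A_r with sum_k A_k^* A_k = I, so E is completely positive and
# trace preserving.
n, d, r = 6, 2, 3
rng = np.random.default_rng(3)
V = np.linalg.qr(rng.standard_normal((r * n, n)))[0]   # V^T V = I_n
As = [V[k * n:(k + 1) * n, :] for k in range(r)]
E = completely_positive(As)

P = find_projection(E, n, d, seed=0)
rho = E(P / d)                           # send the mixed state P/d through E
p_hit = np.trace(P @ rho).real           # Pr[(P, I-P) measurement lands on P]
print(p_hit, "vs. the baseline", d / n)  # typically well above d/n
```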
Let us now use the algorithm that obtains $P$ from $E$ to solve a computational problem, at least in many cases.
If $x$ is a vector, then let $\mathrm{Diag}(x)$ denote the diagonal matrix whose vector of diagonal entries is $x$, and if $X$ is a square matrix, then let $\mathrm{Diag}(X)$ denote the vector of diagonal entries of $X$. Thus, if $x$ is a vector of length $n$, then $\mathrm{Diag}(x)$ is an $n\times n$ matrix, and if $X$ is an $n\times n$ matrix, then $\mathrm{Diag}(X)$ is a vector of length $n$.
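Incidentally, NumPy's np.diag implements exactly this overloading:

```python
import numpy as np

np.diag(np.array([1.0, 2.0]))                 # Diag(x): a 2x2 diagonal matrix
np.diag(np.array([[1.0, 2.0], [3.0, 4.0]]))   # Diag(X): the vector [1., 4.]
```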
Problem Input: An $n\times n$ matrix $A$ with non-negative real entries and a natural number $d$ with $1\le d<n$.
Objective: Find a subset $B\subseteq\{1,\dots,n\}$ with $|B|=d$ such that if $x=A\cdot\chi_B$ (where $\chi_B$ is the indicator vector of $B$), then the $d$ largest entries of $x$ are the values $x[b]$ for $b\in B$.
Algorithm: Let $E$ be the completely positive operator defined by setting $E(X)=\mathrm{Diag}(A\cdot\mathrm{Diag}(X))$. We then run the iteration using this $E$ to produce an orthogonal projection $P$ of rank $d$. In this case, $P$ will be a diagonal projection matrix of rank $d$ with $\mathrm{Diag}(P)=\chi_B$, where $B$ is our desired subset of $\{1,\dots,n\}$.
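Here is a sketch of the whole pipeline, reusing find_projection from above; the function name find_subset, the instance size, and the recovery of $B$ by sorting the diagonal of $P$ are my own illustrative choices.

```python
import numpy as np  # reuses find_projection from the earlier sketch

def find_subset(A, d, seed=None):
    """E(X) = Diag(A . Diag(X)); read B off the diagonal of the resulting P."""
    n = A.shape[0]
    E = lambda X: np.diag(A @ np.diag(X))   # completely positive since A >= 0
    P = find_projection(E, n, d, seed=seed)
    # P should come out (nearly) diagonal with d diagonal entries near 1.
    return set(np.argsort(np.diag(P).real)[-d:].tolist())

# Checking the objective on a small random instance:
rng = np.random.default_rng(2)
A = rng.random((8, 8))
B = find_subset(A, 3, seed=0)
chi_B = np.zeros(8)
chi_B[list(B)] = 1.0
x = A @ chi_B
print(B == set(np.argsort(x)[-3:].tolist()))  # True when the run succeeds
```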
While $P$ is just a linear operator, the pseudodeterminism of the algorithm that produces it generalizes to other pseudodeterministic algorithms that return models more like deep neural networks.