I am going to share an algorithm that I came up with that tends to produce the same result when we run it multiple times with different initializations. The iteration is not even guaranteed to converge, since we are not using gradient ascent, but it typically converges as long as the algorithm is given a reasonable input. This suggests that the algorithm is mathematically well behaved and may be useful for things such as quantum error correction. After analyzing the algorithm, I shall use it to solve a computational problem.
We say that an algorithm is pseudodeterministic if it tends to return the same output even when the computation leading to that output is non-deterministic (for example, due to a random initialization). I believe that we should focus much more on pseudodeterministic machine learning algorithms for AI safety and interpretability, since pseudodeterministic algorithms are inherently interpretable.
Define $f(z)=3z^2-2z^3$ for all complex numbers $z$. Then $f(0)=0$, $f(1)=1$, $f'(0)=f'(1)=0$, and there are neighborhoods $U,V$ of $0,1$ respectively where if $x\in U$, then $f^N(x)\to 0$ quickly and if $y\in V$, then $f^N(y)\to 1$ quickly (here $f^N$ denotes the $N$-fold iterate of $f$). Set $f^\infty=\lim_{N\to\infty}f^N$. The function $f^\infty$ serves as error correction for projection matrices: if $Q$ is nearly a projection matrix, then $f^\infty(Q)$ will be a projection matrix.
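Here is a minimal numerical sketch of this error correction in NumPy; the iteration cap, tolerance, and size of the perturbation are illustrative choices of mine rather than anything prescribed above.

```python
import numpy as np

def f(Z):
    """One step of f(Z) = 3Z^2 - 2Z^3, applied to a square matrix."""
    Z2 = Z @ Z
    return 3 * Z2 - 2 * (Z2 @ Z)

def f_inf(Q, steps=100, tol=1e-12):
    """Iterate f until Q is numerically idempotent, i.e. a projection."""
    for _ in range(steps):
        Q_next = f(Q)
        if np.linalg.norm(Q_next - Q) < tol:
            return Q_next
        Q = Q_next
    return Q

# Perturb a rank-2 orthogonal projection and recover a projection from it.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((5, 5)))[0][:, :2]
P = U @ U.T                                   # rank-2 orthogonal projection
Q = P + 0.05 * rng.standard_normal((5, 5))    # nearly a projection
R = f_inf(Q)
print(np.linalg.norm(R @ R - R))              # ~0: R is (numerically) idempotent
```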
Suppose that $K$ is the field of real numbers, the field of complex numbers, or the division ring of quaternions (the quaternions are not commutative, hence not a field). Let $Z(K)$ denote the center of $K$. In particular, $Z(\mathbb{R})=\mathbb{R}$, $Z(\mathbb{C})=\mathbb{C}$, and $Z(\mathbb{H})=\mathbb{R}$.
If $A_1,\dots,A_r$ are $m\times n$ matrices over $K$, then define $\Phi(A_1,\dots,A_r):M_n(K)\to M_m(K)$ by setting $\Phi(A_1,\dots,A_r)(X)=\sum_{k=1}^r A_k X A_k^*$. We then say that an operator of the form $\Phi(A_1,\dots,A_r)$ is completely positive. We say that a $Z(K)$-linear operator $E:M_n(K)\to M_m(K)$ is Hermitian preserving if $E(X)$ is Hermitian whenever $X$ is Hermitian. Every completely positive operator is Hermitian preserving.
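As a quick sanity check, here is a sketch of $\Phi$ together with the completely-positive-implies-Hermitian-preserving claim; the helper name completely_positive is mine.

```python
import numpy as np

def completely_positive(As):
    """Return the map X -> sum_k A_k X A_k^* for a list of m x n matrices."""
    def Phi(X):
        return sum(A @ X @ A.conj().T for A in As)
    return Phi

# Completely positive implies Hermitian preserving:
rng = np.random.default_rng(1)
As = [rng.standard_normal((3, 4)) for _ in range(2)]
E = completely_positive(As)
X = rng.standard_normal((4, 4))
X = X + X.T                                   # Hermitian (symmetric) input
print(np.linalg.norm(E(X) - E(X).conj().T))   # ~0: the output is Hermitian too
```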
Suppose that $E:M_n(K)\to M_n(K)$ is $Z(K)$-linear. Let $t>0$, and let $P_0\in M_n(K)$ be a random orthogonal projection matrix of rank $d$. Set $P_{N+1}=f^\infty(P_N+t\cdot E(P_N))$ for all $N$. If everything goes well, the sequence $(P_N)_N$ will converge to a projection matrix $P$ of rank $d$, and $P$ will typically be unique in the sense that if we run the experiment again, we will typically obtain the exact same projection matrix $P$. If $E$ is Hermitian preserving, then $P$ will typically be an orthogonal projection. This experiment performs especially well when $E$ is completely positive, or at least Hermitian preserving or nearly so. The projection matrix $P$ will satisfy the equation $P\cdot E(P)=E(P)\cdot P=P\cdot E(P)\cdot P$.
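The following sketch runs this iteration, reusing f_inf from the earlier snippet; the step size $t$, iteration cap, and tolerance are illustrative assumptions of mine.

```python
import numpy as np  # f_inf is the error-correction helper sketched earlier

def random_projection(n, d, rng):
    """A random rank-d orthogonal projection from a random orthonormal frame."""
    U = np.linalg.qr(rng.standard_normal((n, d)))[0]
    return U @ U.T

def find_projection(E, n, d, t=0.01, steps=2000, tol=1e-10, seed=None):
    """Iterate P_{N+1} = f_inf(P_N + t * E(P_N)) from a random rank-d start."""
    rng = np.random.default_rng(seed)
    P = random_projection(n, d, rng)
    for _ in range(steps):
        P_next = f_inf(P + t * E(P))
        if np.linalg.norm(P_next - P) < tol:
            break
        P = P_next
    return P
```

Running find_projection twice with different seeds and comparing the two outputs gives a direct numerical test of the pseudodeterminism claim.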
In the case when $E$ is a quantum channel, we can easily explain what the projection $P$ does. The operator $P$ is a projection onto a subspace of complex Euclidean space that is particularly well preserved by the channel $E$. In particular, the image $\mathrm{Im}(P)$ is spanned by the top $d$ eigenvectors of $E(P)$. This means that if we send the completely mixed state $P/d$ through the quantum channel $E$ and we measure the state $E(P/d)$ with respect to the projective measurement $(P,I-P)$, then there is an unusually high probability that this measurement will land on $P$ instead of $I-P$.
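To illustrate, one can build a random quantum channel by drawing a random isometry and slicing it into Kraus operators (a standard construction, not something from the discussion above), and then compare the probability of landing on $P$ with the value $d/n$ that the maximally mixed state $I/n$ would give:

```python
import numpy as np  # reuses completely_positive and find_projection from above

# A random quantum channel: a random isometry V sliced into Kraus operators
# A_1, ..., A_r with sum_k A_k^* A_k = I, so E is completely positive and
# trace preserving.
n, d, r = 6, 2, 3
rng = np.random.default_rng(3)
V = np.linalg.qr(rng.standard_normal((r * n, n)))[0]   # V^T V = I_n
As = [V[k * n:(k + 1) * n, :] for k in range(r)]
E = completely_positive(As)

P = find_projection(E, n, d, seed=0)
rho = E(P / d)                           # send the mixed state P/d through E
p_hit = np.trace(P @ rho).real           # Pr[(P, I-P) measurement lands on P]
print(p_hit, "vs. the baseline", d / n)  # typically well above d/n
```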
Let us now use the algorithm that obtains $P$ from $E$ to solve a computational problem, at least in many cases.
If $x$ is a vector, then let $\mathrm{Diag}(x)$ denote the diagonal matrix whose vector of diagonal entries is $x$, and if $X$ is a square matrix, then let $\mathrm{Diag}(X)$ denote the vector of diagonal entries of $X$. Thus, if $x$ is a vector of length $n$, then $\mathrm{Diag}(x)$ is an $n\times n$ matrix, and if $X$ is an $n\times n$ matrix, then $\mathrm{Diag}(X)$ is a vector of length $n$.
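Incidentally, NumPy's np.diag implements exactly this overloading:

```python
import numpy as np

np.diag(np.array([1.0, 2.0]))                 # Diag(x): a 2x2 diagonal matrix
np.diag(np.array([[1.0, 2.0], [3.0, 4.0]]))   # Diag(X): the vector [1., 4.]
```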
Problem Input: An $n\times n$ matrix $A$ with non-negative real entries and a natural number $d$ with $1\le d<n$.
Objective: Find a subset $B\subseteq\{1,\dots,n\}$ with $|B|=d$ such that if $x=A\cdot\chi_B$ (where $\chi_B$ is the indicator vector of $B$), then the $d$ largest entries of $x$ are the values $x[b]$ for $b\in B$.
Algorithm: Let $E$ be the completely positive operator defined by setting $E(X)=\mathrm{Diag}(A\cdot\mathrm{Diag}(X))$. We then run the iteration using this $E$ to produce an orthogonal projection $P$ of rank $d$. In this case, $P$ will be a diagonal projection matrix of rank $d$ with $\mathrm{Diag}(P)=\chi_B$, where $B$ is our desired subset of $\{1,\dots,n\}$.
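Here is a sketch of the whole pipeline, reusing find_projection from above; the function name find_subset, the instance size, and the recovery of $B$ by sorting the diagonal of $P$ are my own illustrative choices.

```python
import numpy as np  # reuses find_projection from the earlier sketch

def find_subset(A, d, seed=None):
    """E(X) = Diag(A . Diag(X)); read B off the diagonal of the resulting P."""
    n = A.shape[0]
    E = lambda X: np.diag(A @ np.diag(X))   # completely positive since A >= 0
    P = find_projection(E, n, d, seed=seed)
    # P should come out (nearly) diagonal with d diagonal entries near 1.
    return set(np.argsort(np.diag(P).real)[-d:].tolist())

# Checking the objective on a small random instance:
rng = np.random.default_rng(2)
A = rng.random((8, 8))
B = find_subset(A, 3, seed=0)
chi_B = np.zeros(8)
chi_B[list(B)] = 1.0
x = A @ chi_B
print(B == set(np.argsort(x)[-3:].tolist()))  # True when the run succeeds
```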
While $P$ is just a linear operator, the pseudodeterminism of the algorithm that produces it generalizes to other pseudodeterministic algorithms that return models more like deep neural networks.