In this post, we shall describe three related fitness functions with discrete domains where the process of maximizing these functions is pseudodeterministic in the sense that if we locally maximize the fitness function multiple times, then we typically attain the same local maximum; this appears to be an important aspect of AI safety. These fitness functions are my own. While these functions are far from deep neural networks, I think they are still relevant to AI safety since they are closely related to other fitness functions that are locally maximized pseudodeterministically and that more closely resemble deep neural networks.
Let $K$ denote a finite dimensional algebra over the field of real numbers together with an adjoint operation $*$ (the operation $*$ is a linear involution with $(xy)^*=y^*x^*$). For example, $K$ could be the field of real numbers, the field of complex numbers, the division ring of quaternions, or a matrix ring over the reals, complexes, or quaternions. We can extend the adjoint $*$ to the matrix ring $M_r(K)$ by setting $(x_{i,j})_{i,j}^*=(x_{j,i}^*)_{i,j}$.
Let $n$ and $d$ be natural numbers. If $A_1,\dots,A_r\in M_n(K)$ and $X_1,\dots,X_r\in M_d(K)$, then define $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r):M_{n,d}(K)\to M_{n,d}(K)$ by setting $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)(X)=A_1XX_1^*+\dots+A_rXX_r^*$.
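For concreteness, here is a minimal numpy sketch of this definition for the case $K=\mathbb{R}$ (where $*$ is the transpose). It represents $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)$ as an $nd\times nd$ matrix acting on $\mathrm{vec}(X)$; the function names are illustrative only.

```python
import numpy as np

def gamma_matrix(As, Xs):
    """Matrix of the linear map X -> A_1 X X_1^* + ... + A_r X X_r^*
    on M_{n,d}(R), using vec(A X B^T) = (B kron A) vec(X) with
    column-stacking vectorization."""
    return sum(np.kron(X, A) for A, X in zip(As, Xs))

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))
```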
Suppose now that $1\leq d<n$. Then let $S_d\subseteq M_{n,n}(K)$ be the set of all $0,1$-diagonal matrices with exactly $d$ many $1$'s on the diagonal. We observe that each element of $S_d$ is an orthogonal projection. Define fitness functions $F_d,G_d,H_d:S_d\to\mathbb{R}$ by setting
$F_d(P)=\rho(\Gamma(A_1,\dots,A_r;PA_1P,\dots,PA_rP))$,
$G_d(P)=\rho(\Gamma(PA_1P,\dots,PA_rP;PA_1P,\dots,PA_rP))$, and
$H_d(P)=F_d(P)^2/G_d(P)$. Here, $\rho$ denotes the spectral radius, and since each $PA_jP\in M_n(K)$, the operator $\Gamma(\cdot;\cdot)$ in these formulas acts on $M_{n,n}(K)$.
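Continuing the sketch, the three fitness functions can be computed as follows. Here $P$ is passed as an $n\times n$ $0,1$-diagonal matrix, and the last function assumes the reading $H_d(P)=F_d(P)^2/G_d(P)$.

```python
def F_fit(As, P):
    """F_d(P) = rho(Gamma(A_1,...,A_r; P A_1 P,...,P A_r P))."""
    PAs = [P @ A @ P for A in As]
    return spectral_radius(gamma_matrix(As, PAs))

def G_fit(As, P):
    """G_d(P) = rho(Gamma(P A_1 P,...,P A_r P; P A_1 P,...,P A_r P))."""
    PAs = [P @ A @ P for A in As]
    return spectral_radius(gamma_matrix(PAs, PAs))

def H_fit(As, P):
    """H_d(P) = F_d(P)^2 / G_d(P) (assumed reading of the formula)."""
    return F_fit(As, P) ** 2 / G_fit(As, P)
```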
$F_d(P)$ is typically slightly larger than $G_d(P)$, so these three fitness functions are closely related.
If $P,Q\in S_d$, then we say that $Q$ is in the neighborhood of $P$ if $Q$ differs from $P$ in at most $2$ diagonal entries. If $F$ is a fitness function with domain $S_d$, then we say that $(P,F(P))$ is a local maximum of the function $F$ if $F(P)\geq F(Q)$ whenever $Q$ is in the neighborhood of $P$.
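In code, since every element of $S_d$ has exactly $d$ many $1$'s, the proper neighbors of $P$ are obtained by swapping one $1$ with one $0$ on the diagonal (a sketch; $P$ is encoded as a $0,1$-vector of diagonal entries):

```python
def neighbors(p):
    """All elements of S_d differing from p in exactly 2 diagonal
    entries: swap one 1 with one 0.  (Changing a single entry would
    change the number of 1's and leave S_d.)"""
    ones = np.flatnonzero(p == 1)
    zeros = np.flatnonzero(p == 0)
    for i in ones:
        for j in zeros:
            q = p.copy()
            q[i], q[j] = 0, 1
            yield q
```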
A path from initialization to a local maximum $(P_s,F(P_s))$ for $F$ will be a sequence $(P_0,\dots,P_s)$ where $P_0$ is generated uniformly at random, where $P_j$ is always in the neighborhood of $P_{j-1}$ with $F(P_j)\geq F(P_{j-1})$ for all $j$, and where the length of the path is $s$.
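A greedy hill-climbing sketch of this procedure (the move-acceptance rule is not specified above; moving to a best improving neighbor at each step is one reasonable choice):

```python
def local_maximize(As, d, fitness, rng):
    """Climb from a uniformly random P_0 in S_d to a local maximum,
    moving to a best improving neighbor at each step."""
    n = As[0].shape[0]
    p = np.zeros(n, dtype=int)
    p[rng.choice(n, size=d, replace=False)] = 1  # uniformly random P_0
    value = fitness(As, np.diag(p).astype(float))
    while True:
        best_q, best_value = None, value
        for q in neighbors(p):
            v = fitness(As, np.diag(q).astype(float))
            if v > best_value:
                best_q, best_value = q, v
        if best_q is None:
            return p, value   # no neighbor improves: local maximum
        p, value = best_q, best_value
```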
Empirical observation: Suppose that $F\in\{F_d,G_d,H_d\}$. If we compute a path from initialization to a local maximum for $F$, then such a path will typically have length less than $n$. Furthermore, if we locally maximize $F$ multiple times, we will typically obtain the same local maximum each time. Moreover, if $P_F,P_G,P_H$ are the computed local maxima of $F_d,G_d,H_d$ respectively, then $P_F,P_G,P_H$ will either be identical or differ in relatively few diagonal entries.
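One can probe this pseudodeterminism claim directly by repeating the climb from independent random initializations and counting the distinct local maxima reached (the random matrices below are illustrative test data, not from any particular experiment):

```python
rng = np.random.default_rng(0)
n, d, r = 8, 3, 3
As = [rng.standard_normal((n, n)) for _ in range(r)]

# Distinct local maxima over 20 independent runs.
maxima = {tuple(local_maximize(As, d, F_fit, rng)[0]) for _ in range(20)}
print(len(maxima))   # pseudodeterminism predicts this is typically 1
```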
I have not done the experiments yet, but one should be able to generalize the above empirical observation to matroids. Suppose that $M$ is the collection of bases of a matroid with underlying set $\{1,\dots,n\}$, so that $|A|=d$ for each $A\in M$. Identifying each $A\in M$ with the projection in $S_d$ whose $1$'s lie at the positions in $A$, one should be able to make the same observation about the restricted fitness functions $F_d|_M,G_d|_M,H_d|_M$ as well.
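Since these experiments have not been run, the following is only a hypothetical sketch of how the search would be restricted: represent $M$ as a set of frozensets of indices and discard neighbors that leave $M$.

```python
def neighbors_in_matroid(p, bases):
    """Neighbors of p whose set of 1-positions is a basis in M.
    bases: a set of frozensets of indices (hypothetical encoding)."""
    for q in neighbors(p):
        if frozenset(np.flatnonzero(q == 1).tolist()) in bases:
            yield q
```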
We observe that the problems of maximizing $F_d,G_d,H_d$ are all NP-hard since the clique problem can be reduced to special cases of maximizing $F_d,G_d,H_d$. For instance, when $r=1$ and $A_1$ is the adjacency matrix of a graph, we have $G_d(P)=\rho(PA_1P)^2$, and $\rho(PA_1P)$ attains its maximum possible value $d-1$ precisely when the $d$ vertices selected by $P$ form a clique. This means that the problems of maximizing $F_d,G_d,H_d$ can be sophisticated problems, but it also means that we should not expect it to be easy to find the global maxima of $F_d,G_d,H_d$ in some cases.
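A brute-force illustration of this reduction (exponential-time, for checking the claim on small graphs only):

```python
from itertools import combinations

def has_d_clique(A, d):
    """Check for a d-clique via the spectral characterization: a
    graph on d vertices has spectral radius d-1 iff it is complete."""
    n = A.shape[0]
    best = max(
        spectral_radius(A[np.ix_(S, S)])
        for S in combinations(range(n), d)
    )
    return bool(np.isclose(best, d - 1))
```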