I was able to completely interpret a simple machine learning model trained on some cryptographic input. This objective is a special case of something I call an LSRDR, a machine learning algorithm that I created in order to analyze block ciphers for cryptocurrency mining.
Set
Here,
The scenario where we obtain an overly perfect and complete interpretation of the local optimum happens all the time with the sorts of optimization algorithms that I have been working on, so if we want to develop more interpretable machine learning, this seems like the right direction to go. Of course, my trained model is very simple, so we need to do a substantial amount of work to generalize this sort of machine learning algorithm to something like a deep neural network. I am making progress, but inherently interpretable deep learning takes more computational power than I have.
In this post, we shall compute the average loss/fitness level for a linear dimensionality reduction.
The purpose of these calculations is to demonstrate that such a linear dimensionality reduction behaves mathematically and should be used as a simple model for what your loss/fitness functions should look like in AI/ML if you want your AI/ML to be well-behaved and interpretable.
Suppose that is either the field of real numbers, the field of complex numbers, or the division ring of quaternions. Suppose that is a -dimensional inner product space over the field
Suppose that is a measure over the unit sphere in . Then the objective is to find an optimal -dimensional subspace of for the measure . Let be a function. Therefore, define a function mapping the set of all -dimensional orthogonal projection matrices to by setting . The goal is to find an orthogonal projection that maximizes or minimizes .
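To make this setup concrete, here is a minimal sketch of how an objective of this kind could be evaluated for a given subspace. It makes several assumptions that are not taken from the text above: the scalars are real, the measure is replaced by a finite list of equally weighted unit vectors, and the objective is taken to be the average of a function applied to the squared norm of the projected vector. All function names (`dot`, `random_orthonormal_frame`, `squared_projected_norm`, `objective`, `unit`) are hypothetical and chosen only for illustration.

```python
import math
import random

def dot(u, v):
    """Real inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def random_orthonormal_frame(n, d, rng):
    """Return d orthonormal vectors in R^n (Gram-Schmidt on Gaussian vectors).

    The span of the frame is a random d-dimensional subspace, and the
    orthogonal projection onto it has rank d.
    """
    frame = []
    while len(frame) < d:
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in frame:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]  # remove the component along q
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip the (practically impossible) degenerate draw
            frame.append([a / norm for a in w])
    return frame

def squared_projected_norm(frame, v):
    """||P v||^2, where P is the orthogonal projection onto span(frame)."""
    return sum(dot(q, v) ** 2 for q in frame)

def objective(f, frame, points):
    """Assumed objective: the mean of f(||P v||^2) over the sample points."""
    return sum(f(squared_projected_norm(frame, v)) for v in points) / len(points)

def unit(v):
    """Rescale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

# Illustrative data: unit vectors clustered around the first coordinate axis.
rng = random.Random(0)
n, d = 6, 2
points = [unit([1.0] + [0.1 * rng.gauss(0.0, 1.0) for _ in range(n - 1)])
          for _ in range(200)]

aligned = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
random_frame = random_orthonormal_frame(n, d, rng)
identity = lambda t: t

print("aligned subspace:", objective(identity, aligned, points))
print("random subspace: ", objective(identity, random_frame, points))
```

Storing the projection as an orthonormal frame rather than as a full matrix keeps the sketch short, since the squared norm of the projected vector is just the sum of squared inner products with the frame vectors.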
Let . Then, let be independent random variables, each following the standard normal distribution in one real variable. Then observe that follows the chi-squared distribution with degrees of freedom. If follows the chi-squared distribution with degrees of freedom, then where is the digamma function. Let be a probability measure on the unit sphere of , and let be the uniform probability measure on the set of all orthogonal projections from to of rank . Then
where the random variable follows the F-distribution with and degrees of freedom. From standard facts about the F-distribution, we know that if and is a positive integer, then
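The standard distributional facts used in this kind of calculation can be sanity-checked numerically. The sketch below works in the real case only and relies on textbook identities, stated here in my own notation as assumptions rather than as the formulas of this post: if X is chi-squared with m degrees of freedom, then E[ln X] = ψ(m/2) + ln 2; if A and B are independent chi-squared variables with d and n-d degrees of freedom, then A/(A+B) has the distribution of the squared norm of a fixed unit vector under a uniformly random rank-d orthogonal projection in n real dimensions, with E[ln(A/(A+B))] = ψ(d/2) - ψ(n/2); and if X follows the F-distribution with d1 and d2 degrees of freedom and d2 > 2k, then E[X^k] = (d2/d1)^k Γ(d1/2 + k) Γ(d2/2 - k) / (Γ(d1/2) Γ(d2/2)). The function names are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def chi2_sample(dof, rng):
    """Sum of dof squared standard normals, i.e. a chi-squared draw."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

def f_raw_moment(order, d1, d2):
    """E[X^order] for X ~ F(d1, d2); finite only when d2 > 2 * order."""
    if d2 <= 2 * order:
        raise ValueError("moment is not finite unless d2 > 2 * order")
    log_moment = (order * math.log(d2 / d1)
                  + math.lgamma(d1 / 2 + order) + math.lgamma(d2 / 2 - order)
                  - math.lgamma(d1 / 2) - math.lgamma(d2 / 2))
    return math.exp(log_moment)

def mc_checks(n, d, samples=20000, seed=0):
    """Monte Carlo means of ln A, ln(A / (A + B)), and the F-ratio.

    A ~ chi-squared(d) and B ~ chi-squared(n - d) are independent.
    """
    rng = random.Random(seed)
    log_chi2 = log_ratio = f_mean = 0.0
    for _ in range(samples):
        a = chi2_sample(d, rng)
        b = chi2_sample(n - d, rng)
        log_chi2 += math.log(a)
        log_ratio += math.log(a / (a + b))
        f_mean += (a / d) / (b / (n - d))
    return log_chi2 / samples, log_ratio / samples, f_mean / samples

n, d = 16, 4
m_log, m_ratio, m_f = mc_checks(n, d)
print("E[ln chi2]     MC:", m_log, " formula:", digamma(d / 2) + math.log(2.0))
print("E[ln A/(A+B)]  MC:", m_ratio, " formula:", digamma(d / 2) - digamma(n / 2))
print("E[F]           MC:", m_f, " formula:", f_raw_moment(1, d, n - d))
```

The Monte Carlo estimates should agree with the closed forms up to sampling error; the complex and quaternionic cases would need the degrees of freedom adjusted and are not covered by this sketch.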