I was thinking of constructing a deep neural network by interlacing ordinary linear layers x↦Ax+b with tropical layers x↦(C⊙x)⊕d, where ⊙ and ⊕ denote tropical (max-plus) matrix multiplication and addition. One can think of this as replacing the ReLU activation with the more general expression (x1,…,xn)↦max(a1+x1,…,an+xn,d)=a1⊗x1⊕⋯⊕an⊗xn⊕d. Someone has probably thought about this already and may even have run experiments with this approach.
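For concreteness, here is a minimal PyTorch sketch of such a tropical layer. The class name TropicalLayer and the initialization are my own choices rather than anything from an existing library; the layer computes y_i = max(max_j(C[i,j] + x[j]), d[i]), i.e. (C⊙x)⊕d.

```python
import torch
import torch.nn as nn

class TropicalLayer(nn.Module):
    """Max-plus layer: y_i = max( max_j (C[i,j] + x[j]), d[i] ), i.e. (C ⊙ x) ⊕ d."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.C = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.d = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # x: (batch, in_features); broadcast to form C[i,j] + x[j] for every pair (i, j)
        s = self.C.unsqueeze(0) + x.unsqueeze(1)             # (batch, out, in)
        return torch.maximum(s.max(dim=-1).values, self.d)   # tropical matmul, then ⊕ d
```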
One problem with this approach is that for an input x1,…,xn, there will typically be a unique j∈{0,…,n} with max(a1+x1,…,an+xn,d)=aj+xj (or, when j=0, max(a1+x1,…,an+xn,d)=d). This means that only one of the biases a1,…,an,d contributes to the output max(a1+x1,…,an+xn,d). After training, this is not a bad thing: it simply means that the tropical operation is sparse, and sparse matrices are good for saving space. During training, however, we may want to pump up the values of a1,…,an,d so that P(max(a1+x1,…,an+xn,d)=aj+xj)>α/(n+1) for every j, and at the end of training we can let these probabilities drop again so that we can reduce the number of weights in the tropical part of the network.
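The post does not say how the biases would be pumped up; one hypothetical way to encode the constraint P(max is attained at index j) > α/(n+1) during training is a differentiable penalty on a softened winning frequency. The softmax relaxation, the temperature tau, and the name argmax_usage_penalty below are my own assumptions, built on the TropicalLayer sketch above.

```python
def argmax_usage_penalty(layer, x, alpha=0.5, tau=0.1):
    """Penalize indices of a TropicalLayer whose softened winning frequency over a
    batch falls below alpha / (n + 1) -- a differentiable stand-in for requiring
    P(max is attained at index j) > alpha / (n + 1) for every j."""
    batch = x.shape[0]
    s = layer.C.unsqueeze(0) + x.unsqueeze(1)                             # (batch, m, n)
    s = torch.cat([s, layer.d.expand(batch, -1).unsqueeze(-1)], dim=-1)   # append d as the extra index
    n_plus_1 = s.shape[-1]
    soft = torch.softmax(s / tau, dim=-1).mean(dim=0)                     # (m, n+1) soft winning frequencies
    return torch.relu(alpha / n_plus_1 - soft).sum()
```

This penalty would be added to the training loss with some coefficient and simply dropped near the end of training, when sparsity becomes desirable again.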
Another possible issue with tropical matrices is that they may reduce the dimension of the data. Consider the tropical layer TC,d(x)=(C⊙x)⊕d where C is an m×n matrix. Then for almost all x, there will be a function fx:{1,…,m}→{1,…,n} where J(TC,d)(x)=(δfx(i),j)i,j (J stands for the Jacobian; rows where the bias d attains the maximum are zero instead). If the image Im(fx) is too small, then the tropical layer TC,d may throw away too much information from the vector x.
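One can monitor this directly on a batch. The helper below is a diagnostic sketch of my own, assuming the TropicalLayer class above; it reports the average fraction of input coordinates that some output coordinate actually selects, i.e. an estimate of |Im(fx)|/n.

```python
def image_fraction(layer, x):
    """Estimate |Im(f_x)| / n over a batch: the fraction of input coordinates that
    win the max for at least one output coordinate (coordinates where the bias d
    wins contribute nothing, since their Jacobian rows are zero)."""
    s = layer.C.unsqueeze(0) + x.unsqueeze(1)      # (batch, m, n)
    row_max, winners = s.max(dim=-1)               # winners[b, i] = f_x(i)
    d_wins = row_max < layer.d                     # output coordinates where d attains the max
    fracs = []
    for b in range(x.shape[0]):
        active = winners[b][~d_wins[b]]
        fracs.append(active.unique().numel() / x.shape[1])
    return sum(fracs) / len(fracs)
```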
One may also need a couple of other tricks to make sure that the tropical layers behave well. For example, one can start training with plain ReLU activations and, once training is going well, replace the ReLU activations with the more general tropical layers (a concrete warm start is sketched below). By starting off with ReLU, we can ensure that Im(fx) is large enough that the tropical layers do not destroy too much information. Since ReLU is a special case of a tropical layer, switching to tropical layers only enlarges the function class, so further training with them should at least not make the training loss worse.
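Concretely, the warm start can be done by initializing a square tropical layer so that it computes ReLU exactly: zeros on the diagonal of C, a very negative value (a finite stand-in for −∞) off the diagonal, and d = 0. The name init_as_relu and the constant below are illustrative, assuming the imports and TropicalLayer class from the sketch above.

```python
def init_as_relu(layer, neg=1e4):
    """Initialize a square TropicalLayer so that initially
    y_i = max(x_i, 0) = ReLU(x_i); `neg` is a finite stand-in for -infinity."""
    assert layer.C.shape[0] == layer.C.shape[1], "ReLU warm start needs a square layer"
    with torch.no_grad():
        layer.C.fill_(-neg)
        layer.C.fill_diagonal_(0.0)
        layer.d.zero_()
```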
Any ReLU MLP can be turned into a composition of ordinary linear layers with tropical linear layers. Is there any reason why this would not work? I think I should run a couple of experiments with tropical matrix operations in place of ReLU to see whether it works and what should be done to optimize neural networks formed by interlacing tropical matrix operations with ordinary matrix operations. In an MLP, most of the parameters are entries of the matrices that linearly transform one vector into another, but for some reason we do not load the activation layer with parameters.
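As a sketch of such an experiment, here is how the interlaced architecture could be assembled in PyTorch, with a tropical layer taking the place ReLU would normally occupy; the widths and the name TropicalMLP are illustrative, and the pieces are the sketches from above.

```python
class TropicalMLP(nn.Module):
    """Affine layers x ↦ Ax + b interlaced with tropical layers x ↦ (C ⊙ x) ⊕ d."""
    def __init__(self, dims=(784, 256, 256, 10)):
        super().__init__()
        blocks = []
        for n_in, n_out in zip(dims[:-1], dims[1:]):
            blocks.append(nn.Linear(n_in, n_out))
            blocks.append(TropicalLayer(n_out, n_out))
        self.net = nn.Sequential(*blocks[:-1])   # drop the tropical layer after the output

    def forward(self, x):
        return self.net(x)
```

For example, model = TropicalMLP() followed by model(torch.randn(32, 784)) gives outputs of shape (32, 10), and the tropical layers can be warm-started with init_as_relu before training.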
I do not think that I have much more knowledge of tropical geometry than the average mathematician, so I do not think I am too biased in favor of tropical geometry.