I don’t understand. Doesn’t that just shift the problem from the weight matrix to the rotation matrix? Yes, you know how inputs correspond to outputs, but now there is this new matrix between the outputs and the inputs of the next layer, and it creates the same non-locality.
I’m sorry if this is stupid, my linear algebra course was a very long time ago.
You don’t have to compute the rotation for the weight matrix every time; you can compute it once. It’s true that you have to rotate the input activations for every input, but that’s just a cheap matrix–vector product.
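To make the "compute it once" point concrete, here is a minimal NumPy sketch (my own illustration, not from the discussion above): for a linear layer y = W x and an orthogonal rotation R, you can fold R into the weights offline, and the per-input cost is just one extra matrix–vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))   # original layer weights
x = rng.normal(size=d)        # one input activation vector

# Build an orthogonal (rotation) matrix R via QR decomposition.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # R @ R.T == I

# One-time precomputation: fold the rotation into the weights.
W_rot = W @ R.T               # done once, offline

# Per-input cost: a single matrix-vector product to rotate activations.
x_rot = R @ x

# The rotated program computes exactly the same function:
# W_rot @ x_rot == W @ R.T @ R @ x == W @ x
assert np.allclose(W @ x, W_rot @ x_rot)
```

Because W_rot is precomputed, nothing about the forward pass gets more expensive at inference time; only the basis in which you read off the activations changes.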
Think of the rotation as an adjustment to our original program’s representation: it keeps the program the same but hopefully makes it clearer what the program is doing. It would probably be interesting to see exactly what the adjustment itself is doing, much as it’s interesting to watch someone translate assembly into a Python script; but once you have the Python script, that translation isn’t your highest priority when figuring out what the assembly program does.
Thanks! I think I understood.
I was going to ask the same thing. It may not be possible to create a simple vector representation of the circuit if the circuit must simulate a complex nonlinear system. It doesn’t seem possible if the inputs are embeddings or sensor data from real-world causal dynamical systems, like images and text, and the outputs are vector-space representations of the meaning in those inputs (semantic segmentation, NLP, etc.). If it were possible, it would be like saying there’s a simple, consistent vector representation of all the common-sense reasoning about a particular image or natural-language text, and that your rotated, explainable embedding would be far more accurate and robust than the original spaghetti circuit/program. Some programs are irreducible.
I think the best you can do is something like Capsule Networks (Hinton), which are many of your rotations (just smaller; 4-D quaternions, I think) distributed throughout the circuit.
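For readers unfamiliar with the quaternion framing mentioned above, here is a small self-contained sketch (my own illustration; capsule networks themselves typically learn pose vectors or matrices rather than literal quaternions) of how a 4-D unit quaternion encodes a 3-D rotation via v' = q v q*:

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of two quaternions in (w, x, y, z) order.
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, v):
    # Rotate 3-vector v by unit quaternion q: v' = q * (0, v) * q_conjugate.
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.concatenate([[0.0], v])), q_conj)[1:]

# A 90-degree rotation about the z-axis sends the x-axis to the y-axis.
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
v = np.array([1.0, 0.0, 0.0])
print(rotate(q, v))   # ≈ [0, 1, 0]
```

The appeal in this context is compactness: each such 4-parameter rotation is local to one unit, rather than one giant basis rotation applied to the whole layer.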