Hey,
I really like the idea and have been thinking about something similar lately, which is how I found your posts. However, I think it would be interesting to look not only at the inputs/outputs of the LLM, but also at the feature activations and their “dynamics” along a longer input/chain-of-thought.
To me, the real problem here would be defining good quantities/observables to investigate for equivariance, since this seems much fuzzier and more ill-defined than the nice, simple representation of an image in the hidden layers of a CNN.
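For concreteness, the kind of "observable" I have in mind for the CNN case is something like the toy check below (just my own minimal sketch in PyTorch, with a made-up tiny conv stack and a circular shift as the group action, nothing taken from your posts). The open question is what the analogue of the shift, and of the hidden features, would even be along an LLM's chain of thought.

```python
# Toy equivariance check for a CNN hidden layer (illustrative sketch only):
# compare features of a shifted image with shifted features of the original.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny conv stack standing in for "the hidden layers of a CNN".
# Circular padding makes the layers exactly equivariant to circular shifts.
features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1, padding_mode="circular"),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1, padding_mode="circular"),
)

x = torch.randn(1, 1, 32, 32)   # toy input image
shift = 4                       # group action: circular shift along width

x_shifted = torch.roll(x, shifts=shift, dims=-1)

f_of_shifted = features(x_shifted)                          # f(g . x)
shifted_f = torch.roll(features(x), shifts=shift, dims=-1)  # g . f(x)

# Equivariance error: ~0 (up to float error) iff f(g . x) == g . f(x).
err = (f_of_shifted - shifted_f).norm() / shifted_f.norm()
print(f"relative equivariance error: {err.item():.3e}")
```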
Would love to read your thoughts on this, because I really do think that working through it and looking at some toy models could be a worthwhile endeavour.
Cheers