I’m a little confused about which computation you’re trying to sparsify. The paper seems to be written in the context of the technique where one uses sparse autoencoders to extract hopefully interpretable “features” from the embedding space of large language models. (Please correct me if I’m wrong about that!)
The goal would seem to be, then, to sparsify the computation of the language model. However, the method in your paper seems to sparsify the computation of the autoencoders themselves, not the language model. Shouldn’t the goal be to sparsify the language model’s computation? If so, why not use weight pruning? What is JSAE better at?
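To pin down the distinction I’m asking about, here is a minimal sketch of the two options as I understand them. This is my own reconstruction, not code from the paper: the toy ReLU SAEs, the single MLP block standing in for the language model, and all names (`enc_in`, `dec_in`, `enc_out`, `latent_map`) are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_sae, batch = 16, 32, 4  # toy sizes, purely illustrative

# A single frozen MLP block stands in for one layer of the language model.
mlp = nn.Sequential(
    nn.Linear(d_model, 4 * d_model),
    nn.GELU(),
    nn.Linear(4 * d_model, d_model),
)

# Two toy ReLU SAEs: one on the MLP's input, one on its output.
# (The architecture is a guess, not necessarily what the paper uses.)
enc_in = nn.Linear(d_model, d_sae)
dec_in = nn.Linear(d_sae, d_model)
enc_out = nn.Linear(d_model, d_sae)

x = torch.randn(batch, d_model)  # stand-in residual-stream activations

# (a) Weight pruning: sparsify the LM's own parameters directly,
#     e.g. via an L1 penalty (or post-hoc magnitude pruning).
prune_penalty = sum(w.abs().sum() for w in mlp.parameters())

# (b) My reading of JSAE: keep the LM weights dense, but penalize the
#     Jacobian of the latent-to-latent map z -> enc_out(mlp(dec_in(z))),
#     so each output feature depends on only a few input features.
z_in = torch.relu(enc_in(x))

def latent_map(z):
    return torch.relu(enc_out(mlp(dec_in(z))))

jac = torch.func.jacrev(latent_map)(z_in[0])  # (d_sae, d_sae), one example
jacobian_penalty = jac.abs().sum()
```

If my reading in (b) is right, the sparsity penalty lands on the SAE-latent pathway rather than on the model’s own weights, which is exactly why I’m asking what JSAE buys over (a).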