Does it make sense to extract sparse feature graph for a behavior from only residual layers of gpt2 small or do we need all mlp and attention as well?
[Question] SAE sparse feature graph using only residual layers
Does it make sense to extract sparse feature graph for a behavior from only residual layers of gpt2 small or do we need all mlp and attention as well?