Logan Riggs comments on Deep sparse autoencoders yield interpretable features too

Logan Riggs 24 Feb 2025 19:49 UTC
4 points
I agree. There is a tradeoff here between the L0/MSE curve and circuit simplicity.

I guess another problem (w/ SAEs in general) is that optimizing for L0 leads to feature absorption. However, I’m unsure of a metric (other than L0/MSE) that does capture what we want.
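
For concreteness, the two axes of the L0/MSE curve can be computed roughly as below. This is a minimal sketch in plain Python: the function name `l0_and_mse` and the toy data are made up for illustration, and conventions for MSE vary (some setups normalize it, e.g. by input variance).

```python
def l0_and_mse(latents, inputs, reconstructions):
    """Return (mean L0, mean squared error) over a batch.

    latents: per-example lists of SAE latent activations
    inputs / reconstructions: per-example lists of equal length
    """
    n = len(latents)
    # L0: average number of nonzero latents per example
    l0 = sum(sum(1 for a in row if a != 0) for row in latents) / n
    # MSE: squared error averaged over every element in the batch
    sq_err = sum(
        (x - y) ** 2
        for xs, ys in zip(inputs, reconstructions)
        for x, y in zip(xs, ys)
    )
    n_elems = sum(len(xs) for xs in inputs)
    return l0, sq_err / n_elems


# Toy batch of two examples (hypothetical values)
latents = [[0.0, 1.5, 0.0, 0.2], [0.0, 0.0, 3.0, 0.0]]
inputs = [[1.0, 2.0], [0.0, -1.0]]
recons = [[1.0, 1.0], [0.0, -1.0]]
l0, mse = l0_and_mse(latents, inputs, recons)
```

Neither number says anything about whether a general feature got absorbed into more specific ones, which is why the pair alone doesn't capture what we want.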