I agree. There is a tradeoff here between the L0/MSE curve and circuit simplicity.
I guess another problem (w/ SAEs in general) is that optimizing for L0 leads to feature absorption. However, I'm unsure of a metric (other than L0/MSE) that does capture what we want.