While the authors claim that their approach is fundamentally different from transcoders, from my perspective it addresses the same issue: finding interpretable circuits. I agree that it modulates MLPs in a different way (e.g., connecting sparse inputs and sparse outputs, rather than modulating the MLP directly). However, it would be great to see how the circuits identified by transcoders differ from those found by JSAE.
We also discuss this similarity in Appendix F of our recent work (where the transition T is analogous to f), though we do not consider gradient-based approaches. Nevertheless, matching the most similar features between input and output can still explain feature dynamics sufficiently well, which makes it a good baseline.
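To make the suggested baseline concrete, here is a minimal sketch of similarity-based feature matching. It is an illustration under stated assumptions, not the procedure from either paper: it assumes that both dictionaries live in the same d-dimensional space, that features are compared by the cosine similarity of their direction vectors, and the names `most_similar_features`, `W_in`, and `W_out` are hypothetical.

```python
import numpy as np


def most_similar_features(W_in, W_out):
    """Match each input-side feature to its closest output-side feature.

    Assumption (not from the paper): W_in has shape (n_in, d) and W_out has
    shape (n_out, d), one feature direction per row, and "most similar"
    means highest cosine similarity.
    """
    # Normalise rows so that the dot product equals cosine similarity.
    a = W_in / np.linalg.norm(W_in, axis=1, keepdims=True)
    b = W_out / np.linalg.norm(W_out, axis=1, keepdims=True)
    sim = a @ b.T  # (n_in, n_out) matrix of cosine similarities
    best = sim.argmax(axis=1)  # index of the closest output feature
    return best, sim[np.arange(len(best)), best]


# Toy usage with random directions standing in for trained dictionaries.
rng = np.random.default_rng(0)
idx, score = most_similar_features(rng.normal(size=(8, 16)),
                                   rng.normal(size=(12, 16)))
```

The resulting input-to-output pairs could then be compared against the connections found by JSAE or by a transcoder on the same layer.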