Jatin Nainani comments on Scaling Sparse Feature Circuit Finding to Gemma 9B

Jatin Nainani 13 Jan 2025 15:15 UTC
2 points
Yes. By design, circuits discovered this way may miss how or when something is computed. But we argue that finding the important representations at bottlenecks, and how they change across layers, can still provide useful information about the model.
One of our future directions, in the spirit of crosscoders, is to build “Layer Output Buffer SAEs” that aim to capture the computation between bottlenecks.