Do superposition features actually seem to work like this in practice in current networks? I was not aware of this.
I’m not aware of any work that identifies superposition in exactly this way in NNs of practical use. As Spencer notes, you can verify that it does appear in certain toy settings, though. Anthropic note in their SoLU (Softmax Linear Unit) paper that they view their results as evidence for the superposition hypothesis (SPH) in LLMs. IMO the key piece of evidence here is that swapping in a SoLU activation destroys performance, but adding another LayerNorm afterwards fixes it. The SoLU selects strongly against superposition, and the LayerNorm makes it possible again, which is some evidence that the way the LLM got to its performance was via superposition.
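To make the mechanism concrete, here is a minimal NumPy sketch of the two pieces involved. The SoLU form (x · softmax(x)) is from Anthropic’s paper; the LayerNorm here is a bare normalization without learned scale/bias, which is a simplification. The intuition it illustrates: softmax weighting suppresses all but the largest activations (penalizing features smeared across many neurons), while a subsequent LayerNorm rescales the vector so that suppressed directions can still carry signal.

```python
import numpy as np

def solu(x):
    # SoLU: x * softmax(x). The softmax over the hidden dimension makes
    # neurons compete, so activations spread thinly across many neurons
    # (as in superposition) get strongly attenuated.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return x * (e / e.sum(axis=-1, keepdims=True))

def layernorm(x, eps=1e-5):
    # Bare LayerNorm (no learned gain/bias in this sketch). Rescaling to
    # unit variance re-amplifies the small post-SoLU activations.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.array([3.0, 1.0, -2.0, 0.5])
h = solu(x)            # the largest activation dominates; the rest shrink
y = layernorm(h)       # normalization restores scale across the vector
print(h)
print(y)
```

Running this shows the effect: after `solu`, the first component dwarfs the others, and `layernorm` then brings the vector back to a comparable overall scale. This is only the static activation math, of course; the paper’s actual claim is about what the trained network does with these pieces.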
ETA: Ofc there could be some other mediating factor, too.