Ah, I see. Thank you for pointing this out. Do superposition features actually seem to work like this in practice in current networks? I was not aware of this.
In any case, for a network like the one you describe I would change my claim from
it’d mean that to the AI, dog heads and car fronts are “the same thing”.
to the AI having a concept for something humans don’t have a neat short description for. So for example, if your algorithm maps X>0, Y>0 to the first case, I’d call it a feature of “presence of dog heads or car fronts, or presence of car fronts”.
I don’t think this is an inherent problem for the theory. That a single floating point number can contain a lot of information is fine, so long as you have some way of measuring how much information it actually carries.
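For concreteness, here is a minimal numpy sketch of the standard two-features-in-one-dimension toy setup (in the spirit of Anthropic’s toy models of superposition): a single float can carry both a “dog head” feature and a “car front” feature, as long as the two rarely co-occur. The 5% activation rate and the ±1 embedding weights are arbitrary illustrative choices, not anything from the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sparse binary features: "dog head present" and "car front present".
# Each fires 5% of the time, independently (arbitrary illustrative choice).
n = 10_000
feats = (rng.random((n, 2)) < 0.05).astype(float)

# Embed both features into a single scalar, antipodally: weights +1 and -1.
W = np.array([1.0, -1.0])
hidden = feats @ W          # one float per example

# Read both features back out of that single float with a ReLU.
recovered = np.stack([np.maximum(hidden, 0),    # estimate of "dog head"
                      np.maximum(-hidden, 0)],  # estimate of "car front"
                     axis=1)

errors = np.abs(recovered - feats).max(axis=1) > 0
print(f"examples with interference errors: {errors.mean():.3%}")
# Only the rare cases where both features fire at once are decoded wrongly,
# so one float carries (most of) two features' worth of information.
```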
Do superposition features actually seem to work like this in practice in current networks? I was not aware of this.
I’m not aware of any work that identifies superposition in exactly this way in NNs of practical use. As Spencer notes, you can verify that it does appear in certain toy settings, though. Anthropic note in their SoLU paper that they view their results as evidence for the superposition hypothesis (SPH) in LLMs. Imo the key piece of evidence there is that using a SoLU destroys performance, but adding another LayerNorm afterwards fixes that. The SoLU selects strongly against superposition and the LayerNorm makes it possible again, which is some evidence that the way the LLM got to its performance was via superposition.
ETA: Ofc there could be some other mediating factor, too.
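To make the SoLU/LayerNorm point concrete, here is a rough numpy sketch of the two operations being discussed. SoLU(x) = x * softmax(x) is the definition from the Anthropic paper; the LayerNorm here is a bare normalisation without learned scale/shift, and the toy activation vector is an arbitrary illustrative choice. The point: SoLU squashes a diffuse (potentially superposed) set of activations towards zero, and the subsequent LayerNorm rescales that pattern back to order 1, so such signals can still get through.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def solu(x):
    # SoLU(x) = x * softmax(x): boosts the largest pre-activation and
    # suppresses the rest, which penalises spreading a signal across
    # many neurons at once (i.e. superposition).
    return x * softmax(x)

def layernorm(x, eps=1e-5):
    # Bare LayerNorm (no learned gain/bias), for illustration only.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# A diffuse set of pre-activations, as you might get from a superposed signal.
pre = np.array([0.6, 0.5, 0.4, 0.3])

print(solu(pre))             # everything is squashed towards zero
print(layernorm(solu(pre)))  # ...but LayerNorm rescales the pattern back to
                             # order 1, so the signal can still propagate
```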