As a more productive question: say we had an LLM that, among other things, does the following. If a known bigram $t_1 t_2$ is encoded in the residual stream in the form $t_1 A + t_2 B$, potentially with interference from other features, an MLP layer outputs a consistent vector $v_{t_1 t_2}$ into the residual stream. This is how GPT-2 encodes known bigrams, hence the relevance.
And say that the number of known bigrams grows quadratically with the number of hidden neurons, so that in particular there are more bigrams than residual stream dimensions. As far as I know, an appropriately randomly initialized network should be able to accomplish this task (or at least one where $W_{\text{in}}$ is random).
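A toy numerical check of the weaker part of this claim, as a sketch under loudly stated assumptions: random token embeddings, random maps $A$ and $B$ (row-vector convention, so a bigram appears as `emb[t1] @ A + emb[t2] @ B`), 64 known bigrams in a 32-dimensional residual stream, and a one-hidden-layer ReLU MLP whose random $W_{\text{in}}$ is frozen while only $W_{\text{out}}$ is fit by least squares. All sizes and names are hypothetical. The hidden width here is larger than the number of bigrams, so this does not test the quadratic-capacity regime; it only illustrates that a random $W_{\text{in}}$ plus a nonlinearity can send more bigrams than residual dimensions to their own output vectors, which a purely linear map cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical); n_bigrams is deliberately larger than d_resid.
d_resid, n_tokens, n_bigrams, d_hidden = 32, 20, 64, 256

# Token embeddings and maps A, B, so bigram (t1, t2) appears as t1 A + t2 B.
emb = rng.standard_normal((n_tokens, d_resid))
A = rng.standard_normal((d_resid, d_resid)) / np.sqrt(d_resid)
B = rng.standard_normal((d_resid, d_resid)) / np.sqrt(d_resid)

# Pick distinct known bigrams and one random target vector v_{t1 t2} for each.
pairs = [(i, j) for i in range(n_tokens) for j in range(n_tokens)]
chosen = rng.choice(len(pairs), size=n_bigrams, replace=False)
bigrams = [pairs[k] for k in chosen]
X = np.stack([emb[t1] @ A + emb[t2] @ B for t1, t2 in bigrams])  # (n_bigrams, d_resid)
V = rng.standard_normal((n_bigrams, d_resid))                       # rows are v_{t1 t2}


def rel_err(pred, target):
    """Relative Frobenius-norm error of a prediction."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)


# Baseline: a purely linear map X -> V. X has rank <= d_resid < n_bigrams,
# so it cannot reproduce arbitrary targets for every bigram.
W_lin, *_ = np.linalg.lstsq(X, V, rcond=None)
linear_err = rel_err(X @ W_lin, V)

# MLP with a random, frozen W_in and bias; only W_out is fit.
W_in = rng.standard_normal((d_resid, d_hidden)) / np.sqrt(d_resid)
b_in = rng.standard_normal(d_hidden)
H = np.maximum(X @ W_in + b_in, 0.0)  # ReLU hidden activations, (n_bigrams, d_hidden)
W_out, *_ = np.linalg.lstsq(H, V, rcond=None)
mlp_err = rel_err(H @ W_out, V)

print(f"linear map relative error:    {linear_err:.3f}")
print(f"random-W_in MLP relative error: {mlp_err:.2e}")
```

With these settings one should expect the linear baseline to leave a large error (the targets mostly lie outside the at-most-32-dimensional column space of `X`), while the random-$W_{\text{in}}$ MLP fits every bigram's target essentially exactly, because the ReLU makes the 64 hidden activation vectors linearly independent.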
Is the goal for SPD to learn components for $W_{\text{in}}$ such that any given component fires non-negligibly on only a single bigram? Or is it OK if components fire on multiple different bigrams? I am trying to reason through how SPD would behave in this case.
I’d have to think about the exact setup here to make sure there are no weird caveats, but my first thought is that for $W_{\text{in}}$, this ought to be one component per bigram, firing exclusively for that bigram.
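One way to make "fires non-negligibly on a single bigram" concrete, as a minimal sketch: assume we have, for each component of $W_{\text{in}}$, a nonnegative importance score on each known bigram (for instance, something like the per-input causal importance values SPD assigns, though here it is just a hand-written toy matrix). The helper names `firing_counts` and `one_component_per_bigram` and the 0.1 threshold are hypothetical and are not part of any SPD codebase.

```python
import numpy as np


def firing_counts(importance: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Number of bigrams (columns) on which each component (row) fires,
    i.e. has importance strictly above `threshold`."""
    return (importance > threshold).sum(axis=1)


def one_component_per_bigram(importance: np.ndarray, threshold: float = 0.1) -> bool:
    """True if every bigram has exactly one component firing on it, and every
    component that fires at all fires on exactly one bigram."""
    fires = importance > threshold
    per_component = fires.sum(axis=1)
    per_bigram = fires.sum(axis=0)
    active = per_component[per_component > 0]
    return bool(np.all(active == 1) and np.all(per_bigram == 1))


# Rows: components of W_in; columns: known bigrams (toy values).
exclusive = np.array([
    [0.90, 0.00, 0.02],
    [0.00, 0.80, 0.00],
    [0.00, 0.05, 0.95],
])
shared = np.array([
    [0.90, 0.70, 0.00],  # this component fires on two different bigrams
    [0.00, 0.00, 0.95],
])

print(firing_counts(exclusive), one_component_per_bigram(exclusive))  # [1 1 1] True
print(firing_counts(shared), one_component_per_bigram(shared))        # [2 1] False
```

The first matrix matches the "one component per bigram, firing exclusively for that bigram" picture; the second is the alternative raised in the question, where a single component is shared across bigrams.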