To your point about the loss, I believe it’s absolutely correct that this is an entirely different setting than the linear models from TMS. I wouldn’t characterize this as cheating, because it feels entirely possible that models in practice have an effective mechanism for handling lots of interference, but admittedly, the fact that you only select the target feature is the difference that makes this experiment work at all.
On the model itself, for p=0.0 and p=1.0, why can’t you place vectors equidistantly around the circle, allowing for arbitrarily many features?
If I understand this question correctly, for p=0.0 it should be possible to have arbitrarily many features. In this setting there is no possibility of interference, so if you tune hyperparameters correctly, you should be able to get as many features as you want. Empirically, I didn’t find a clear limit, but at the very least I can say that you should be able to get “a lot.” Because all inputs are orthogonal in this case, the results should be very similar to Superposition, Memorization, and Double Descent.
p=1.0 would be an interesting experiment that I didn’t run, but if I had to guess, the results wouldn’t be very clean because there would be quite a bit of interference on each training example.
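As a rough illustration of the p=0.0 point, here is a minimal sketch. The equidistant placement of the feature directions and the argmax readout are assumptions made purely for illustration, not details of the original experiments, but they show why one-hot inputs impose no real limit on the number of features a 2-d space can handle:

```python
import numpy as np

# Minimal sketch (assumed setup: equidistant unit feature directions in 2-d,
# argmax readout). With p = 0.0 every input is one-hot, so there is nothing
# to interfere and the target feature is recovered exactly, no matter how
# many features share the 2-d space.
n_features = 10_000
angles = 2 * np.pi * np.arange(n_features) / n_features
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, n_features)

rng = np.random.default_rng(0)
for i in rng.integers(n_features, size=1000):
    x = np.zeros(n_features)
    x[i] = 1.0                   # one-hot input: only the target feature is active
    scores = W.T @ (W @ x)       # project into 2-d, then read back out
    assert scores.argmax() == i  # the target always has the highest score
print("all sampled one-hot inputs recovered exactly")
```

Once many noise features are active on each input, as in the p=1.0 case, the argmax is no longer guaranteed to land on the target, which is essentially the interference concern above.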
I guess I mean cheating purely as “I don’t think this applies to the Toy Model setting”, as opposed to saying it’s not a potentially valuable loss to study.
For p=1.0, I forgot that each of the noise features is random between 0 and 0.1, as opposed to fixed magnitude. The reason I brought it up is that if they all had fixed magnitude 0.05, they would cancel out and leave a vector facing in the opposite direction to the target feature with magnitude 0.05. Now that I’ve reread the setting, I don’t think that’s relevant, though.
Now I’m curious what the variance in the noise looks like as a function of the number of features if you place them equidistantly.
This is a very interesting thought! I think your intuition is probably correct even though it is somewhat counterintuitive. Perhaps I’ll run this experiment at some point.
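For anyone who does want to run it, a quick Monte-Carlo sketch of the variance measurement might look like the following. The equidistant placement, the uniform [0, 0.1] magnitudes on every non-target feature (i.e. the p=1.0 case as described above), and measuring the noise as its projection onto the target direction are assumptions made here for illustration, not details taken from the original setup:

```python
import numpy as np

# Monte-Carlo sketch (assumed setup): n equidistant unit feature directions
# in 2-d, every non-target feature active with a magnitude drawn uniformly
# from [0, 0.1], and "noise" measured as the projection of the summed
# interference onto the target feature's direction.
rng = np.random.default_rng(0)

def interference_variance(n_features, n_samples=2000):
    angles = 2 * np.pi * np.arange(n_features) / n_features
    W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, n_features)
    target = 0
    overlaps = np.empty(n_samples)
    for s in range(n_samples):
        mags = rng.uniform(0.0, 0.1, size=n_features)
        mags[target] = 0.0                      # noise features only
        noise_vec = W @ mags                    # summed noise in the 2-d space
        overlaps[s] = W[:, target] @ noise_vec  # read out along the target direction
    return overlaps.var()

for n in (4, 16, 64, 256, 1024):
    print(n, interference_variance(n))
```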