Thanks for this update. This is really cool. I have a couple of questions, in case you have the time to answer them.
When you sweep layers, do you observe a smooth change in how “efficient” the general solution is? Is there a band of layers where general misalignment is especially easy to pick up?
Have you considered computing geodesic paths in weight space between narrow and general minima (à la Mode Connectivity)? Is there a low-loss tunnel, or are they separated by high-loss barriers? I think it would be nice if we could reason geometrically about whether there is one basin here or several distinct ones.
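To make the question concrete, here is a minimal sketch of the simplest version of this check: straight-line interpolation between the two adapter solutions (rather than a learned curve as in the mode-connectivity papers). `eval_loss_fn` and the two LoRA state dicts are placeholders for however you load and evaluate your checkpoints.

```python
import copy
import torch

def interpolation_loss_profile(model, sd_narrow, sd_general, eval_loss_fn, n_points=11):
    """Loss along the straight line between two adapter solutions.

    sd_narrow / sd_general: state dicts holding only the LoRA parameters
    (same keys, same shapes). eval_loss_fn(model) -> float is assumed to
    run your eval set and return a scalar loss.
    """
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        # Linearly interpolate every adapter tensor between the two endpoints.
        interp = {k: (1 - alpha) * sd_narrow[k] + alpha * sd_general[k]
                  for k in sd_narrow}
        probe = copy.deepcopy(model)                 # leave the originals untouched
        probe.load_state_dict(interp, strict=False)  # overwrite only the LoRA keys
        with torch.no_grad():
            losses.append(eval_loss_fn(probe))
    # A bump well above the endpoint losses indicates a barrier between basins;
    # a flat profile suggests the two minima are linearly connected.
    return losses
```

A learned Bézier path would be the stricter test, but even the straight line tells you whether there is an obvious barrier.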
Finally, in your orthogonal-noise experiment you perturb all adapter parameters at once. Have you tried layer-wise noise? I wonder whether certain layers (perhaps the same ones where the general solution is most “efficient”) dominate the robustness gap.
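Roughly what I have in mind, as a sketch (the per-layer dict layout and how you flatten each layer's adapter parameters are assumptions about your setup):

```python
import torch

def add_orthogonal_noise_to_layer(lora_params, layer_idx, rel_scale=0.5, seed=0):
    """Perturb one layer's LoRA parameters with noise orthogonal to them.

    lora_params: dict mapping layer index -> flat 1-D tensor of that layer's
    adapter parameters. Returns a new dict; all other layers are untouched.
    """
    g = torch.Generator().manual_seed(seed)
    out = {k: v.clone() for k, v in lora_params.items()}
    w = out[layer_idx]
    noise = torch.randn(w.shape, generator=g)
    # Remove the component of the noise along the adapter direction ...
    noise -= (noise @ w) / (w @ w) * w
    # ... and rescale it to a chosen fraction of the adapter's own norm.
    noise *= rel_scale * w.norm() / noise.norm()
    out[layer_idx] = w + noise
    return out
```

Sweeping `layer_idx` and re-measuring misalignment would show whether a few layers carry most of the robustness gap.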
Thanks!
We find general misalignment is most effective in the central layers (20-28 of 48): steering with a mean-diff vector achieves the highest misalignment there, and single-layer LoRA adapters are likewise most effective when trained in these layers. Interestingly, training a LoRA adapter in layer 29, 30, or 31 can give a narrow rather than a general solution, but with poor performance (i.e. low narrow misalignment). Above this, single-layer rank-1 LoRAs no longer work.
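For reference, the mean-diff steering is the usual difference-of-means construction; a rough sketch below (the hook plumbing is illustrative rather than our exact code, and the layer path assumes a HF-style decoder stack):

```python
import torch

def mean_diff_vector(acts_misaligned, acts_aligned):
    """Difference-of-means steering vector at one layer.

    acts_*: [n_examples, d_model] residual-stream activations collected at
    the chosen layer on misaligned vs. aligned prompts.
    """
    return acts_misaligned.mean(dim=0) - acts_aligned.mean(dim=0)

def make_steering_hook(vector, scale=1.0):
    """Forward hook that adds the scaled steering vector to a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vector.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Illustrative usage (the layer path is an assumption about the model class):
# handle = model.model.layers[24].register_forward_hook(make_steering_hook(v, scale))
# ... generate ...; handle.remove()
```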
We may have some nice plots incoming for loss tunnels :)
The results in this post report only single-layer adapters, all trained at layer 24. We did also run the experiment on all-layer LoRAs, with similar results, but we didn't try layer-wise noise. In the past we've tested ablating the LoRA adapters from specific layers of an all-layer fine-tune: ablating the adapters in the first and last 12 layers reduces misalignment by only ~25%, so I would expect noising those layers to also have a small effect.
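The ablation itself just zeroes the adapters' contribution in the chosen layers; a sketch of that operation (not our exact code; PEFT-style parameter names are assumed):

```python
import torch

def ablate_lora_layers(model, layers_to_ablate):
    """Zero the LoRA update in the given layers of an all-layer fine-tune.

    Setting lora_B to zero makes that layer's update (B @ A) vanish while
    leaving lora_A and the base weights untouched. Parameter names follow
    the usual PEFT convention ("...layers.<i>...lora_B..."); adjust the
    matching if your naming differs.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "lora_B" not in name:
                continue
            if any(f".layers.{i}." in name for i in layers_to_ablate):
                param.zero_()

# e.g. ablating the first and last 12 of 48 decoder layers:
# ablate_lora_layers(model, list(range(12)) + list(range(36, 48)))
```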