You’re correct, and sorry for the confusion. Tracing through:
My understanding of steering is that you can add a steering vector to an activation vector at some layer, which causes the model outputs to be ‘steered’ in that direction. I.e.:
Record layer n’s activations when outputting “I am very happy”, get vector h
Record layer n’s activations when outputting “I am totally neutral”, get vector q
Subtract q from h to get steering vector s=h−q, the difference between ‘happy’ and ‘neutral’ outputs.
Add αs to the activations at layer n to steer the model into acting more happy, where α is some scalar.
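The recipe above can be sketched numerically. This is a toy illustration with made-up activation values, not a real model hook API; `steer`, `h`, `q`, and `v` are all hypothetical names for this example.

```python
import numpy as np

# h: layer-n activations for a "happy" output; q: for a "neutral" output.
# (Toy values; in practice these come from recorded model activations.)
h = np.array([0.9, 0.1, 0.5, 0.3])
q = np.array([0.2, 0.1, 0.4, 0.3])

s = h - q      # steering vector: 'happy' minus 'neutral'
alpha = 2.0    # steering strength (scalar)

def steer(activations, s, alpha):
    """Add the scaled steering vector to layer-n activations."""
    return activations + alpha * s

v = np.array([0.5, 0.2, 0.3, 0.1])  # some activation vector at layer n
steered = steer(v, s, alpha)
print(steered)  # → [1.9 0.2 0.5 0.1]
```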
The tensor network architecture is scale invariant, which (by my understanding) means that scaling the activation vector at any layer maintains the relative magnitude of the activations at any later layer.
(Dumb) I thought this meant that adding αs and adding 2αs would preserve the relative magnitude of the activations later in the network; that is, that scaling the steering vector would itself be scale invariant. But that’s not the case: increasing the magnitude of the steering vector changes the direction of the sum (activation vector + steering vector), so the network’s scale invariance doesn’t apply.
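A two-dimensional example makes this concrete (illustrative values only): doubling the steering coefficient rotates the summed vector rather than rescaling it, so the later-layer activations are not just a scaled copy.

```python
import numpy as np

v = np.array([1.0, 2.0])   # activation vector
s = np.array([1.0, 0.0])   # steering vector, not parallel to v

a1 = v + 1.0 * s           # [2, 2]
a2 = v + 2.0 * s           # [3, 2]

def unit(x):
    """Normalize to unit length so we can compare directions."""
    return x / np.linalg.norm(x)

# The two sums point in different directions, so a2 is not a rescaled a1:
# scale invariance of the network doesn't carry over to scaling s.
print(np.allclose(unit(a1), unit(a2)))  # → False
```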
That’s pretty much all I was trying to correct in my response. When I was talking about entire layer / not entire layer, I was just trying to say that you can’t pretend adding a steering vector is actually just scaling the activation vector, even if the two are parallel in some dimensions. It’s a trivial point I was thinking through aloud. Like:
If you have activation vector v=[1,2,3,4,5]
You are scale invariant if you multiply by a scalar a: av=a[1,2,3,4,5]
For a=2, this is the same as pointwise multiplication by the vector u=[2,2,2,2,2]⊤: 2v=u⊙v
But you can’t just say, “Well, I’m only going to scale part of v, and since it’s scaling, that means it maintains scale invariance”, because it’s not scaling, and that’s a dumb thing to say: with u′=[2,2,2,0,0]⊤, u′⊙v≠av for any scalar a.
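The uniform vs. partial scaling contrast can be checked directly. A sketch using the vectors from the text:

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5], dtype=float)
u = np.array([2, 2, 2, 2, 2], dtype=float)         # uniform: u ⊙ v == 2v
u_partial = np.array([2, 2, 2, 0, 0], dtype=float) # scales only part of v

# Uniform pointwise multiplication really is scalar multiplication.
print(np.allclose(u * v, 2 * v))          # → True

# Partial scaling is not: no single scalar a gives u_partial ⊙ v == a v,
# since the first coordinates would need a=2 and the last would need a=0.
print(np.allclose(u_partial * v, 2 * v))  # → False
```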
So you can basically ignore that; I was just slowly working through the maths to reach trivial conclusions.
Your claim here is different and good, and points to another useful property of bilinear layers. As far as I can tell, you’re saying you can decompose the effect of the steering vector into separable terms purely from the weights, whereas with ReLU you can’t, because you don’t know which gates will flip. Neat!