Nice work, wish I’d read it earlier. I’ve been doing something similar: steering and learning adapters on activations in the SVD basis of the weight matrices. If I have time, I should compare the two approaches on the same eval.
I predict this would help with eval awareness, so that would be a nice eval.
I did a quick replication of this on a 0.6B model. It was quick and reliable, and steering stayed coherent over a wide range. Overall, training two LoRAs was super easy.
fork: https://github.com/wassname/weight-steering
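
A minimal sketch of the idea, assuming weight steering here means adding a scaled difference between the weight updates of two LoRA adapters (one trained toward a behaviour, one away from it). This is a toy illustration in plain Python, not the code in the fork; the names `lora_delta`, `steer_weights`, and `alpha` are hypothetical.

```python
# Toy sketch of weight steering with two LoRA adapters.
# Assumption: steered weights = W + alpha * (delta_pos - delta_neg).
# All function and variable names are illustrative, not taken from the fork.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, scale=1.0):
    """Dense weight update implied by a LoRA pair: scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]

def steer_weights(W, delta_pos, delta_neg, alpha):
    """Return W + alpha * (delta_pos - delta_neg), element by element."""
    return [
        [w + alpha * (p - n) for w, p, n in zip(w_row, p_row, n_row)]
        for w_row, p_row, n_row in zip(W, delta_pos, delta_neg)
    ]

# Toy 2x2 layer with rank-1 adapters.
W = [[1.0, 0.0], [0.0, 1.0]]
delta_pos = lora_delta(B=[[1.0], [0.0]], A=[[0.5, 0.0]])
delta_neg = lora_delta(B=[[1.0], [0.0]], A=[[-0.5, 0.0]])

# alpha sets the steering strength; alpha = 0 leaves the base weights unchanged.
steered = steer_weights(W, delta_pos, delta_neg, alpha=2.0)
```

Sweeping `alpha` over positive and negative values is one way to probe how far steering can be pushed before outputs stop being coherent.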