I did a quick replication-ish of this on a 0.6B model. It’s fairly anecdotal, but it seemed reliable. The steering was monotonic and coherent over a wide range of strengths (which is what we want from steering). Overall, training two LoRAs was easy to engineer and I didn’t hit any pitfalls.
My takeaway: this seems more robust and easier to develop than other steering methods (which are not that reliable yet). At first glance it seems to be a better intervention! I’m pretty keen on this idea and wish I had thought of it first.
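For anyone wondering what "training two LoRAs" buys you, here is a toy sketch of the arithmetic as I understand it. It assumes the steering direction is the difference between the weight updates of a "positive" and a "negative" adapter, added to the base weights with a coefficient `alpha`. The function names and the tiny 2x2 matrices are made up for illustration; this is not the code from the fork.

```python
# Toy sketch of weight steering with two LoRA adapters.
# ASSUMPTION: steered weights = W + alpha * (delta_pos - delta_neg).
# All names and numbers are hypothetical, not taken from the fork.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, scale=1.0):
    """Weight update contributed by one LoRA adapter: scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]

def steer(W, delta_pos, delta_neg, alpha):
    """Return W + alpha * (delta_pos - delta_neg), elementwise."""
    return [
        [w + alpha * (p - n) for w, p, n in zip(w_row, p_row, n_row)]
        for w_row, p_row, n_row in zip(W, delta_pos, delta_neg)
    ]

# Base weight (2x2) and two rank-1 adapters, one per behaviour.
W = [[1.0, 0.0], [0.0, 1.0]]
delta_pos = lora_delta(B=[[1.0], [0.0]], A=[[0.5, 0.5]])
delta_neg = lora_delta(B=[[0.0], [1.0]], A=[[0.5, -0.5]])

# Sweep the steering coefficient; alpha = 0 recovers the base weights.
for alpha in (-1.0, 0.0, 1.0, 2.0):
    print(alpha, steer(W, delta_pos, delta_neg, alpha))
```

In this toy the weights move linearly with `alpha` by construction. Whether the model's behaviour stays monotonic and coherent as `alpha` grows is the empirical part, and that is what I was checking.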
fork: https://github.com/wassname/weight-steering