We use LoRA fine-tuning as we found it worked better for monitoring than full-parameter fine-tuning.
This is interesting! It suggests that LoRA was a better inductive bias for the difference than full-parameter fine-tuning. See my post on other LoRA variants that use SVD, rotations, or magnitude/direction decoupling. Some of them seem data-efficient and generalise well (like this work), so I would predict that using PiSSA, SSVD, DeLoRA, or OFT might generalise better than LoRA, with fewer side effects.
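To make the "SVD variant" idea concrete, here is a minimal NumPy sketch contrasting the usual LoRA initialisation with a PiSSA-style one. It is only an illustration under my own assumptions: the function names `lora_init` and `pissa_init`, the toy matrix shape, and the rank are all hypothetical, and this is not the quoted paper's code or any library's API. The point is just that both start from the same effective weight, but PiSSA puts the top singular directions in the trainable factors and freezes the residual, whereas LoRA starts the trainable part at zero.

```python
import numpy as np


def lora_init(W, r, rng):
    """Standard LoRA-style init: A is small random, B is zero.

    The adapter B @ A starts as a no-op, and the full W stays frozen.
    """
    d_out, d_in = W.shape
    A = rng.normal(0.0, 0.01, size=(r, d_in))
    B = np.zeros((d_out, r))
    return W.copy(), B, A  # (frozen base, trainable B, trainable A)


def pissa_init(W, r):
    """PiSSA-style init: the adapter holds the top-r singular directions of W.

    The frozen base is the residual W - B @ A, so training updates the
    principal components rather than a zero-initialised delta.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_S          # shape (d_out, r)
    A = sqrt_S[:, None] * Vt[:r]   # shape (r, d_in)
    W_res = W - B @ A              # frozen residual
    return W_res, B, A


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))  # toy stand-in for a pretrained weight matrix
r = 2                        # hypothetical adapter rank

W_lora, B_lora, A_lora = lora_init(W, r, rng)
W_pissa, B_pissa, A_pissa = pissa_init(W, r)

# Both parameterisations reproduce the original weight before any training.
print(np.allclose(W_lora + B_lora @ A_lora, W))
print(np.allclose(W_pissa + B_pissa @ A_pissa, W))
```

In both cases the forward pass would use `W_base + B @ A` with only `B` and `A` trainable; the variants differ in which subspace the trainable factors start in, which is one plausible place for the difference in inductive bias to come from.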