Do you have an intuition on whether or not using LoRA for the SGMCMC sampling of the BIF breaks everything? I’m vibe-investigating some stuff on top of your code and I want my BIFs to converge better.
I’ve seen someone say something like “LoRA width is a hyperparameter which varies from 1 (probe/steering vector) to full-rank (normal finetuning) and doesn’t affect high-level training dynamics”, in particular arguing that it shouldn’t affect emergent misalignment, which is basically just a special case of BIFs.
Claude just glazes me, and I don’t have enough intuition to figure out whether this is completely stupid or not.
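For concreteness, here’s a rough sketch of what I mean by “LoRA for the SGMCMC sampling”: freeze the base weights at w* and run a localized SGLD update over only the LoRA factors. All the names here are mine, not your repo’s, and the localized-SGLD form is my assumption about what the BIF sampler looks like. The `rank` argument is the “LoRA width” from that quote: rank 1 gives a single outer-product direction (steering-vector-like), full rank recovers ordinary finetuning.

```python
# Hedged sketch, not your actual code: SGLD restricted to LoRA factors,
# with the pretrained weights frozen at the sampling center w*.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # base weights stay at w*
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # delta-W = 0 at init

    def forward(self, x):
        return self.base(x) + (x @ self.A.T) @ self.B.T

def sgld_step(params, anchors, loss, lr=1e-6, beta=1.0, gamma=1.0):
    """One localized SGLD step over ONLY the LoRA factors:
    w <- w - (lr/2) * (beta * grad + gamma * (w - w0)) + N(0, lr) noise."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, p0, g in zip(params, anchors, grads):
            drift = beta * g + gamma * (p - p0)
            p.add_(-0.5 * lr * drift + lr ** 0.5 * torch.randn_like(p))
```

Here `params` would be all the A/B factors across wrapped layers and `anchors` their detached copies at initialization, so both the gradient drift and the injected noise live entirely in the low-rank subspace.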
No, I’m not sure about this. I think it’s an interesting question, and really investigating it would make a valuable paper.
I don’t understand what this means. In what sense is the BIF a special case of what?