I’d probably call it an act-add style intervention? Ablation connotes the removal of something. I originally thought patch, but patch should be about setting it to be equal to the activation on another input, and being able to put in 10x the original value doesn’t count as patching.
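To make the distinction concrete, here's a rough sketch on toy vectors. The function names (`ablate`, `patch`, `act_add`) and the numbers are made up for illustration and don't come from any library or paper. It also assumes zero-ablation as the "removal" variant, and reads "10x" as scaling an added direction.

```python
# Toy activations as plain lists; every name here is hypothetical.

def ablate(act):
    # Ablation: remove the activation (zero-ablation variant assumed).
    return [0.0 for _ in act]

def patch(act, other_act):
    # Patching: overwrite with the activation taken from another input.
    return list(other_act)

def act_add(act, direction, scale):
    # Act-add style: add a scaled direction on top of the activation.
    # The scale is free, so 10x is allowed; that is why it isn't patching.
    return [a + scale * d for a, d in zip(act, direction)]

clean = [1.0, 2.0, 3.0]       # activation on the input we care about
other = [0.5, -1.0, 4.0]      # activation on some other input
direction = [1.0, 0.0, 0.0]   # direction we want to push along

ablated = ablate(clean)
patched = patch(clean, other)
steered = act_add(clean, direction, 10.0)
```

Only `patch` is constrained to values the model actually produced on some input; `act_add` can land anywhere along the direction.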
Sure, though it seems too general or common to use a long word for it?
Maybe “linear intervention”?
We called it vector arithmetic in my Othello paper?
Oh, I like that one! Going to use it from now on