RobertKirk comments on Steering Llama-2 with contrastive activation additions

RobertKirk 10 Jan 2024 16:22 UTC
LW: 1 AF: 1
A quick technical question: In the comparison to fine-tuning results in Section 6 where you stack CAA with fine-tuning, do you find a new steering vector after each fine-tune, or are you using the same steering vector for all fine-tuned models? My guess is you’re doing the former as it’s likely to be more performant, but I’d be interested to see what happens if you try to do the latter.
- Nina Panickssery 10 Jan 2024 17:59 UTC
  LW: 5 AF: 4
  We used the same steering vectors, derived from the non-fine-tuned model.
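
To make the two options in this exchange concrete, here is a minimal toy sketch, not the paper's code. It assumes a CAA steering vector is the mean difference between residual-stream activations on positive and negative contrastive prompts at one layer. The activations are synthetic NumPy arrays, and the "fine-tuned" activations are just a small random perturbation of the base ones; every name below (`steering_vector`, `steer`, `base_pos`, `ft_pos`, and so on) is hypothetical.

```python
import numpy as np


def steering_vector(pos_acts, neg_acts):
    """Mean-difference vector between two sets of activations, each (n, d)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def steer(acts, vector, multiplier=1.0):
    """Add a scaled steering vector to every activation row."""
    return acts + multiplier * vector


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
d = 8  # toy hidden size (assumption)

# Synthetic "base model" activations: the behaviour lives along axis 0.
behaviour_dir = np.zeros(d)
behaviour_dir[0] = 1.0
base_pos = rng.normal(size=(32, d)) + behaviour_dir
base_neg = rng.normal(size=(32, d)) - behaviour_dir

# Toy stand-in for a fine-tuned model: base activations plus small noise.
ft_pos = base_pos + 0.1 * rng.normal(size=(32, d))
ft_neg = base_neg + 0.1 * rng.normal(size=(32, d))

# Option A (what the reply describes): one vector from the base model,
# reused on the fine-tuned model's activations.
v_base = steering_vector(base_pos, base_neg)
steered_reused = steer(ft_pos, v_base, multiplier=1.0)

# Option B (what the question guessed): recompute after fine-tuning.
v_ft = steering_vector(ft_pos, ft_neg)
steered_recomputed = steer(ft_pos, v_ft, multiplier=1.0)

# How closely the two vectors agree in this toy setting.
similarity = cosine(v_base, v_ft)
```

In this toy setup the two vectors are nearly parallel by construction, so it says nothing about how much real fine-tuning changes the steering direction. It only shows where the two procedures differ: which model's activations the vector is computed from.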