Haha nice work! I’m impressed you got TransformerLens working on Colab, I underestimated how much CPU ram they had. I would have shared a link to my notebook & Colab but figured it might be good to keep under the radar so people could preregister predictions.
Maybe the knowledge that you’re hot on my heels will make me finish the LLAMAs post faster now ;)
Yep! I was very pleasantly surprised that Love/Hate worked for Llama at all. It’s great that you rewrote it without TransformerLens too, since TransformerLens has issues with 8-bit / 4-bit quantisation.
Also sent you a DM on Discord! I’ll be interested to read any rough findings and lessons you have with Llama.
I managed to get it working for llama-7b on Colab after some debugging.
Surprisingly, it actually does work for the Love/Hate scenario, but not for some others like Rome vs. Paris.
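For anyone curious what "it works" means mechanically: activation addition boils down to adding a scaled steering vector to a layer's output at inference time, which you can do with a plain PyTorch forward hook (no TransformerLens needed). Here's a minimal sketch of the hook mechanism on a toy block; the layer choice, the `steering_vector` (e.g. activations for "Love" minus activations for "Hate"), and the coefficient are all illustrative assumptions, and in practice you'd hook a real block like `model.model.layers[i]` on a Llama checkpoint.

```python
# Sketch of activation-addition steering via a PyTorch forward hook.
# The toy Linear stands in for a transformer block; names are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 8

# Toy stand-in for one transformer block.
block = nn.Linear(d_model, d_model)

# In the Love/Hate experiment this would be act("Love") - act("Hate")
# captured at the chosen layer, scaled by a hand-picked coefficient.
steering_vector = torch.randn(d_model)
coeff = 3.0

def add_steering(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output:
    # here we shift every position's activations by the steering vector.
    return output + coeff * steering_vector

handle = block.register_forward_hook(add_steering)

x = torch.randn(2, 5, d_model)  # (batch, seq, d_model)
steered = block(x)
handle.remove()
unsteered = block(x)

# The hook shifts activations by exactly coeff * steering_vector.
shift = (coeff * steering_vector).expand_as(steered)
assert torch.allclose(steered - unsteered, shift)
```

The same pattern applies to a quantised HF Llama, which is part of why dropping TransformerLens helps: hooks work on whatever module the quantised model actually exposes.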
Here’s the link if anyone wants to try it.
https://colab.research.google.com/drive/1ACAA7FO8zc4pFAqPdaPshoy4WWXCvUTQ?usp=sharing
edit: seems like you guys already have a better version here. https://github.com/UlisseMini/activation_additions_hf/blob/main/notebooks/qualitative.ipynb
Nevermind! (I’m still keeping this comment for visibility if anyone wants to try.)