I managed to get it working for llama-7b on colab after some debugging.
Surprisingly, it actually does work for the Love / Hate scenario, but not for some others like Rome vs Paris.
Here's the link if anyone wants to try it.
https://colab.research.google.com/drive/1ACAA7FO8zc4pFAqPdaPshoy4WWXCvUTQ?usp=sharing
edit: seems like you guys already have a better version here. https://github.com/UlisseMini/activation_additions_hf/blob/main/notebooks/qualitative.ipynb
Never mind! (I’m still keeping this comment for visibility in case anyone wants to try it.)
Yep! I was very pleasantly surprised that Love/Hate worked for Llama at all. It’s great that you rewrote it without transformer lens too, as transformer lens has issues with 8-bit / 4-bit quantisation.
Also sent you a DM on Discord! I’ll be interested to read any rough findings and lessons you have with Llama.