This is really cool work! Congratulations!
Besides the LLM-related work, it's also somewhat reminiscent of dynamic prompting in Stable Diffusion, where part of the prompt is swapped out after a certain number of denoising steps to achieve a mixture of prompt1 and prompt2.
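For concreteness, here's roughly what that looks like with diffusers (a minimal sketch, assuming a recent diffusers version with the `callback_on_step_end` API; the prompts and switch step are made up for illustration):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt_1 = "a photo of a castle"
prompt_2 = "a photo of a forest"
switch_step = 15  # denoise toward prompt_1 until here, prompt_2 afterwards

# Pre-encode the second prompt once; encode_prompt returns (cond, uncond).
cond_2, uncond_2 = pipe.encode_prompt(
    prompt_2, device="cuda", num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)
embeds_2 = torch.cat([uncond_2, cond_2])  # CFG layout: [uncond, cond]

def swap_prompt(pipe, step, timestep, callback_kwargs):
    # After switch_step, replace the prompt embeddings so the remaining
    # denoising steps follow prompt_2 instead of prompt_1.
    if step == switch_step:
        callback_kwargs["prompt_embeds"] = embeds_2
    return callback_kwargs

image = pipe(
    prompt_1,
    num_inference_steps=30,
    callback_on_step_end=swap_prompt,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
```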
What’s the TL;DR for the Vicuna 13B experiments?
Activation additions work on Vicuna-13B about as well as they work on GPT-2-XL, or perhaps slightly better. GPT-J-6B is harder to work with for some reason.
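For anyone who hasn't read the post: the core operation is just adding a scaled difference of residual-stream activations into the forward pass at one layer. A rough sketch of the general idea with vanilla transformers hooks (my own paraphrase, not the repo's actual code; the model, injection layer, prompt pair, and coefficient are all illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layer = model.transformer.h[6]  # injection layer is a tunable choice

def resid_at_layer(prompt):
    """Record the residual stream at `layer` for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    cache = {}
    def grab(mod, inp, out):
        cache["resid"] = out[0].detach()  # block output: (hidden_states, ...)
    handle = layer.register_forward_hook(grab)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return cache["resid"]

a, b = resid_at_layer(" Love"), resid_at_layer(" Hate")
n = min(a.shape[1], b.shape[1])
steering = a[:, :n] - b[:, :n]  # "Love - Hate" steering vector
coeff = 5.0

def add_steering(mod, inp, out):
    hidden = out[0]
    # Only modify the full-prompt (prefill) pass at the front positions;
    # skip the one-token decode steps during generation.
    if hidden.shape[1] > 1 and hidden.shape[1] >= n:
        hidden[:, :n] += coeff * steering.to(hidden.dtype)
    return (hidden,) + out[1:]

handle = layer.register_forward_hook(add_steering)
out = model.generate(**tok("I hate you because", return_tensors="pt"),
                     max_new_tokens=40, do_sample=True, top_p=0.9)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```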
Note that there’s still a market open on how activation additions interact with larger models; it would be nice if it had more liquidity:
I added M$1,000 in liquidity.
This idea of determining in advance whether a result is “obvious” seems valuable; I hope it catches on.
I wonder if this is related to how GPT-J runs the attention and MLP sublayers in parallel, as opposed to sequentially?
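Schematically, the difference between the two block layouts (a toy sketch, not the actual HF implementations; the head count and MLP width are arbitrary):

```python
import torch.nn as nn

class SequentialBlock(nn.Module):
    """GPT-2-style block: the MLP runs after, and sees, the attention output."""
    def __init__(self, d):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.mlp(self.ln2(x))

class ParallelBlock(nn.Module):
    """GPT-J-style block: attention and MLP both read the same layer input."""
    def __init__(self, d):
        super().__init__()
        self.ln = nn.LayerNorm(d)  # GPT-J also shares a single LayerNorm
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        h = self.ln(x)
        a, _ = self.attn(h, h, h)
        return x + a + self.mlp(h)  # MLP never sees attention's output in-layer
```

In the parallel layout, an addition injected into the residual stream reaches both sublayers of the next block at once rather than filtering through attention first, which could plausibly change how steering vectors propagate.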