This is really cool work! Congratulations!
Besides the LLM-related work, it's also somewhat reminiscent of dynamic prompting in Stable Diffusion, where part of the prompt is swapped out after a certain number of denoising steps to achieve a mixture of prompt1 and prompt2.
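For concreteness, here's roughly what that looks like with diffusers (a minimal sketch, assuming a recent diffusers version with the `callback_on_step_end` API; the prompts and switch step are made up for illustration):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt_1 = "a photo of a castle"
prompt_2 = "a photo of a forest"
switch_step = 15  # denoise toward prompt_1 until here, prompt_2 afterwards

# Pre-encode the second prompt once; encode_prompt returns (cond, uncond).
cond_2, uncond_2 = pipe.encode_prompt(
    prompt_2, device="cuda", num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)
embeds_2 = torch.cat([uncond_2, cond_2])  # CFG layout: [uncond, cond]

def swap_prompt(pipe, step, timestep, callback_kwargs):
    # After switch_step, replace the prompt embeddings so the remaining
    # denoising steps follow prompt_2 instead of prompt_1.
    if step == switch_step:
        callback_kwargs["prompt_embeds"] = embeds_2
    return callback_kwargs

image = pipe(
    prompt_1,
    num_inference_steps=30,
    callback_on_step_end=swap_prompt,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
```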
What’s the TL;DR for the Vicuna 13B experiments?
Activation additions work on Vicuna-13B about as well as they work on GPT-2-XL, or perhaps slightly better. GPT-J-6B is harder to work with for some reason.
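For anyone who hasn't read the post: the core operation is just adding a scaled difference of residual-stream activations into the forward pass at one layer. A rough sketch of the general idea with vanilla transformers hooks (my own paraphrase, not the repo's actual code; the model, injection layer, prompt pair, and coefficient are all illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layer = model.transformer.h[6]  # injection layer is a tunable choice

def resid_at_layer(prompt):
    """Record the residual stream at `layer` for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    cache = {}
    def grab(mod, inp, out):
        cache["resid"] = out[0].detach()  # block output: (hidden_states, ...)
    handle = layer.register_forward_hook(grab)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return cache["resid"]

a, b = resid_at_layer(" Love"), resid_at_layer(" Hate")
n = min(a.shape[1], b.shape[1])
steering = a[:, :n] - b[:, :n]  # "Love - Hate" steering vector
coeff = 5.0

def add_steering(mod, inp, out):
    hidden = out[0]
    # Only modify the full-prompt (prefill) pass at the front positions;
    # skip the one-token decode steps during generation.
    if hidden.shape[1] > 1 and hidden.shape[1] >= n:
        hidden[:, :n] += coeff * steering.to(hidden.dtype)
    return (hidden,) + out[1:]

handle = layer.register_forward_hook(add_steering)
out = model.generate(**tok("I hate you because", return_tensors="pt"),
                     max_new_tokens=40, do_sample=True, top_p=0.9)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```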
Note that there’s still a market open on how activation additions interact with larger models; it would be nice if it had more liquidity:
I added M$1,000 in liquidity.
This idea of determining in advance whether a result is “obvious” seems valuable; I hope it catches on.
I wonder if this is related to how GPT-J runs the attention and MLP sublayers in parallel, as opposed to sequentially?
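Schematically, the difference between the two block layouts (a toy sketch, not the actual HF implementations; the head count and MLP width are arbitrary):

```python
import torch.nn as nn

class SequentialBlock(nn.Module):
    """GPT-2-style block: the MLP runs after, and sees, the attention output."""
    def __init__(self, d):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.mlp(self.ln2(x))

class ParallelBlock(nn.Module):
    """GPT-J-style block: attention and MLP both read the same layer input."""
    def __init__(self, d):
        super().__init__()
        self.ln = nn.LayerNorm(d)  # GPT-J also shares a single LayerNorm
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        h = self.ln(x)
        a, _ = self.attn(h, h, h)
        return x + a + self.mlp(h)  # MLP never sees attention's output in-layer
```

In the parallel layout, an addition injected into the residual stream reaches both sublayers of the next block at once rather than filtering through attention first, which could plausibly change how steering vectors propagate.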