You’re right; however, the results from the Steering GPT-2-XL post showed that in GPT-2-XL, similar modifications had very little effect on model perplexity. The patched model also doesn’t only shift weight from b to a: it has wonky effects on other digits as well. For example, in the 3-1 patch for input 4, the weight given to 9 increased substantially. More interestingly, it’s not uncommon to find examples where seemingly random digits suddenly become the most likely. The 1-8 patch for input 9 is an example:
Yup. You should be able to see this in the chart.
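For anyone who wants to poke at how a patch like this redistributes probability mass across the digit tokens, here is a rough sketch of one way to set up the check with TransformerLens. The model, layer, single-digit prompts, and the way the patch vector is built below are placeholder assumptions of mine (loosely in the style of the activation additions from the Steering GPT-2-XL post), not the exact patch used in the experiment above:

```python
# Sketch only: compare the digit-token distribution before and after adding a
# residual-stream patch. Layer, prompts, and patch construction are assumptions.
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2-xl")

LAYER = 20                                  # assumed layer; placeholder
HOOK = f"blocks.{LAYER}.hook_resid_pre"

def resid_at(prompt: str) -> torch.Tensor:
    """Residual stream at HOOK for the last token of `prompt`."""
    _, cache = model.run_with_cache(prompt)
    return cache[HOOK][0, -1]

# Hypothetical "1 -> 8" patch direction: difference of residuals on digit prompts.
patch_vec = resid_at("8") - resid_at("1")

def add_patch(value, hook):
    value[:, -1] += patch_vec               # add the patch at the final position
    return value

prompt = "9"                                # assumed stand-in for "input 9"
digit_ids = [model.to_single_token(str(d)) for d in range(10)]

def digit_probs(logits: torch.Tensor) -> torch.Tensor:
    """Probabilities of the ten digit tokens at the final position."""
    return logits[0, -1].softmax(-1)[digit_ids]

clean = digit_probs(model(prompt))
patched = digit_probs(model.run_with_hooks(prompt, fwd_hooks=[(HOOK, add_patch)]))

for d in range(10):
    print(f"{d}: {clean[d]:.3f} -> {patched[d]:.3f}")
```

Printing the full 0-9 distribution rather than just the top digit is what makes the "weight leaking onto other digits" effect visible at all.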