I think the idea of manipulating internal activations is interesting. It might need some refinement: the activations of a transformer (note GPT-3 is decoder-only, not encoder-decoder) are a function of the input, and they change with every generated token. At first the input is just your prompt; then it's your prompt plus the tokens generated so far. So the protocol/task for GPT-3 would be something like: generate 5 tokens such that, at the last generation step, this particular logit is maximized? It also depends on the decoding hyperparameters (e.g. beam search settings), which are controlled by a human.
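To make the moving-target point concrete, here is a minimal toy sketch of that protocol, assuming greedy decoding and a hypothetical stand-in model (not GPT-3): logits are a function of the whole sequence so far, so they shift each time a generated token is appended, and we only check the target logit at the final step.

```python
# Toy sketch of the proposed protocol: generate N tokens autoregressively,
# then check whether a chosen logit is maximized at the final step.
# `toy_logits` is a hypothetical stand-in model, not GPT-3.

VOCAB_SIZE = 8
TARGET_TOKEN = 3  # the logit we want maximized after the last generation step

def toy_logits(tokens):
    """Hypothetical model: logits depend on the *whole* input so far,
    so they change every time a generated token is appended."""
    logits = [0.0] * VOCAB_SIZE
    for pos, tok in enumerate(tokens):
        logits[(tok + pos) % VOCAB_SIZE] += 1.0
    return logits

def generate(prompt, n_steps=5):
    tokens = list(prompt)
    final_logits = None
    for _ in range(n_steps):
        final_logits = toy_logits(tokens)  # input = prompt + generated so far
        # greedy decoding; beam search would add its own hyperparameters here
        tokens.append(max(range(VOCAB_SIZE), key=final_logits.__getitem__))
    return tokens, final_logits

tokens, final_logits = generate(prompt=[1, 2], n_steps=5)
success = max(range(VOCAB_SIZE), key=final_logits.__getitem__) == TARGET_TOKEN
print(tokens, success)
```

With greedy decoding the rollout is deterministic, but switching to beam search (beam width, length penalty, etc.) changes which sequences are explored and therefore whether the target logit ends up maximized, which is the human-controlled dependence mentioned above.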