[Question] What’s actually going on in the “mind” of the model when we fine-tune GPT-3 to InstructGPT?

I posted in the open thread and was told that it would be worth promoting to top level.

cubefox responded with a link to a great explanation of how the fine-tuning is done, which made me realize that my original question was unclear, so I’m going to try to clarify it here.

The fundamental behavior of GPT-3 is token prediction, which can straightforwardly be leveraged into text completion; in contrast, the fundamental behavior of InstructGPT is instruction following. Instruction following is a new capability that uses the knowledge acquired during the token-prediction task both to understand the input and to produce the output; how does that capability develop over the course of fine-tuning?
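
To make the distinction concrete, here is a minimal sketch of the behavioral difference I mean, assuming Hugging Face `transformers` is installed; `gpt2` stands in for the base model and the instruction-tuned model name is a placeholder for whatever pair you want to compare:

```python
# Feed the same instruction-style prompt to a base (prediction-only) model and
# to an instruction-tuned model, and compare what comes back.
# Both model names are placeholders; any base/instruct pair works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "Translate the following sentence to French: The cat sat on the mat."

for name in ["gpt2", "your-org/your-instruction-tuned-model"]:  # hypothetical pair
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    # The base model typically just continues the text (e.g. more exercise-style
    # sentences); the instruction-tuned model typically answers the request.
    response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
    print(name, "->", response)
```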

Some plausible experiments related to the question:

  • Follow a similar methodology to fine-tune a predictive model for instruction following, checkpointing along the way; then, for 100 (or even more) novel instruction prompts, see how the different checkpoints respond, in particular how often they do completion vs. instruction following (first sketch after this list).

  • Given a prompt P that produces completion C when fed into the fine-tuned model, try to find a prompt P' that produces C when fed into the original model (second sketch below).

  • Fine-tune twice with the same data and reward model but presented in a different order; presumably the two models will end up with different weights, but can we find prompts that give widely diverging results? And if we keep two checkpoint histories, at what point does the behavior diverge? (third sketch below)
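
For the first experiment, a minimal sketch assuming a set of checkpoint directories saved during fine-tuning; the checkpoint paths, the probe prompts, and the crude completion-vs-instruction classifier are all placeholders:

```python
# Sketch for experiment 1: probe each fine-tuning checkpoint with novel
# instruction prompts and tally how often it follows the instruction rather
# than merely continuing the text. Paths and the classifier are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = ["ckpt-0", "ckpt-500", "ckpt-1000", "ckpt-2000"]  # hypothetical dirs
PROMPTS = [
    "List three uses for a paperclip.",
    "Explain photosynthesis in one sentence.",
]  # in practice, ~100 novel instruction prompts

def looks_like_instruction_following(prompt: str, response: str) -> bool:
    # Placeholder heuristic; in practice you would label by hand or with a judge model.
    return not response.lstrip().lower().startswith(prompt.split()[0].lower())

for ckpt in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    followed = 0
    for prompt in PROMPTS:
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                    skip_special_tokens=True)
        followed += looks_like_instruction_following(prompt, response)
    print(f"{ckpt}: {followed}/{len(PROMPTS)} responses look like instruction following")
```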

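For the second experiment, one simple way to score a candidate P' is by how much log-probability the base model assigns to the target completion C after it; the candidate prompts, the base model name, and the target completion below are placeholders, and generating good candidates is the hard part that this sketch leaves stubbed out:

```python
# Sketch for experiment 2: given the completion C that the fine-tuned model
# produces for prompt P, score candidate prompts P' by how likely the *base*
# model is to produce C after them (sum of log-probs of C's tokens).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"  # placeholder for the original, non-fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
model.eval()

def logprob_of_completion(prompt: str, completion: str) -> float:
    prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    completion_ids = tokenizer(completion, return_tensors="pt")["input_ids"]
    input_ids = torch.cat([prompt_ids, completion_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-prob of each completion token given everything before it.
    logprobs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
    token_lps = logprobs.gather(1, completion_ids[0].unsqueeze(1)).squeeze(1)
    return token_lps.sum().item()

target_c = "Le chat était assis sur le tapis."  # hypothetical C from the fine-tuned model
candidates = [  # hand-written guesses at P'; a real search would generate these
    "English: The cat sat on the mat.\nFrench:",
    "Translate to French: The cat sat on the mat.\n",
]
best = max(candidates, key=lambda p: logprob_of_completion(p, target_c))
print("best candidate prompt:", repr(best))
```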
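
For the third experiment, a sketch of one way to locate the divergence point: compare next-token distributions of paired checkpoints from the two runs on a fixed set of probe prompts, using KL divergence as the distance; the checkpoint paths and probe prompts are placeholders:

```python
# Sketch for experiment 3: given two checkpoint histories from fine-tuning runs
# that saw the same data in different orders, measure where their behavior
# diverges by comparing next-token distributions on a set of probe prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

RUN_A = ["runA/ckpt-500", "runA/ckpt-1000", "runA/ckpt-2000"]  # hypothetical dirs
RUN_B = ["runB/ckpt-500", "runB/ckpt-1000", "runB/ckpt-2000"]
PROBES = [
    "Write a haiku about the ocean.",
    "Give me two arguments against daylight saving time.",
]

def next_token_logprobs(ckpt: str, prompt: str) -> torch.Tensor:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)

for ckpt_a, ckpt_b in zip(RUN_A, RUN_B):
    kls = []
    for prompt in PROBES:
        log_p = next_token_logprobs(ckpt_a, prompt)
        log_q = next_token_logprobs(ckpt_b, prompt)
        # KL(P || Q) over the next-token distribution at the end of the prompt.
        kls.append(torch.sum(log_p.exp() * (log_p - log_q)).item())
    print(ckpt_a, "vs", ckpt_b, "mean next-token KL:", sum(kls) / len(kls))
```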