cfoster0 comments on Thoughts on the impact of RLHF research

cfoster0 26 Jan 2023 3:48 UTC
7 points
0
To throw in another perspective, I’ve been working with the OpenAI API models most days of the week for the past year or so. For my uses, the step-change in quality came from moving from base davinci to text-davinci-002, whereas the improvements moving from that to text-davinci-003 were decidedly less clear.
- Quintin Pope 26 Jan 2023 4:44 UTC
  9 points
  0
  Parent
  I agree the difference between base and 002 is bigger than the difference between 002 and 003. The base model needs to be carefully coaxed into a scenario where plausible continuations of the prompt align with your intended output, and even then it’s very inclined to repeat stuff and degenerates quickly. By contrast, you can just tell 002 what to do, and it will usually at least try to do what you say.
  - Richard_Ngo 27 Jan 2023 19:24 UTC
    5 points
    0
    Parent
    Seems like you’re implying that davinci is the base model for 002 and 003. That’s not the case; davinci has one base model (GPT-3) and then 002 and 003 share a different base model (GPT-3.5).
    - cfoster0 27 Jan 2023 20:19 UTC
      4 points
      2
      Parent
      Fair. I think the crucial question to Ajeya & Matthew’s discussion of “Why the hype now?” is exactly how much worse the non-RLHF models that had been available since at least last March (davinci, code-davinci-002, text-davinci-002) actually were than the RLHF models made available just recently (text-davinci-003 and ChatGPT’s underlying model). I stand by the opinion that the besides the new chat stuff, most of the improvement happened within the old cohort, rather than between cohorts, so I attribute the recent hype to the convenient and free chat interface.