To throw in another perspective, I’ve been working with the OpenAI API models most days of the week for the past year or so. For my uses, the step-change in quality came from moving from base davinci to text-davinci-002, whereas the improvements moving from that to text-davinci-003 were decidedly less clear.
I agree the difference between base and 002 is bigger than the difference between 002 and 003. The base model needs to be carefully coaxed into a scenario where plausible continuations of the prompt align with your intended output, and even then it’s very inclined to repeat stuff and degenerates quickly. By contrast, you can just tell 002 what to do, and it will usually at least try to do what you say.
Seems like you’re implying that davinci is the base model for 002 and 003. That’s not the case; davinci has one base model (GPT-3) and then 002 and 003 share a different base model (GPT-3.5).
Fair. I think the crucial question to Ajeya & Matthew’s discussion of “Why the hype now?” is exactly how much worse the non-RLHF models that had been available since at least last March (davinci, code-davinci-002, text-davinci-002) actually were than the RLHF models made available just recently (text-davinci-003 and ChatGPT’s underlying model). I stand by the opinion that the besides the new chat stuff, most of the improvement happened within the old cohort, rather than between cohorts, so I attribute the recent hype to the convenient and free chat interface.
To throw in another perspective, I’ve been working with the OpenAI API models most days of the week for the past year or so. For my uses, the step-change in quality came from moving from base
davinci
totext-davinci-002
, whereas the improvements moving from that totext-davinci-003
were decidedly less clear.I agree the difference between base and 002 is bigger than the difference between 002 and 003. The base model needs to be carefully coaxed into a scenario where plausible continuations of the prompt align with your intended output, and even then it’s very inclined to repeat stuff and degenerates quickly. By contrast, you can just tell 002 what to do, and it will usually at least try to do what you say.
Seems like you’re implying that davinci is the base model for 002 and 003. That’s not the case; davinci has one base model (GPT-3) and then 002 and 003 share a different base model (GPT-3.5).
Fair. I think the crucial question to Ajeya & Matthew’s discussion of “Why the hype now?” is exactly how much worse the non-RLHF models that had been available since at least last March (
davinci
,code-davinci-002
,text-davinci-002
) actually were than the RLHF models made available just recently (text-davinci-003
and ChatGPT’s underlying model). I stand by the opinion that the besides the new chat stuff, most of the improvement happened within the old cohort, rather than between cohorts, so I attribute the recent hype to the convenient and free chat interface.