I’m not sure fine-tuning is necessary. Most recent models have a ~100,000-token context window now, so they could fit quite a few short, high-quality examples for in-context learning. (Gemini Pro even has a 2-million-token context window, but of course the base model is unavailable to the public.)
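For concreteness, here’s a rough sketch of what I have in mind (the example stories and separator format are placeholders, not a recommendation):

```python
# Rough sketch: in-context learning via a few-shot prompt, no fine-tuning.
# The example stories below are placeholders; in practice you'd paste in
# full short stories of the quality you want imitated.

EXAMPLE_STORIES = [
    "Story 1: ...",  # short, high-quality example in the target style
    "Story 2: ...",  # another example
]

def build_few_shot_prompt(examples, separator="\n\n---\n\n"):
    """Concatenate examples so a base model continues in the same style."""
    return separator.join(examples) + separator

prompt = build_few_shot_prompt(EXAMPLE_STORIES)
# Feed `prompt` to a base (non-RLHF'd) model and sample a continuation;
# with a ~100,000-token window, dozens of short examples fit.
```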
I would be curious to see an attempt! I have a pretty strong prior that it would fail, though, with currently available models. I buy that RLHF hurts, but given that Sam Altman’s sample story also didn’t impress me (and had the same failure modes, just slightly less so), the problem pattern-matches for me to the underlying LLM simply not absorbing the latent structure well enough to imitate it. You might need more parameters, or a different set of training data, or something.
(This also relates to my reply to gwern above: his prompt did indeed include high-quality examples, and in my opinion it helped ~0.)
Both Altman and Gwern used fine-tuned models; those don’t really do in-context learning. They don’t support “prompt engineering” in the original sense: they only respond to commands and questions in a particular way.