In general, what people have been finding is that fine-tuning an LLM on a dataset much smaller than its pre-training set can bring out latent abilities or behaviors the model already had, or add narrow new capabilities, but making it substantially smarter across the board requires a dataset comparable in size to the one it was pretrained on.
Yes, you do need a lot of data.
There are a lot of domains where it’s possible to distinguish good answers from bad answers by looking at results.
Mathematics is a good example: it's relatively easy to check whether a proof is correct, but hard to write the proof in the first place.
Once you have an AutoGPT-like agent that can do mathematical proofs, you have a lot of room to generate training data about mathematical proofs, and you can optimize for the agent producing correct proofs with fewer LLM calls.
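A minimal sketch of that generate-and-verify loop, assuming a hypothetical `generate_proof` (standing in for the LLM-driven agent) and `check_proof` (standing in for a cheap formal verifier such as a proof-assistant kernel) — both toy stand-ins, not a real API. Verified proofs become training data, and keeping the shortest verified proof per problem is what lets you optimize toward proofs that need fewer steps:

```python
import random

def generate_proof(problem: str, rng: random.Random) -> list[str]:
    # Toy generator standing in for the agent: returns a "proof" as a
    # list of steps of random length (an assumption for illustration).
    n_steps = rng.randint(1, 10)
    return [f"step {i} of {problem}" for i in range(n_steps)]

def check_proof(problem: str, proof: list[str]) -> bool:
    # Toy verifier: accepts any non-empty proof. A real checker would be
    # a proof-assistant kernel, which is cheap relative to proof search.
    return len(proof) > 0

def collect_training_data(problems, attempts_per_problem=5, seed=0):
    """Generate-and-verify loop: sample several candidate proofs per
    problem, keep only those that pass the checker, and retain the
    shortest verified proof so later fine-tuning is biased toward
    proofs that take fewer LLM steps."""
    rng = random.Random(seed)
    dataset = {}
    for problem in problems:
        best = None
        for _ in range(attempts_per_problem):
            proof = generate_proof(problem, rng)
            if check_proof(problem, proof):
                if best is None or len(proof) < len(best):
                    best = proof
        if best is not None:
            dataset[problem] = best
    return dataset

data = collect_training_data(["p1", "p2"])
```

The key asymmetry this exploits is that `check_proof` is much cheaper than `generate_proof`, so you can afford many attempts per problem and only pay once to label each verified result.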
With the set of prompts that ChatGPT users provided, the agent can also search through the data for individual problem types where it's easy to generate problem sets and grade the quality of answers.