Quintin Pope comments on Paper: Large Language Models Can Self-improve [Linkpost]

Quintin Pope 2 Oct 2022 5:25 UTC
You do need a minimum degree of competence in a domain before your own judgement is sufficient to tell good attempts from bad ones. That said, even children can make that determination in sufficiently simple domains. For example, stacking blocks on top of each other has an obvious failure state, and children can learn to do it through trial and error, even though there is probably no genetically hardcoded reward circuit for correctly stacking things on top of other things.

Math is a much more complex domain where self-directed learning works well, because mathematicians can formally verify the correctness of their attempts, and so have a reliable signal to identify good attempts at proving a theorem, developing a new approach, etc.