The paper describes a method for self-improvement in LLMs. But does it work for recursive self-improvement? I haven’t found any mention of recursion or multiple iterations in the paper.
The most relevant section seems to be 5.2 PUSHING THE LIMIT OF SELF-IMPROVEMENTS. There the authors describe their attempts to have the model use self-generated questions and self-generated few-shot Chain-of-Thought prompts. They did measure self-improvement when using self-generated questions, though less than with training-set questions. But who cares if the self-improvement is smaller, if the process is unsupervised and you could just iterate it to get more and more self-improvement?
I’m assuming their numbers show the self-improvement they measured for one iteration. Or is it actually the maximum self-improvement possible for any number of iterations?
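To make the question concrete, here is a toy sketch of the recursive loop I have in mind. Nothing in it comes from the paper: `self_improve_once`, the 0.90 ceiling, and the per-round gain are all invented stand-ins for the paper's single-pass pipeline (self-generated questions → CoT sampling → majority-vote filtering → fine-tuning). The point is just the iteration structure, and one way the reported one-iteration number could fail to compound: if each round only recovers a fraction of the remaining headroom, gains shrink instead of stacking.

```python
def self_improve_once(accuracy: float) -> float:
    """One unsupervised round: self-generated questions -> CoT sampling ->
    filtering -> fine-tune. Modeled (pure assumption) as recovering a fixed
    fraction of the remaining headroom below an assumed ceiling."""
    headroom = 0.90 - accuracy        # assumed performance ceiling
    return accuracy + 0.3 * headroom  # assumed per-round gain

def recursive_self_improvement(accuracy: float, iterations: int) -> list[float]:
    """Iterate the single round the paper evaluates once; return the trajectory."""
    history = [accuracy]
    for _ in range(iterations):
        accuracy = self_improve_once(accuracy)
        history.append(accuracy)
    return history

# Starting from a hypothetical 70% accuracy, run five rounds.
history = recursive_self_improvement(0.70, 5)
```

Under these made-up dynamics the trajectory is monotonically increasing but flattens toward the ceiling, which would make the paper's reported number close to the maximum rather than the first step of an unbounded climb. Whether the real method behaves this way is exactly what the paper doesn't say.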