Vladimir_Nesov comments on Absolute Zero: Alpha Zero for LLM

Vladimir_Nesov 11 May 2025 21:41 UTC
12 points
The Absolute Zero paper studies a problem that might turn out to be moot in light of the recent paper on RLVR with one training example. RLVR usually uses 10K-100K training examples, but in that paper the authors choose just one or two example questions and keep repeating them for up to 2K steps, which gives results close to what they get from training on 1K-8K examples (Figure 1, Figure 2). So finding a lot of verifiable tasks in some undirected way might be unimportant, though automatically generating or selecting them for a particular purpose might still be a source of key capabilities.