Factor 1 is true only to the degree that models cannot effectively generate hard problems for themselves to solve. If they can generate problems with verifiable rewards just at the edge of their capabilities, similar to AlphaZero's self-play, I expect these to be more useful than human-generated problems. For example, there are many incredibly difficult unsolved conjectures in math, but these provide no RL gradient for labs to train on: the model essentially never solves them, so it never receives a reward signal.
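As a rough illustration of the "edge of capability" idea, here is a minimal toy sketch in Python. Everything in it is hypothetical (the difficulty scores, the logistic success model, the `curriculum_filter` thresholds); it is not any lab's actual pipeline. The point it demonstrates is the filtering logic: keep self-generated problems in an intermediate solve-rate band, since problems solved ~never (like open conjectures) or ~always give little to no learning signal.

```python
import math
import random

random.seed(0)

MODEL_SKILL = 5.0  # toy proxy for the model's current capability


def generate_problem():
    """Toy stand-in: a 'problem' is just a difficulty score the model proposes.

    A real system would have the model emit an actual problem together with
    a machine-checkable answer (the 'verifiable reward').
    """
    return random.uniform(0.0, 10.0)


def attempt_succeeds(difficulty):
    """Toy stand-in for one solution attempt plus verification.

    Success probability falls off logistically as difficulty exceeds the
    model's skill, so very hard problems are almost never solved.
    """
    p_solve = 1.0 / (1.0 + math.exp(difficulty - MODEL_SKILL))
    return random.random() < p_solve


def curriculum_filter(n_problems=2000, n_attempts=16, low=0.2, high=0.8):
    """Keep self-generated problems the model solves sometimes but not always.

    Problems solved ~0% of the time (like hard open conjectures) yield no
    reward signal; problems solved ~100% of the time yield a vanishing
    gradient. The band in between is the AlphaZero-style edge of ability.
    """
    keep = []
    for _ in range(n_problems):
        difficulty = generate_problem()
        solve_rate = sum(attempt_succeeds(difficulty)
                         for _ in range(n_attempts)) / n_attempts
        if low <= solve_rate <= high:
            keep.append((difficulty, solve_rate))
    return keep


if __name__ == "__main__":
    kept = curriculum_filter()
    print(f"kept {len(kept)} of 2000 problems near the edge of ability")
```

In this toy version the filter discards both the trivial and the hopeless problems, which is exactly why unsolved conjectures, sitting at a ~0% solve rate, contribute nothing as training targets.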