Using RLVR to train models makes them disproportionately good at tasks where it is hard for a less capable model to generate an acceptable answer, but easy for a less capable external grader to verify that an answer is correct.
Google’s AlphaEvolve seems to go even further down this road.
If advancement happens through a bunch of hard-to-find, easy-to-verify innovations, I think that provides substantial evidence that progress will be distributed rather than local to a specific instance of a recursively-self-improving agent operating within a single lab (e.g. faster matrix multiplication is an improvement which provides small incremental gains to everyone and is hard to keep secret).
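The matrix-multiplication case is a nice concrete instance of the generate/verify gap: finding a faster multiplication algorithm is hard, but checking that a claimed product is correct is cheap. As a minimal sketch (my own illustration, not from the original comment), Freivalds' randomized check verifies A·B = C using only O(n²) matrix-vector products per trial, never computing the full product:

```python
import random

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def freivalds_check(A, B, C, trials=20):
    """Probabilistically verify that A @ B == C.

    Each trial costs three matrix-vector products (O(n^2)) instead of a
    full matrix multiplication. If A @ B != C, a single trial catches the
    error with probability >= 1/2, so 20 trials miss it with probability
    <= 2**-20.
    """
    n = len(B[0])
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        # Compare A @ (B @ r) against C @ r.
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # correct product
C_bad  = [[19, 22], [43, 51]]   # one wrong entry

print(freivalds_check(A, B, C_good))  # True
print(freivalds_check(A, B, C_bad))   # False, with overwhelming probability
```

The verifier here is strictly weaker than the generator: it cannot produce A·B itself, yet it can reliably reject a wrong answer, which is exactly the asymmetry RLVR-style training exploits.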
Is it actually hard to keep secret, or is it that people aren’t trying (because the prestige of publishing an advance is worth more than hoarding the incremental performance improvement for yourself)?