Using RLVR to train models makes them disproportionately good at tasks where it is hard for a less capable model to generate an acceptable answer, but easy for a less capable external grader to verify that an answer is correct.
Google’s AlphaEvolve seems to go even further down this road.
If advancement happens through a bunch of hard-to-find, easy-to-verify innovations, I think that provides substantial evidence that progress will be distributed rather than local to a specific instance of a recursively-self-improving agent operating within a single lab (e.g. faster matrix multiplication is an improvement which provides small incremental gains to everyone and is hard to keep secret).
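The matrix-multiplication case is a nice concrete instance of the generate/verify gap: finding a faster multiplication algorithm is hard, but checking that a claimed product is correct is cheap. As a minimal sketch (my own illustration, not from the original comment), Freivalds' randomized check verifies A·B = C using only O(n²) matrix-vector products per trial, never computing the full product:

```python
import random

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def freivalds_check(A, B, C, trials=20):
    """Probabilistically verify that A @ B == C.

    Each trial costs three matrix-vector products (O(n^2)) instead of a
    full matrix multiplication. If A @ B != C, a single trial catches the
    error with probability >= 1/2, so 20 trials miss it with probability
    <= 2**-20.
    """
    n = len(B[0])
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        # Compare A @ (B @ r) against C @ r.
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # correct product
C_bad  = [[19, 22], [43, 51]]   # one wrong entry

print(freivalds_check(A, B, C_good))  # True
print(freivalds_check(A, B, C_bad))   # False, with overwhelming probability
```

The verifier here is strictly weaker than the generator: it cannot produce A·B itself, yet it can reliably reject a wrong answer, which is exactly the asymmetry RLVR-style training exploits.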
Is it actually hard to keep secret, or is it that people aren’t trying (because the prestige of publishing an advance is worth more than hoarding the incremental performance improvement for yourself)?