
JulesRoussel01

Karma: 6

In open RLVR, “improvement” depends on the instrument: a small GRPO testbed separating what training optimizes, measures, and teaches

JulesRoussel01 · 15 Jun 2026 18:50 UTC
7 points
0 comments · 20 min read