JulesRoussel01
Karma: 6
In open RLVR, “improvement” depends on the instrument — a small GRPO testbed separating what training optimizes, measures, and teaches
JulesRoussel01
15 Jun 2026 18:50 UTC
7 points, 0 comments, 20 min read