John Schulman

Karma: 461

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
102 points
13 comments · 1 min read · LW link
(arxiv.org)