leogao

Karma: 1,173

Clarifying wireheading terminology

leogao · 24 Nov 2022 4:53 UTC
52 points
6 comments · 1 min read · LW link

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
86 points
11 comments · 1 min read · LW link
(arxiv.org)