davidad comments on Trying to disambiguate different questions about whether RLHF is “good”

davidad 15 Dec 2022 9:35 UTC
0 points
−2
AF
That’s not the case when using a global KL penalty, as (I believe) OpenAI does in practice and as Buck appeals to in this other comment. In the paper linked here, a global KL penalty is applied only in Section 3.6, because they observe a strictly larger gap between proxy and gold reward when doing so.
- LawrenceC 15 Dec 2022 16:25 UTC
  LW: 2 AF: 1
  0
  AF Parent
  This doesn’t seem to be what Gao et al. found: Figure 9 shows that the KL between the RL policy and the initial policy, at a given proxy reward score, is still significantly larger than the equivalent KL for a BoN policy, as shown in Figure 1.