Buck comments on Trying to disambiguate different questions about whether RLHF is “good”

Buck 15 Dec 2022 20:37 UTC
2 points
> RLHF could then reinforce the high rating mechanism without correspondingly reinforcing the truth mechanism, breaking the correlation.
I unconfidently think that in this case, RLHF will reinforce both mechanisms, but reinforce the high rating mechanism slightly more, which nets out to no clear difference from conditioning. But I wouldn’t be shocked to learn I was wrong.