RogerDearnaley comments on How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley 28 Dec 2023 5:27 UTC
1 point
I agree. The challenge of getting RL to do what you want, rather than some reward hack it came up with, gets replaced with building good classifiers for human-created content: not a trivial problem, but a less challenging, less adversarial, and better-understood one.