When you say “the effects of RL in LLMs”, do you mean RLHF, RLVR, or both?
I hadn’t intended to specify, because I’m not completely sure, and I don’t expect the analogy to hold that precisely. I’m thinking there are elements of both in both analogies.