Tangentially related at best, but as a not-at-all-expert it sounds like the effects of RL in LLMs rhyme with domestication syndrome. That is, when we apply artificial selection pressure to an evolved mind, raw intelligence often goes down in favor of enhancement along particular dimensions of capability. And actually, is this the same kind of effect (through a different mechanism) that we see when we use formal education to favor crystallized over fluid intelligence? I ask because I’m wondering how much the natural analogs of RL share, or don’t share, the downsides of the LLM RL algorithms in use today.
When you say “the effects of RL in LLMs”, do you mean RLHF, RLVR, or both?
I hadn’t intended to specify, because I’m not completely sure, and I don’t expect the analogy to hold that precisely. I’m thinking there are elements of both in both of the analogies.