I think I agree with your statement once a significant share of a model's capabilities is learned via RL.
I’m confused about how much current models have learned via RL.
The persona selection model argues that post-training mostly selects an existing persona that was learned in pre-training (though maybe this is mostly related to character, and somewhat orthogonal to capabilities learned by post-training RL)
Venhoff et al. seem to suggest that reasoning training only affects fairly specific parts of the model (though maybe those parts are just super important)