It seems to me that the part of training most responsible for capabilities is pre-training rather than RL (something like GRPO requires the base model to get at least one rollout correct before there is any signal to learn from). But it also feels like most RL training has to be objective-agnostic; a coding task wouldn’t have a clear connection to alignment. If our goal is to train an aligned AI where capabilities and alignment go hand in hand, it seems like we should somehow bake alignment training into pre-training rather than rely on post-training techniques. Unless it’s primarily RL that induces long-horizon, goal-directed capability (I suspect it’s some of both).
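To make the GRPO point concrete, here is a minimal sketch, assuming the commonly described group-relative advantage (each rollout's reward minus the group mean, divided by the group standard deviation) and a binary correct/incorrect reward. The function name `group_relative_advantages` and the `eps` value are my own illustrative choices, not taken from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group of rollouts
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every rollout fails: all advantages are zero, so there is nothing to reinforce.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# One rollout succeeds: it gets a positive advantage, the failures get negative ones.
one_success = group_relative_advantages([0.0, 0.0, 0.0, 1.0])
```

If this sketch is a fair picture, a group where the base model never succeeds contributes no gradient at all, which is why I'd guess RL mostly sharpens what pre-training already made possible.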