My impression is that studies have shown that, at least for earlier rounds of reasoning models, where the total computation invested in reasoning training was fairly small compared to pretraining, reasoning training mostly up- or down-regulated skills the base model already had, many of them metacognitive, so that they happen at the right times and frequencies.
With sufficiently large amounts of reasoning training, one would expect metacognitive skills to improve. But I strongly suspect that already having the skill present in the base model from us will still be very helpful.