By distillation, I mean training one model to imitate another's outputs. So in the distill-from-paraphrased setting, the only model involved at evaluation time is the base model fine-tuned on paraphrased scratchpads, and it generates the answer from beginning to end.