kave comments on Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

kave 11 Mar 2025 21:23 UTC
LW: 4 AF: 3
I think you train Claude 3.7 to imitate the paraphrased scratchpad, but I’m a little unsure because you say “distill”. Just checking that Claude 3.7 still produces CoT (in the style of the paraphrase) after training, rather than being trained to perform the paraphrased-CoT reasoning in one step?
- Fabien Roger 11 Mar 2025 23:54 UTC
  LW: 4 AF: 4
  By distillation, I mean training to imitate. So in the distill-from-paraphrased setting, the only model involved at evaluation time is the base model fine-tuned on paraphrased scratchpads, and it generates an answer from beginning to end.
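
To make the "training to imitate" reading concrete, here is a minimal sketch of how a distill-from-paraphrased fine-tuning set could be assembled. Everything named here (`Sample`, `build_distillation_set`, `toy_paraphrase`, the prompt/completion record layout) is hypothetical and not taken from the post; in particular, it assumes the fine-tuning target is the paraphrased scratchpad followed by the final answer, so that the fine-tuned model still writes out a chain of thought before answering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    prompt: str
    scratchpad: str  # chain of thought written by the original reasoning model
    answer: str      # final answer that followed that scratchpad


def build_distillation_set(
    samples: List[Sample], paraphrase: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Build imitation targets: paraphrased scratchpad, then the answer.

    Assumption: the model is trained to reproduce the whole completion,
    so at evaluation time it generates scratchpad and answer by itself.
    """
    return [
        {
            "prompt": s.prompt,
            "completion": paraphrase(s.scratchpad) + "\n" + s.answer,
        }
        for s in samples
    ]


def toy_paraphrase(text: str) -> str:
    # Stand-in for a real paraphrasing model: normalizes whitespace and case.
    return " ".join(text.split()).lower()


samples = [Sample("What is 2 + 3?", "Add  2 and 3.  That gives 5.", "5")]
dataset = build_distillation_set(samples, toy_paraphrase)
```

Under this reading, neither the paraphraser nor the original reasoning model is called at evaluation time; only the model fine-tuned on `dataset` is sampled, starting from the prompt.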