Fabien Roger comments on Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger 13 Mar 2025 14:26 UTC
LW: 5 AF: 4
0
AF
How happy would you be about the following experiment:
1. Cut scratchpads into chunks of N paragraphs each
2. Paraphrase the first k chunks (where k is drawn at random), wrap them in <paraphrased_chunk> or <chunk> tags and join them
3. Distill that
4. Do the inference time experiment (asking for a chunk by pre-filling <chunk>, stop when you see </chunk>, paraphrase it and put it in <paraphrased_chunk> tags)
This way the inference-time experiment is not OOD and should not reduce performance. This has roughly the same drawbacks as the experiment I described in this post though (i.e. risk of “learning the paraphrased version of the encoded reasoning scheme”, not testing the OOD-generalization, …), so I am not sure if this is valuable.
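Steps 1 and 2 could be sketched roughly as below. This is only an illustration under assumptions that are not in the proposal: paragraphs are taken to be separated by blank lines, k is drawn uniformly from 0 to the number of chunks, and `paraphrase` is a hypothetical callable standing in for whatever paraphrasing model is used.

```python
import random
from typing import Callable, List


def chunk_paragraphs(scratchpad: str, n: int) -> List[str]:
    """Split a scratchpad into chunks of n paragraphs each.

    Assumption: paragraphs are separated by blank lines.
    """
    paragraphs = [p for p in scratchpad.split("\n\n") if p.strip()]
    return ["\n\n".join(paragraphs[i:i + n]) for i in range(0, len(paragraphs), n)]


def build_training_text(
    scratchpad: str,
    n: int,
    paraphrase: Callable[[str], str],  # hypothetical paraphraser
    rng: random.Random,
) -> str:
    """Paraphrase the first k chunks, tag every chunk, and join them."""
    chunks = chunk_paragraphs(scratchpad, n)
    # Assumption: k is uniform over 0..len(chunks), inclusive.
    k = rng.randint(0, len(chunks))
    parts = []
    for i, chunk in enumerate(chunks):
        if i < k:
            parts.append(f"<paraphrased_chunk>{paraphrase(chunk)}</paraphrased_chunk>")
        else:
            parts.append(f"<chunk>{chunk}</chunk>")
    return "\n".join(parts)
```

The resulting strings would then be the distillation targets for step 3. Because every training example has a paraphrased prefix followed by raw chunks, the prefix seen at inference time (only paraphrased chunks, then an open `<chunk>` tag) matches the training format.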
Another experiment:
Cut scratchpads into chunks of N paragraphs each. Wrap them in <chunk> tags and distill them (without paraphrasing). Then do the inference-time experiment. This gets rid of the FUD around chunking reducing performance (which I observed, as described in the footnote). This could still result in degraded performance just because it’s OOD, but arguably it’s slightly worrisome if being exactly IID matters for scratchpads.
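The inference-time experiment used by both designs could look roughly like the loop below. It is a sketch, not a definitive implementation: `generate(prefix, stop)` and `paraphrase(text)` are hypothetical callables, the stopping rule (an empty chunk or a `max_chunks` cap) is an assumption, and the tag used for paraphrased chunks is a parameter because the second design does not say how paraphrased chunks should be wrapped.

```python
from typing import Callable


def paraphrased_rollout(
    prompt: str,
    generate: Callable[[str, str], str],  # hypothetical: (prefix, stop) -> text before stop
    paraphrase: Callable[[str], str],     # hypothetical paraphraser
    max_chunks: int = 32,                 # assumption: hard cap on scratchpad length
    paraphrased_tag: str = "paraphrased_chunk",
) -> str:
    """Generate a scratchpad chunk by chunk, keeping only paraphrases in context."""
    context = prompt
    for _ in range(max_chunks):
        # Ask for one chunk by pre-filling the opening tag; stop at the closing tag.
        raw = generate(context + "<chunk>", "</chunk>")
        if not raw.strip():
            # Assumption: an empty chunk means the model has finished reasoning.
            break
        # The raw chunk is discarded; only its paraphrase is fed back to the model.
        para = paraphrase(raw)
        context += f"<{paraphrased_tag}>{para}</{paraphrased_tag}>\n"
    return context
```

The model never conditions on its own raw chunks, so any information that survives only in the exact wording of a chunk is unavailable to later chunks.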
I am also curious if you have takes about what value of N (or what chunking scheme) would make the most sense.
- Daniel Kokotajlo 13 Mar 2025 17:24 UTC
  LW: 3 AF: 2
  0
  AF Parent
  I think I don’t understand why the version of the experiment I proposed is worse/bad/etc., and am getting hung up on that.
  
  I like your second experiment design. Seems good to control for the chunk tags.
  
  Question: Why do you need chunk tags at all?
  - Fabien Roger 14 Mar 2025 10:00 UTC
    LW: 9 AF: 4
    2
    AF Parent
    I see 2 issues with the experiment you suggested:
    1. The chunking. Where do you stop to paraphrase? I think to have something meaningful to paraphrase you want something like every “big paragraph” (but maybe not each equation, which could each be a line). Idk how to do that cleanly without doing the distillation. And not doing it cleanly might result in benign performance degradation.
    2. The OODness, but I guess you’re fine with this one? I suspect this might get you worrisome results for no worrying reason, just because it’s somewhat degenerate to prefill a scratchpad with a big paraphrase, get a new paragraph which is probably already influenced by the paraphrasing style, then paraphrase that. Besides the paraphrasing style being distracting, there is also a “paraphrase of paraphrase of paraphrase of …” effect that might degrade the scratchpad for “normal” reasons but in ways that are hard to pull apart from the worrying reasons.