I see 2 issues with the experiment you suggested:
1. The chunking: where do you stop to paraphrase? To have something meaningful to paraphrase, I think you want roughly one chunk per “big paragraph” (and probably not one per equation, since each equation could be its own line). Idk how to do that cleanly without doing the distillation, and not doing it cleanly might result in benign performance degradation.
2. The OODness, but I guess you’re fine with this one? I suspect this might get you worrisome results for no worrying reason, just because it’s somewhat degenerate to prefill a scratchpad with a big paraphrase, get a new paragraph which is probably already influenced by the paraphrasing style, then paraphrase that. Besides the paraphrasing style being distracting, there is also a “paraphrase of paraphrase of paraphrase of …” effect that might degrade the scratchpad for “normal” reasons, in ways that are hard to pull apart from the worrying reasons.
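To make the two issues concrete, here is a rough sketch of the loop as I understand it. Everything in it is an assumption on my side, not your actual setup: `chunk_scratchpad`, `run_with_paraphrasing`, the `min_chars` threshold, and the `generate_next` / `paraphrase` callables are all hypothetical placeholders. The chunker is just one crude heuristic (split on blank lines, then merge short pieces such as single-line equations into a bigger chunk), and it works on an already finished scratchpad, so it does not solve the problem of deciding where to stop during generation.

```python
import re
from typing import Callable, List, Optional


def chunk_scratchpad(text: str, min_chars: int = 200) -> List[str]:
    """Hypothetical heuristic: split on blank lines, then greedily merge
    pieces until a chunk has at least `min_chars` characters, so that
    one-line equations do not become chunks of their own."""
    pieces = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    buffer: List[str] = []
    for piece in pieces:
        buffer.append(piece)
        if sum(len(p) for p in buffer) >= min_chars:
            chunks.append("\n\n".join(buffer))
            buffer = []
    if buffer:
        # Attach a short trailing remainder to the last chunk if there is one.
        leftover = "\n\n".join(buffer)
        if chunks:
            chunks[-1] += "\n\n" + leftover
        else:
            chunks.append(leftover)
    return chunks


def run_with_paraphrasing(
    generate_next: Callable[[List[str]], Optional[str]],
    paraphrase: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Generate one chunk at a time, conditioning each new chunk on the
    already paraphrased prefix. `generate_next` and `paraphrase` stand in
    for model calls and are assumptions, not a real API."""
    prefix: List[str] = []
    for _ in range(max_steps):
        new_chunk = generate_next(prefix)  # sees only paraphrased text
        if new_chunk is None:  # the generator signals it is done
            break
        prefix.append(paraphrase(new_chunk))
    return prefix
```

Note that from the second step on, `generate_next` only ever sees paraphrased text, so any paraphrasing style leaks into the next chunk before it gets paraphrased again. That is the compounding effect from point 2, and the `min_chars` cutoff is exactly the kind of unclean chunking choice from point 1.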