Thanks for pointing this out! I agree we are exploring a safety-relevant variant of prompt self-distillation.
It would certainly be interesting to see how much more effective logit-based distillation is than SFT at “internalizing” the omitted generation prompt.
Thanks for pointing this out! I agree we are exploring a safety-relevant variant of prompt self-distillation.
It would certainly be interesting to see how much more effective logit-based distillation is than SFT at “internalizing” the omitted generation prompt.