Yeah, I’m also surprised by it. I have two hypotheses, but it could be for other reasons I’m missing. One is that we kept temperature=1 for the KL divergence, and using a different temperature might matter for distilling faster. The second is that we undertrained the pretrained models: pretraining was relatively short, while distillation took around the same amount of time. I’m not really sure, though.
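For reference, here’s a minimal sketch of what a temperature-scaled KL distillation loss looks like (the standard Hinton-style recipe, not our exact code; I’m assuming PyTorch, and `student_logits` / `teacher_logits` are placeholders). With temperature=1 it reduces to the plain KL we used:

```python
import torch.nn.functional as F

def distillation_kl(student_logits, teacher_logits, temperature: float = 1.0):
    # Soften both distributions with the temperature before computing KL.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    # The T^2 factor keeps soft-target gradients on a comparable scale
    # as the temperature changes.
    return kl * temperature ** 2
```

Sweeping the temperature above 1 softens the teacher’s distribution and exposes more of its relative preferences over wrong classes, which is the usual argument for why it can speed up distillation.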