Thank you for this suggestion. I read the paper you mentioned. The authors note: “The novelty of our threat is that the adversary chooses a set of target concepts they aim to preserve despite subsequent erasure.” How realistic is this assumption, given a setup where model providers presumably choose the method (i.e., the set of target concepts to be erased) and the public only has access to the resulting model? Does it stem from concerns about an insider threat?
I don’t think it’s especially realistic, but it seems like a good setting to test unlearning against, for the sake of diversity and a broader understanding of what we can do with AI safety techniques.
Super interesting! Thanks for sharing the paper.