RedMan comments on Distillation Robustifies Unlearning

RedMan 24 Jul 2025 21:21 UTC
1 point
0
I think I’ll bet on M_scoped winning out.
1. Filtering at D has not worked and I don’t see why it would it C is an emergent property.
2. Given finite resources, a decision would be between training two M_weaks on different datasets, vs train one M_dangerous with all available resources, and provide customers with M_scoped, (while giving full access to M_dangerous to LWers who promise they won’t be bad and using it internally to advance your business objectives).
3. A frontier lab can charge customers for making M_scoped from M_awesome but the business case for asking a customer to pay to censor D in specific ways at intake is challenging.