That’s an excellent idea! I believe a similar approach could be used for model capabilities, though it may also prevent benign users from updating their models. Still, making capabilities fragile under adversarial updates while preserving them under benign updates seems doable to me.