As mentioned in the other reply, DP gives up performance (though with enough data you can overcome that, and in many cases DP needs only a little more data than you would need anyway for reliable answers).
Another point is that DP is fragile and a pain:
you have to carefully track the provenance of your data (so you don’t accidentally include someone’s data twice without explicitly accounting for it)
you usually need to clip data to a priori bounds or something equivalent (the standard DP algorithm for training NNs, DP-SGD, requires clipping each per-example gradient to a fixed norm; see the sketch after this list)
you can “run out of budget”: there’s a parameter, the privacy budget (usually denoted ε), that bounds the dissimilarity, and every iteration spends some of it; at some point you have run so many iterations that you can’t prove the result is DP if you keep running, and with current DP techniques this happens way before you stop improving on the validation set (the sketch below also shows this forced stop)
hyperparameter tuning is an active research area, because once you’ve run with one set of hyperparameters, you’ve exhausted the budget (see previous point); you can just say “budget is per hyperparameter set”, but then you lose generalization guarantees
DP bounds tend to be pessimistic (see the earlier example about a single pathological dataset), so while you might in principle be able to keep improving without damaging generalization, DP on its own won’t tell you that
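To make the clipping and budget points concrete, here is a minimal NumPy sketch of the core DP-SGD loop on a toy linear model: clip each per-example gradient to an a priori norm bound, add calibrated Gaussian noise, and stop when a crude privacy counter runs out. All the constants (clip_norm, noise_multiplier, epsilon_per_step, the total budget) are invented for illustration, and a real implementation would use a proper privacy accountant (as in Opacus or TensorFlow Privacy) rather than the flat per-step ε cost assumed here:

```python
# Minimal sketch of the DP-SGD core loop (per-example clipping + Gaussian
# noise + a naive privacy-budget counter). Illustrative only; all constants
# are invented, and the flat per-step epsilon cost stands in for a real
# privacy accountant.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression with n examples and d features.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

clip_norm = 1.0          # a priori bound on each example's gradient norm
noise_multiplier = 1.1   # noise std is noise_multiplier * clip_norm
lr = 0.1
epsilon_budget = 8.0     # total privacy budget we are willing to spend
epsilon_per_step = 0.05  # stand-in for what a real accountant would charge

w = np.zeros(d)
epsilon_spent, step = 0.0, 0
while epsilon_spent + epsilon_per_step <= epsilon_budget:
    # Per-example gradients of the squared error: g_i = 2 * (x_i.w - y_i) * x_i.
    residuals = X @ w - y
    grads = 2 * residuals[:, None] * X                     # shape (n, d)

    # Clip each example's gradient to the a priori bound.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads *= np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Sum, add Gaussian noise calibrated to the clipping bound, and step.
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=noise_multiplier * clip_norm, size=d)
    w -= lr * noisy_sum / n

    # Spend budget; once it is gone we must stop, even if the validation
    # loss is still improving -- we can no longer prove the result is DP.
    epsilon_spent += epsilon_per_step
    step += 1

print(f"stopped after {step} steps with epsilon spent ~ {epsilon_spent:.2f}")
```

The forced stop in the loop condition is exactly the “run out of budget” failure mode: the halting criterion is the privacy proof, not the validation loss.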
There are relaxations of DP (here’s a reference that tries to give an overview of over 200 papers; since its publication, even more work in this vein has come out), but they’re not as well-studied, and even figuring out which one has the properties you need is a difficult problem.
It’s also not exactly what you want, and it’s not easy, so it tends to be better to look for the thing that correlates more closely with what you actually want (i.e. generalization).
Thanks, this kind of inside knowledge from practice is both precious and the hardest to find. I also like the review very much: it may not be exactly what I wanted, but it feels like exactly what I should have wanted, in that it helped me realize that each and every one of the numerous DP variants discussed implies a slightly different notion of generalizability. In retrospect, my last question was a bit like “Ice cream is good, DP is good, so why not use DP to improve ice cream?”: this false logic stops working as soon as “good” is properly defined.
I need some time to catch up with the excellent reading list below and try some code myself, but I will keep your name in mind if I have more technical questions. In the meantime, and as I’m not very familiar with the conventions here: should I do something like editing the main post to mention the excellent answers I got, or can I click an “accept this answer” button somewhere?
should I do something like editing the main post to mention the excellent answers I got, or can I click an “accept this answer” button somewhere?
I don’t think there’s an “accept answer” button, and I don’t think you’re expected to update your question. I personally would probably edit it to add one sentence summarizing your takeaways.