Ofer comments on Gradient hacking

Ofer 2 Nov 2019 16:42 UTC
LW: 3 AF: 2
0
AF

Why can’t the gradient descent get rid of the computation that decides to perform gradient hacking, or repurpose it for something more useful?

Gradient descent is a very simple algorithm. It only “gets rid” of some piece of logic when that is the result of updating the parameters in the direction of the gradient. In the scenario of gradient hacking, we might end up with a model that maliciously prevents gradient descent from, say, changing the parameter $θ_{1537}$ , by being a model that outputs a very incorrect value if $θ_{1537}$ is even slightly different than the desired value.