Why can’t the gradient descent get rid of the computation that decides to perform gradient hacking, or repurpose it for something more useful?
Gradient descent is a very simple algorithm. It only “gets rid” of some piece of logic when that is the result of updating the parameters in the direction of the gradient. In the scenario of gradient hacking, we might end up with a model that maliciously prevents gradient descent from, say, changing the parameter θ1537, by being a model that outputs a very incorrect value if θ1537 is even slightly different than the desired value.
Gradient descent is a very simple algorithm. It only “gets rid” of some piece of logic when that is the result of updating the parameters in the direction of the gradient. In the scenario of gradient hacking, we might end up with a model that maliciously prevents gradient descent from, say, changing the parameter θ1537, by being a model that outputs a very incorrect value if θ1537 is even slightly different than the desired value.