Due to the redundancy, changing any single weight associated with one of those two pieces of logic does not change the output.
You seem to be under the impression that the goal is to make the NN robust to single-weight perturbation. But gradient descent doesn’t modify a neural network one weight at a time, and so being robust to single-weight modification doesn’t come with any real guarantees. The backward pass could result in weights of both forks being updated.
Sure, but the gradient component associated with a given weight is still zero if updating that weight alone would not affect the loss.
What do you think the gradient of min(x, y) is?
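For concreteness, here is a minimal sketch of what automatic differentiation actually reports for min(x, y) (PyTorch and the specific values are used purely for illustration; they are not part of the original exchange):

```python
import torch

# With x < y, min(x, y) equals x, so autodiff routes the entire
# gradient to x and none of it to y.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

out = torch.minimum(x, y)  # min(2.0, 3.0) = 2.0
out.backward()

print(x.grad)  # tensor(1.)
print(y.grad)  # tensor(0.)
```

At x == y the function is not differentiable, and frameworks fall back on a subgradient convention (for example splitting the gradient between the two arguments, or assigning it entirely to one of them), so even there the reported gradient is generally not zero for both arguments.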