Due to the redundancy, changing any single weight associated with one of those two pieces of logic does not change the output.
You seem to be under the impression that the goal is to make the NN robust to single-weight perturbation. But gradient descent doesn’t modify a neural network one weight at a time, and so being robust to single-weight modification doesn’t come with any real guarantees. The backward pass could result in weights of both forks being updated.
Sure, but the gradient component associated with a given weight is still zero if updating that weight alone would not affect the loss.
What do you think the gradient of min(x, y) is?
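For concreteness, here is a minimal sketch of what automatic differentiation actually reports for min(x, y) (PyTorch and the specific values are used purely for illustration; they are not part of the original exchange):

```python
import torch

# With x < y, min(x, y) equals x, so autodiff routes the entire
# gradient to x and none of it to y.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

out = torch.minimum(x, y)  # min(2.0, 3.0) = 2.0
out.backward()

print(x.grad)  # tensor(1.)
print(y.grad)  # tensor(0.)
```

At x == y the function is not differentiable, and frameworks fall back on a subgradient convention (for example splitting the gradient between the two arguments, or assigning it entirely to one of them), so even there the reported gradient is generally not zero for both arguments.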