Measure comments on Obstacles to gradient hacking

Measure 6 Sep 2021 11:54 UTC
1 point
0

One particular formalism that might work is continuous everywhere and differentiable everywhere except a set of countable size.

I’m confused by this part. If the function is stored numerically in network weights rather than symbolically, how do you verify that it has these properties?
- leogao 6 Sep 2021 14:46 UTC
  1 point
  0
  Parent
  The objective would be to show that under very modest assumptions, a certain type of gradient hacking is impossible. Ideally it would be easy to verify that nearly all neural network architectures, regardless of the weights, meet these assumptions. For example, all ReLU networks would meet the “continuous everywhere and differentiable everywhere except a set of countable size” assumption, and I think any network composed of components that individually meet this assumption should also meet this assumption.
  It’s still possible that the network abuses floating point rounding to gradient hack, but this is a distinct failure case.