I think this is a totally fine length. But then, I would :P
I still feel like this was a little gears-light. Do the proposed examples of gradient hacking really work if you make a toy neural network with them? (Or does gradient descent find a way around the apparent local minimum?)
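One way to poke at this empirically: below is a minimal sketch of one toy version of the question, where a "gradient hacker" couples task performance to a protected parameter `m` through a trainable coupling strength `c`. The construction, names, and constants are all my own toy assumptions, not anything from the post; the point is just to watch whether plain gradient descent respects the apparent pin at `M0` or dismantles the coupling instead.

```python
# Toy gradient-hacking probe (hypothetical construction, not the post's):
#   L(w, m, c) = task(w) * (1 + c^2 * (m - M0)^2)
# If the coupling c were fixed, the penalty would pull m toward the
# "protected" value M0. But c is itself trainable, so gradient descent
# is free to shrink c and weaken the hack instead.

M0 = 3.0   # value the "hacker" tries to pin m to
LR = 0.05  # learning rate

def grads(w, m, c):
    task = (w - 1.0) ** 2                  # base objective: drive w -> 1
    pen = 1.0 + c**2 * (m - M0) ** 2       # hacker's self-imposed penalty
    dw = 2.0 * (w - 1.0) * pen             # d/dw of task * pen
    dm = task * c**2 * 2.0 * (m - M0)      # d/dm: pulls m toward M0
    dc = task * 2.0 * c * (m - M0) ** 2    # d/dc: always pushes c down
    return dw, dm, dc

w, m, c = 0.0, 2.0, 2.0                    # m starts off the protected value
for _ in range(2000):
    dw, dm, dc = grads(w, m, c)
    w, m, c = w - LR * dw, m - LR * dm, c - LR * dc

print(f"w={w:.3f}  m={m:.3f}  c={c:.3f}")
```

In this particular setup the gradient on `c` is always non-negative while the task loss is nonzero, so descent monotonically erodes the coupling rather than treating `m = M0` as a hard local minimum; how far `m` actually gets pulled before the task loss (and hence all the hack's gradients) collapses to zero depends on the relative rates. That race seems like the crux of whether the proposed examples "really work" in a real network.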