Does the agent need to have the ability to change the weights in the neural net that it is implemented in? If so, how does it get that ability?
No, at least not in the way that I’m imagining this working. In fact, I wouldn’t really call that gradient-hacking anymore (maybe it’s just normal hacking at that point?).
For your other points, I agree that they seem like interesting directions to poke at for figuring out whether something like this works or not.