Do you mean exploration hacking? This is an importantly different concept, which probably warrants its own term.
Also, how well-established? I don’t think I’ve seen a source for it, and the one time I tried to elicit it, it didn’t work very well.
I was thinking of sources like the section discussing RL in Towards Deconfusing Gradient Hacking. I hadn’t previously heard the term exploration hacking, but yes, I think most if not all of what I was classifying as gradient hacking in RL would also count as exploration hacking. You’re basically taking advantage of the fact that the loss landscape in RL isn’t really fixed, and tweaking things that have the effect of slanting it.
Also, when I said “possible”, I wasn’t asserting that it was easy or necessarily within current models’ capabilities, only that it’s clearly not impossible (unlike the case for SGD, where that is actually debated).