Would be interesting to empirically check the reward landscape surrounding reward-hacking solutions. Should be able to plot reward against the variance of perturbations around such a solution and see if the curve looks different from the one around other spots.
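A rough sketch of how that check could look, assuming "variance" means the variance of Gaussian noise added to a solution's parameters. Everything here is hypothetical: `reward_vs_variance`, `sharp_reward`, and `broad_reward` are made-up names, and the two toy rewards exist only to make the sketch runnable. They say nothing about what the landscape around a real reward hack looks like.

```python
import numpy as np

def reward_vs_variance(reward_fn, theta, variances, n_samples=256, seed=0):
    """Mean reward under Gaussian perturbations of theta, one value per variance."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    means = []
    for var in variances:
        # Isotropic noise with the given variance (std = sqrt(var)).
        noise = rng.normal(0.0, np.sqrt(var), size=(n_samples, theta.size))
        rewards = [reward_fn(theta + eps) for eps in noise]
        means.append(float(np.mean(rewards)))
    return np.array(means)

# Toy rewards (assumptions, for illustration only): both peak at the origin,
# one in a narrow basin and one in a broad basin.
def sharp_reward(x):
    return float(np.exp(-np.sum(x ** 2) / (2 * 0.05 ** 2)))

def broad_reward(x):
    return float(np.exp(-np.sum(x ** 2) / (2 * 1.0 ** 2)))

variances = np.array([0.0, 0.01, 0.1, 1.0])
theta = np.zeros(2)  # stand-in for a solution's parameters

sharp_curve = reward_vs_variance(sharp_reward, theta, variances)
broad_curve = reward_vs_variance(broad_reward, theta, variances)

for var, s, b in zip(variances, sharp_curve, broad_curve):
    print(f"variance={var:<5} sharp={s:.3f} broad={b:.3f}")
```

In a real experiment `reward_fn` would be an actual policy evaluation and `theta` the parameters of a suspected reward-hacking solution versus an ordinary one. The two curves could then be plotted on the same axes to compare how fast reward falls off as the perturbation variance grows.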