It does seem useful to make the distinction between thinking about how gradient hacking failures look like in worlds where they cause an existential catastrophe, and thinking about how to best pursue empirical research today about gradient hacking.
It does seem useful to make the distinction between thinking about how gradient hacking failures look like in worlds where they cause an existential catastrophe, and thinking about how to best pursue empirical research today about gradient hacking.