This doesn’t answer your question, but we also came across this effect in the RM Goodharting work. Instead of figuring out the details, we only proved that, for Regressional Goodhart, it’s monotonic when the noise is definitely not heavy-tailed (https://arxiv.org/pdf/2210.10760.pdf#page=17). Jacob probably has more detailed takes on this than I do.
In any event, my intuition is that this is unlikely to be the main reason for overoptimization. I think it’s much more likely that it’s Extremal Goodhart, or some other thing where the noise is not independent.
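
For concreteness, here is a toy sketch of the Regressional Goodhart setting, not the setup or proof from the paper. Everything in it is an assumption chosen for illustration: proxy = true value + independent noise, best-of-n selection as the stand-in for optimization pressure, a standard normal true value, and Gaussian vs. Cauchy noise as the light-tailed and heavy-tailed cases. The function names are made up.

```python
import math
import random


def mean_true_value_of_best(n, noise, trials=2000, seed=0):
    """Average true value X of the candidate with the highest proxy X + Z among n draws."""
    rng = random.Random(seed)
    total = 0.0
    for _trial in range(trials):
        best_proxy, best_true = -math.inf, 0.0
        for _candidate in range(n):
            x = rng.gauss(0.0, 1.0)  # true value (assumed standard normal)
            z = noise(rng)           # proxy error, independent of x
            if x + z > best_proxy:
                best_proxy, best_true = x + z, x
        total += best_true
    return total / trials


def gaussian_noise(rng):
    # Light-tailed error (assumption for the "not heavy-tailed" case)
    return rng.gauss(0.0, 1.0)


def cauchy_noise(rng):
    # Heavy-tailed error: standard Cauchy sampled via its inverse CDF
    return math.tan(math.pi * (rng.random() - 0.5))


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        light = mean_true_value_of_best(n, gaussian_noise)
        heavy = mean_true_value_of_best(n, cauchy_noise)
        print(f"n={n:3d}  light-tailed: {light:+.3f}  heavy-tailed: {heavy:+.3f}")
```

Comparing the two columns as n grows gives a rough feel for how much the tail of the noise matters when the noise is independent. It says nothing about the non-independent case, which is the one I suspect matters more.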