leogao comments on Thomas Kwa’s Shortform

leogao 15 Apr 2023 7:23 UTC
Doesn’t answer your question, but we also came across this effect in the RM Goodharting work. Instead of figuring out the details, though, we only proved that for Regressional Goodhart, when it’s definitely not heavy-tailed, it’s monotonic (https://arxiv.org/pdf/2210.10760.pdf#page=17). Jacob probably has more detailed takes on this than me.
In any event, my intuition is that this is unlikely to be the main reason for overoptimization. I think it’s much more likely to be Extremal Goodhart, or some other mechanism where the noise is not independent.
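For intuition on the Regressional Goodhart point, here is a minimal numerical sketch. It is not the construction from the linked paper. It assumes a proxy U = V + X, where the true value V ~ N(0, 1) and the noise X is independent of V. It then computes E[V | U = u] by direct numerical integration. The function names, the choice of Cauchy as the heavy-tailed example, and the integration grid are all illustrative assumptions. Under these assumptions, Gaussian (light-tailed) noise gives E[V | U = u] = u/2, which rises monotonically with u. With Cauchy (heavy-tailed) noise, it rises at first and then decays back toward 0 as u grows.

```python
import math


def normal_pdf(x: float, scale: float = 1.0) -> float:
    """Density of N(0, scale^2): a light-tailed noise example."""
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2 * math.pi))


def cauchy_pdf(x: float, scale: float = 1.0) -> float:
    """Density of a Cauchy(0, scale): an illustrative heavy-tailed noise example."""
    return 1.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def conditional_true_value(u, noise_pdf, lo=-10.0, hi=10.0, steps=4000):
    """E[V | V + X = u] with V ~ N(0, 1) and X ~ noise_pdf independent of V.

    Uses midpoint-rule integration over v in [lo, hi]; the grid is an
    illustrative choice that is accurate for the moderate u values used here.
    """
    dv = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * dv
        # Joint density of (V = v, U = u) is p_V(v) * p_X(u - v).
        w = normal_pdf(v) * noise_pdf(u - v)
        num += v * w
        den += w
    return num / den


if __name__ == "__main__":
    for u in [0.0, 1.0, 2.0, 5.0, 10.0, 50.0]:
        g = conditional_true_value(u, normal_pdf)
        c = conditional_true_value(u, cauchy_pdf)
        print(f"u={u:5.1f}  gaussian noise: {g:.3f}  cauchy noise: {c:.3f}")
```

Only the tails of the noise differ between the two cases. In this toy setup, that difference alone is enough to break monotonicity.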