Finally got around to that one, and am also pretty into that explanation for the cases of double descent we observe. It also tentatively makes me want to say that the decrease in variance with model size is the ‘real story’/primary thing we should think about.
Finally got around to that one, and am also pretty into that explanation for the cases of double descent we observe. It also tentatively makes me want to say that the decrease in variance with model size is the ‘real story’/primary thing we should think about.