I think this is supposed to say “undertrained”, right?
Undertraining per parameter is equivalent to overtraining per datum (for fixed compute). So Rohin’s usage makes sense in context, but also I agree with you that the word is confusing :P
This still seems confusing to me. Rohin says that the model is overtrained (not something like “prior approaches overtrained on limited data”), so it seems like he’s talking about the parameters and not the data.
I did just mean undertrained (because I’m ~always using it in the per-parameter sense, which I think is how other people use it too).
Yeah I meant undertrained, I’ve fixed it now.