I think this is supposed to say “undertrained”, right?
Undertraining per parameter is equivalent to overtraining per datum (for fixed compute). So Rohin’s usage makes sense in context, but also I agree with you that the word is confusing :P
This still seems confusing to me. Rohin says that the model is overtrained (not something like “prior approaches overtrained on limited data”), so it seems like he’s talking about the parameters and not the data.
I did just mean undertrained (because I’m ~always using it in the per-parameter sense, which I think is how other people use it too).
Yeah I meant undertrained, I’ve fixed it now.