gwern comments on Mati_Roy’s Shortform

gwern 20 Sep 2020 17:35 UTC

> …and error and hyperparameter tuning that would probably increase the cost several-fold.

All of which was done on much smaller models, and GPT-3 just scaled up the existing settings and equations; they did their homework. That was the whole point of the scaling papers, to tell you how to train the largest cost-effective model without having to brute-force it! I think OA (OpenAI) may well have done a single run, and people are substantially inflating the cost because they aren't paying any attention to the background research, or to how the GPT-3 paper pointedly omits any discussion of hyperparameter tuning and implies only one run (e.g. the dataset contamination issue: a filtering bug was found, but the model was not retrained because of the cost of training).
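As a rough illustration of "doing the homework on small models": fit a power law to losses measured on small runs, then extrapolate to the big model instead of tuning it directly at full scale. This is a minimal sketch assuming a pure power law L(N) = a·N^(−α) in parameter count N, the kind of functional form used in the scaling papers. The helpers `fit_power_law` and `predict_loss` are hypothetical, and every number is synthetic, chosen for illustration. None of it is OpenAI's actual fits or procedure.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * N**(-alpha) in log-log space.

    Taking logs turns the power law into a straight line,
    log(loss) = log(a) - alpha * log(N), so ordinary linear regression works.
    Returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, alpha, n_params):
    """Extrapolate the fitted law to a (much larger) parameter count."""
    return a * n_params ** (-alpha)


if __name__ == "__main__":
    # Synthetic "small-model" results generated from a made-up law
    # (a = 12, alpha = 0.07); NOT real GPT-3 or scaling-paper numbers.
    small_sizes = [1e6, 1e7, 1e8, 1e9]
    small_losses = [12 * n ** -0.07 for n in small_sizes]

    a, alpha = fit_power_law(small_sizes, small_losses)
    big = 175e9  # GPT-3's parameter count, used only as the extrapolation target
    print(f"fitted a={a:.3f}, alpha={alpha:.4f}")
    print(f"predicted loss at {big:.0e} params: {predict_loss(a, alpha, big):.3f}")
```

Because the synthetic losses follow the law exactly, the fit recovers a = 12 and α = 0.07. With real, noisy runs you would also want error bars on the fit before trusting an extrapolation over two orders of magnitude.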
- Mati_Roy 21 Sep 2020 11:32 UTC
  Good to know, thanks!