The smaller models are not very interesting, though, especially if you want to probe the model's understanding and intelligence. All of the interesting meta-learning comes as you scale up to 175b/davinci; see the paper's graph of few-shot performance vs. model size. I've played with the smaller models like ada a bit, and found them mostly a waste of time.