The smaller models are not very interesting, though, especially if you want to probe the model's understanding and intelligence. All of the interesting meta-learning comes as you scale up to 175b/davinci; see the paper's graph of few-shot performance vs. model size. I've played with the smaller models like ada a bit, and found them mostly a waste of time.