"Larger models need a much larger training set even to match smaller models."
This is empirically false: perplexity on a test set goes down as model size increases, even for a fixed dataset. See, for example, Figure 2 in the Llama 3 report: larger models do better at, say, 1e10 tokens on that plot.
Larger models could be said to want a larger dataset, in the sense that under compute-optimal training, more compute calls for increasing both the model size and the dataset size, so the two grow together. But even on a dataset of the same size, larger models still do better, at least while reasonably close to the compute-optimal number of tokens.
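To make the compute-optimal point concrete, here is a minimal sketch assuming the rough Chinchilla-style relations C ≈ 6·N·D for training FLOPs and D ≈ 20·N for the compute-optimal token count; the constants 6 and 20 are ballpark assumptions for illustration, not exact values from any particular paper.

```python
# Toy illustration: under C ~ 6*N*D (training FLOPs) and the rough
# compute-optimal rule D ~ 20*N (tokens per parameter), solving
# 6*N*(20*N) = C gives N = sqrt(C/120). The constants are ballpark
# assumptions, not exact fitted values.

import math

def compute_optimal(C: float) -> tuple[float, float]:
    """Given a FLOP budget C, return a rough compute-optimal
    parameter count N and token count D."""
    N = math.sqrt(C / 120.0)  # from 6*N*(20*N) = C
    D = 20.0 * N
    return N, D

for C in (1e21, 1e23, 1e25):
    N, D = compute_optimal(C)
    print(f"C={C:.0e} FLOPs -> N~{N:.2e} params, D~{D:.2e} tokens")
```

Both N and D come out proportional to sqrt(C), which is the sense in which model size and dataset size increase together under a compute-optimal recipe.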