SoerenMind comments on Inference cost limits the impact of ever larger models

SoerenMind 15 Nov 2021 10:17 UTC
2 points
0
My point is that, while PCIe bandwidths aren’t increasing very quickly, it’s easy to increase the number of machines you use. So you can distribute each NN layer (width-wise) across many machines, each of which adds to the total bandwidth you have.

(As noted in the previous comment, you can do this with <<300GB of total GPU memory for GPT-3 with something like ZeRO-infinity)