[Question] How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs?

Folding@home is the most powerful supercomputer in the world. It runs its simulations on a distributed network of GPUs, CPUs, and ARM processors volunteered by people around the world. From some quick Googling, it looks like GPUs account for a large majority of Folding@home’s processing power. This suggests to me that distributed computing networks like Folding@home could potentially be used to train large deep neural networks.

I asked a friend about this, and they offered the following thoughts:

  • I’m highly skeptical of a F@H model for DL training where you have lone GPUs contributing to training. My guess is that any version of distributed training will pose severe latency problems, but to the extent there would be any version not prohibitively costly, it may be something like a set of distributed clusters, where each cluster has a sufficient number of GPUs (probably dozens at least, or even hundreds or more depending on the size of the model?) to store the model and do model parallelism on-site. (Data parallelism would span clusters.)

  • I think there’s an interesting question of how much more costly it would be. If it’s, say, 1.5x, then someone might do it to evade detection in a world where there existed a method to detect truly massive supercomputers. On the other hand, a 5x penalty would mean nobody would ever bother, probably.

This second bullet point is the question I want to ask: how much more costly would it be to train a very large AI model on a set of distributed clusters of compute, where each cluster has enough GPUs to store the model and do model parallelism on-site? It would also be helpful to know whether, and by how much, this premium might change in the future.
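To make the question more concrete, here is a rough back-of-envelope sketch of how one might estimate the premium. It assumes the setup my friend describes (model parallelism inside each cluster, data parallelism across clusters) and further assumes that the clusters synchronize gradients once per training step with a ring all-reduce. The function names and every number in the example are placeholders I made up for illustration, not measurements, and the model ignores gradient compression, stragglers, and hardware price differences.

```python
def allreduce_seconds(n_params, bytes_per_param, n_clusters,
                      bandwidth_bytes_per_s, latency_s):
    """Estimated time for one ring all-reduce of the gradients across clusters.

    Assumption: a ring all-reduce takes 2 * (n_clusters - 1) rounds, and in
    each round every cluster sends 1 / n_clusters of the gradient payload.
    """
    if n_clusters < 2:
        return 0.0  # a single cluster has nothing to synchronize over the WAN
    payload_bytes = n_params * bytes_per_param
    rounds = 2 * (n_clusters - 1)
    per_round = payload_bytes / n_clusters / bandwidth_bytes_per_s + latency_s
    return rounds * per_round


def cost_premium(compute_s, comm_s, overlap=0.0):
    """Ratio of distributed step time to single-site step time.

    `overlap` is the fraction of communication hidden behind computation
    (0.0 = fully exposed, 1.0 = fully hidden). This is an assumed knob,
    not something derived from a real system.
    """
    exposed_comm_s = comm_s * (1.0 - overlap)
    return (compute_s + exposed_comm_s) / compute_s


# Illustrative, made-up inputs: a 1B-parameter model with 2-byte gradients,
# 4 clusters, 10 Gbit/s links (1.25e9 bytes/s), 50 ms latency per round,
# and 5.4 s of pure compute per training step.
comm = allreduce_seconds(1e9, 2, 4, 1.25e9, 0.05)
print(f"sync time per step: {comm:.2f} s")
print(f"premium, no overlap: {cost_premium(5.4, comm):.2f}x")
print(f"premium, half hidden: {cost_premium(5.4, comm, overlap=0.5):.2f}x")
```

Under this toy model the premium depends mostly on the ratio of gradient size to inter-cluster bandwidth and on how much of the synchronization can be hidden behind computation, so what I am really asking is what realistic values for those quantities look like.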