That probably doesn’t scale to near-AGI-level models. The reason is that this paper assumes each “node” in the network can host the model at all, and then compresses the weight updates exchanged between training instances. A single instance of GPT-4 needs something like 128 H100s. So you can compress the data sent between nodes, but this paper will not let you take, say, 4000 people with a 2060 each, interconnected by typical home internet links with 30–1000 Mbps upload, and somehow run even one instance of GPT-4 at a usable speed.
The reason this fails is that you have to send the actual activations, from wherever you sliced the network’s tensors, over the slow home upload links. That is so slow it may be no faster than simply using a single 2060 and the computer’s SSD to stash in-progress activations.
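A back-of-envelope sketch of why the uplink dominates. The numbers are illustrative assumptions: GPT-4’s hidden size is unpublished, so this borrows GPT-3’s 12288, assumes fp16 activations, a 30 Mbps uplink (the low end above), and 100 sequential pipeline stages.

```python
# Rough estimate of per-token activation traffic in naive pipeline
# parallelism over home uplinks. Every constant here is an assumption
# for illustration, not a measured or published figure.

D_MODEL = 12288        # hidden size (GPT-3's value; GPT-4's is unknown)
BYTES_PER_ACT = 2      # fp16 activations
UPLOAD_MBPS = 30       # low end of a typical home uplink
STAGES = 100           # sequential pipeline stages (assumption)

bytes_per_token = D_MODEL * BYTES_PER_ACT        # bytes crossing each hop
upload_Bps = UPLOAD_MBPS * 1e6 / 8               # uplink in bytes/sec
ms_per_hop = bytes_per_token / upload_Bps * 1e3  # transfer time per hop
sec_per_token = STAGES * ms_per_hop / 1e3        # serial transfer per token

print(f"{ms_per_hop:.1f} ms per hop, {sec_per_token:.2f} s per token "
      f"-> ~{1 / sec_per_token:.1f} tokens/s before any compute or "
      f"round-trip latency is counted")
```

Even under these generous assumptions (ignoring compute time and network round-trip latency entirely), the uplink alone caps you at a couple of tokens per second.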
Obviously, if Moore’s law continues at the same rate, this won’t stay true. If compute per dollar doubles every 2.5 years, then in 25 years compute is roughly 1000 times cheaper, and assuming no government regulation, home users would have this kind of performance just from hosting AI locally, so this could become a problem.
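The arithmetic behind that “1000 times cheaper” figure, just to spell it out:

```python
# A 2x compute-per-dollar doubling every 2.5 years compounds to
# roughly 1000x over 25 years.
doublings = 25 / 2.5       # 10 doublings
factor = 2 ** doublings    # 1024, i.e. ~1000x
print(f"~{factor:.0f}x cheaper compute in 25 years")
```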
For the second point: they are already trying to solve this