Yes, that’s what I’m arguing. Really massive gains in algorithmic efficiency, plus gains in decentralized training, peak capability, and continual learning — though not necessarily all at once. Maybe just enough that you then feel confident scraping together additional resources to pour into your ongoing continual training: renting GPUs from datacenters all around the world (smaller providers like Vast.ai, Runpod, and Lambda Labs, plus marginal amounts from larger providers like AWS and GCP, all rented in the name of a variety of shell companies). The more compute you put in, the better it works; the more money you can earn (or convince investors or governments to give you) with the model-so-far, the more compute you can afford to rent...
Not necessarily exactly this story, just something in this direction.