Hundreds of years seems far too strong even if the only driver were Moore’s Law, which it isn’t. You also have:
Improvements in algorithms, data and software
Increasing hardware optimization for ML workloads on laptops, driven by demand for local models (autocomplete, transcription, simple tasks, etc.)
You also don’t need to get into actual ‘run on a laptop’ range; even 2-3 OOMs above that would let you train on things that would barely register as datacenters. 10-15 years seems more likely for training models beyond the current frontier on not-really-datacenters, especially if you want a buffer to account for the uncertainty that there could be discrete breakthroughs in training efficiency.
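Rough sanity check (every number below is an assumption I’m picking for illustration, not a measured trend): if hardware price-performance doubles every ~2.5 years and training efficiency from better algorithms/data/software doubles every ~1.5 years, and you treat the two as multiplying, you get roughly 3 OOMs of effective compute over 10 years and ~5 over 15.

```python
# Back-of-envelope: effective training compute per dollar, combining hardware
# price-performance with algorithmic/training-efficiency gains.
# Both doubling times are illustrative assumptions, not measured trends.
import math

def effective_gain(years, hw_doubling_years=2.5, algo_doubling_years=1.5):
    hw_gain = 2 ** (years / hw_doubling_years)      # cheaper/faster hardware
    algo_gain = 2 ** (years / algo_doubling_years)  # less compute needed per unit of capability
    return hw_gain * algo_gain

for years in (10, 15):
    gain = effective_gain(years)
    print(f"{years}y: ~{gain:,.0f}x effective compute (~{math.log10(gain):.1f} OOMs)")
```

The doubling times are doing all the work here, of course; stretch either one and the timeline stretches with it.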
If you can prevent algorithmic progress, then I agree somewhat, though the experiments that drive this sort of progress should be doable on small volumes of compute, so you’d need to suppress the research or the publishing of it.
I do think that not being able to acquire, say, $1M worth of matmul-adapted compute is a higher bar than you imply here. Being able to do large numbers of matmuls is an extremely useful property for a zillion reasons beyond AI; iirc Google poured at least hundreds of millions into building TPUs based only on the projected demand from fairly simple NLP workloads. LLM-optimized matmul machines help, but you can use almost anything if you’re willing to adapt your algorithms and software. I would expect a rendering farm, or basically any serious cluster at all, in 15 years to be able to train beyond current models.
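To put numbers on the “$1M generic cluster in 15 years” intuition (again, every constant below is my own assumption, swap in yours): a couple of OOMs of price-performance gains plus a couple of OOMs of algorithmic efficiency turns matching a current frontier run into days of wall-clock time rather than years.

```python
# Back-of-envelope for "$1M of generic matmul hardware in ~15 years vs. a
# current frontier training run". Every constant is an assumption; swap in your own.
CURRENT_FRONTIER_FLOP = 2e25     # assumed total FLOP of a current frontier run
ALGO_EFFICIENCY_GAIN = 1e2       # assumed ~2 OOMs of training-efficiency gains over 15y
FLOP_PER_SEC_PER_USD = 3e10      # assumed sustained FLOP/s per hardware dollar today
HW_PRICE_PERF_GAIN = 1e2         # assumed ~2 OOMs of price-performance improvement over 15y
BUDGET_USD = 1_000_000
UTILIZATION = 0.3                # fraction of peak the training job actually sustains

flop_needed = CURRENT_FRONTIER_FLOP / ALGO_EFFICIENCY_GAIN
cluster_flops = BUDGET_USD * FLOP_PER_SEC_PER_USD * HW_PRICE_PERF_GAIN * UTILIZATION
days = flop_needed / cluster_flops / 86_400
print(f"~{days:.1f} days of wall-clock time to match today's frontier")
```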