Very rough BOTEC, corrections welcome:
A thing to keep in mind:
laptop: 10^13 flops or something?
future tricked out laptop: 10^15 flops or something?
current big AI training cluster: 10^20 flops or something?
future big AI training cluster: 10^23 flops or something?
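To make the gaps concrete, here is a minimal sketch that takes the rough numbers in the list above at face value and counts the orders of magnitude between them. The dictionary keys and the helper name `oom_gap` are made up for illustration, and the figures are the "or something?" guesses from the list, not measurements.

```python
import math

# Rough guesses copied from the list above; each was hedged with "or something?".
FLOPS = {
    "laptop": 1e13,
    "future tricked out laptop": 1e15,
    "current big AI training cluster": 1e20,
    "future big AI training cluster": 1e23,
}


def oom_gap(small: str, big: str) -> int:
    """Orders of magnitude (OoMs) separating two entries of FLOPS."""
    return round(math.log10(FLOPS[big] / FLOPS[small]))


if __name__ == "__main__":
    pairs = [
        ("laptop", "current big AI training cluster"),
        ("future tricked out laptop", "future big AI training cluster"),
        ("laptop", "future big AI training cluster"),
    ]
    for small, big in pairs:
        print(f"{small} -> {big}: {oom_gap(small, big)} OoMs")
```

On these numbers, the future-laptop vs future-cluster pair is the one that comes out to 8 OoMs, which is presumably the gap the "8 OoMs" later in the thread refers to.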
That’s a sizable gap but not all that crazy in the grand scheme of things? Intuitively one would have to have a pretty detailed (and accurate) understanding of things to (calibratedly) predict “yes train on datacenter, no train on laptop”. Does that make any sense?
I mean, I’m not super confident. I think “it’s very unobvious to me that there’s a way …” indicated mild skepticism / request for more reasons to think why it should be possible.
Are you saying that it’s plausible that those 8 OoMs can be bridged by various algorithmic tricks or something?
I wasn’t directly contradicting your statement, I think. I meant to say “If you’re confident you can do it with a datacenter, then you should probably mostly guess that you can do it on a laptop, unless you have some pretty strong specific model here.” IDK whether you’re confident you can do it with a datacenter.
I do think it’s very likely you can do it on a laptop, though it’s a bit hard to express why, and it’s reasonable to treat it as unobvious. I think it’s coming from various intuitions like “the info density of the human brain isn’t crazy high” and “the effective flops needed for much of what human brains do day to day isn’t that high” and “well, you could do a bunch of clever caching and just-in-time compressing/decompressing and reordering of various parallel-ish computations, and thereby fit things into tiny amounts of data”. As another intuition pump, consider the demoscene (https://en.wikipedia.org/wiki/Demoscene), where people pack impressive visual displays into very small amounts of data, though IDK how much compute is involved. Example: https://www.youtube.com/watch?v=R-4wHUw_OdE&list=PLMuQbRD9kQr5hsbu9uyFSBA7kOrowz6Xo