I think this is a super important post. Thanks for publishing it!
One question that occurred to me while reading:
You assume that we will have a massive compute overhang once we have this new architecture. Is there a reason to expect that GPUs would remain useful? Or should we expect that a new architecture that’s sufficiently far away from the DL paradigm would actually need some new type of hardware? I really don’t know the answer to this, so it would be cool if you could shed some light on it. I guess if efficiency gains are sufficiently large with a new architecture, then this becomes somewhat moot.
I don’t think GPUs would be the best of all possible chip designs for the next paradigm, but I expect they’ll work well enough (after some R&D on the software side, which I expect would be done early on, during the “seemingly irrelevant” phase, see §1.8.1.1). It’s not like any given chip can run one and only one algorithm. Remember, GPUs were originally designed for processing graphics :) And people are already running tons of AI algorithms that are not deep neural networks on GPUs today (random example).
I concur with that sentiment. GPUs hit a sweet spot between compute efficiency and algorithmic flexibility. CPUs are more flexible for arbitrary control logic, and custom ASICs can improve compute efficiency for a stable algorithm, but GPUs are great for exploring new algorithms where SIMD-style control flows exist (SIMD=single instruction, multiple data).
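To make the SIMD point concrete, here is a toy sketch (my own made-up example, not from the post) of a non-neural-network algorithm, a k-means assignment step, written as pure array operations in JAX. This is the kind of control flow a GPU handles well:

```python
# Toy illustration: a k-means assignment step (not a neural network)
# written as pure array operations, the kind of SIMD-style control flow
# that maps well onto a GPU via JAX's XLA compiler.
import jax
import jax.numpy as jnp

@jax.jit  # compiled once, then runs on whatever accelerator is available (CPU/GPU/TPU)
def assign_clusters(points, centroids):
    # points: (N, D), centroids: (K, D)
    # Pairwise squared distances, computed for all N*K pairs in parallel.
    diffs = points[:, None, :] - centroids[None, :, :]   # (N, K, D)
    sq_dists = jnp.sum(diffs ** 2, axis=-1)              # (N, K)
    # The same instruction ("take the argmin") is applied to every point.
    return jnp.argmin(sq_dists, axis=-1)                 # (N,)

# Example usage with random data.
points = jax.random.normal(jax.random.PRNGKey(0), (10_000, 16))
centroids = jax.random.normal(jax.random.PRNGKey(1), (8, 16))
labels = assign_clusters(points, centroids)
```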
Or should we expect that a new architecture that’s sufficiently far away from the DL paradigm would actually need some new type of hardware?
My expectation is that it’d be possible to translate any such architecture into a format that would run efficiently on GPUs/TPUs with some additional work, even if its initial definition were, e.g., neurosymbolic.
Though I do think it’s an additional step that the researchers would need to think of and execute, which might delay the doom for years (if it’s too inefficient in its initial representation).
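As a toy illustration of that translation step (my own sketch, not anything from the post, and a deliberately simplified stand-in for whatever a real neurosymbolic architecture would look like): a symbolic forward-chaining step can be re-expressed as boolean matrix algebra, so the work lands on GPU/TPU-friendly array operations rather than pointer-chasing over discrete structures.

```python
# Toy sketch: one step of symbolic forward chaining expressed as
# boolean array operations that run on GPU/TPU via JAX.
import jax
import jax.numpy as jnp

@jax.jit
def forward_chain_step(facts, rule_body, rule_head):
    # facts:     (F,)   boolean vector, facts[i] = fact i is currently known
    # rule_body: (R, F) boolean matrix, rule_body[r, i] = rule r requires fact i
    # rule_head: (R, F) boolean matrix, rule_head[r, j] = rule r derives fact j
    # A rule fires when every fact it requires is already known.
    fires = jnp.all(~rule_body | facts[None, :], axis=1)    # (R,)
    derived = jnp.any(rule_head & fires[:, None], axis=0)   # (F,)
    return facts | derived

# Example: fact 0 is known and the rule "0 -> 1" derives fact 1 in one step.
facts = jnp.array([True, False, False])
rule_body = jnp.array([[True, False, False]])
rule_head = jnp.array([[False, True, False]])
facts = forward_chain_step(facts, rule_body, rule_head)  # -> [True, True, False]
```

Finding an encoding like this is the kind of “additional step” I have in mind above.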