I point this out now and then: the von Neumann (VN) bottleneck.
It's really just a simple consequence of scaling geometry. Compute scales with device surface area (for 2D chips) or volume (for 3D systems like the brain), while bandwidth/interconnect only scales with the boundary, one dimension lower.
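As a rough back-of-the-envelope sketch (my own toy numbers, not any real chip's specs), here's that surface-vs-volume scaling in a few lines of Python:

```python
# Toy illustration: a system of linear size L in d dimensions.
# Compute elements fill the bulk (~L^d), while off-device bandwidth
# crosses the boundary (~L^(d-1)), so the compute-to-bandwidth ratio
# grows linearly with L as you scale up.
for d, label in [(2, "2D chip"), (3, "3D system (brain-like)")]:
    for L in [10, 100, 1000]:
        compute = L ** d          # bulk: "compute elements"
        bandwidth = L ** (d - 1)  # boundary: "wires off the edge/surface"
        print(f"{label}: L={L:<5} compute={compute:<12} "
              f"bandwidth={bandwidth:<9} ratio={compute / bandwidth:.0f}")
```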
A few years back, VCs were fooled by a number of well-meaning startups based on the pitch: “We can just make a big matmul chip like a GPU but with far more on-chip SRAM and thereby avoid the VN bottleneck!” But Nvidia is in fact pretty smart and understands exactly why this approach doesn't actually work (at least not yet with SRAM), and much money was wasted.
I used to be pretty excited about neuromorphic computing around 2010 or so. I still am, but today it still seems to be about a decade away.
Including Cerebras?