The GPU needs numbers to be stored in registers inside the GPU before it can do operations on them. A memory operation (what Jacob calls MEM) is when you load a particular value from memory into a register. An arithmetic operation is when you do an elementary arithmetic operation such as addition or multiplication on two values that have already been loaded into registers. These are done by the arithmetic-logic unit (ALU) of the processor so are called ALU ops.
Because a matrix multiplication of two N×N matrices only involves 2N² distinct floating point numbers as input, and writing the result back into memory is going to cost you another N² memory operations, the total MEM ops cost of a matrix multiplication of two matrices of size N×N is 3N². In contrast, if you’re using the naive matrix multiplication algorithm, computing each entry in the output matrix takes you N multiplications and N additions, so you end up with 2N⋅N² = 2N³ ALU ops needed.
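To make the counting concrete, here's a small sketch (the function name is just for illustration) that tallies the MEM and ALU ops for a naive N×N matmul exactly as described above:

```python
def matmul_op_counts(n):
    """Operation counts for a naive n x n matrix multiplication."""
    mem_ops = 3 * n * n    # load 2*n^2 input values + store n^2 output values
    alu_ops = 2 * n ** 3   # each of the n^2 outputs needs n multiplies + n adds
    return mem_ops, alu_ops

mem, alu = matmul_op_counts(1024)
print(mem, alu, alu / mem)
```

Note that the ALU:MEM ratio of the computation itself is 2N³ / 3N² = 2N/3, so it grows linearly with N: bigger matrices do more arithmetic per value moved.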
The ALU:MEM ratio is important because if your computation is imbalanced relative to what your hardware supports, you’ll end up bottlenecked by one resource and unable to exploit the surplus you have on the other side. For instance, if you’re working with a bizarre GPU that has a 1:1 ALU:MEM ratio and you only use it for matrix multiplications, you’ll have enormous amounts of MEM ops capacity sitting idle, because you don’t have enough ALU capacity to keep the memory system busy.
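A quick way to see the bottleneck effect is to model runtime as whichever side finishes last. This sketch assumes the hypothetical 1:1 GPU above, with made-up throughput numbers (1e12 ops/s on each side) purely for illustration:

```python
def matmul_utilization(n, alu_rate=1e12, mem_rate=1e12):
    """Fraction of ALU and MEM capacity actually used during an n x n matmul
    on a hypothetical device with the given ops/second on each side."""
    mem_ops = 3 * n * n
    alu_ops = 2 * n ** 3
    # The slower side determines the total runtime.
    t = max(alu_ops / alu_rate, mem_ops / mem_rate)
    return alu_ops / (alu_rate * t), mem_ops / (mem_rate * t)

alu_util, mem_util = matmul_utilization(4096)
print(alu_util, mem_util)
```

With a 1:1 hardware ratio and large N, the ALU side runs at full utilization while the memory side is almost entirely idle, which is exactly the imbalance described above.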
This is helpful, thanks a ton Ege!