If you do a matrix multiplication the obvious way, this results in dot products of rows and columns (one for each element of the resulting matrix). So it seems to me that improving matrix to matrix multiplication performance comes from improving the performance of dot products.
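The "obvious way" described above can be sketched as a triple loop, where the innermost loop is exactly one row-by-column dot product per output element (function name is mine, for illustration):

```python
# Naive matrix multiplication: each element C[i][j] is the dot product
# of row i of A with column j of B.
def matmul_naive(A, B):
    n, k = len(A), len(A[0])   # A is n x k
    m = len(B[0])              # B is k x m
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # one 1xN-by-Nx1 dot product per output element
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C
```

This makes the row/column access pattern explicit: column `j` of `B` is re-read for every row `i`, which is the redundant loading discussed below.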
This seems like a decent explanation of Hardware Matrix Multiplication, even if it lacks concrete sources.
As for a tensor, I think these references explain it better than I can at my current level. But the intuition is that it’s a generalization of a matrix to higher dimensions, with additional properties under transformation.
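One way to see the "higher dimensions" part (this only illustrates the shape hierarchy, not the transformation properties):

```python
# Each extra level of nesting adds one axis (one "dimension" of the tensor).
scalar = 5.0                                  # rank 0
vector = [1.0, 2.0, 3.0]                      # rank 1: shape (3,)
matrix = [[1.0, 0.0], [0.0, 1.0]]             # rank 2: shape (2, 2)
tensor = [[[0.0] * 4 for _ in range(3)]       # rank 3: shape (2, 3, 4),
          for _ in range(2)]                  # i.e. 2 stacked 3x4 matrices
```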
To me there is a difference between the hardware for 1xN by Nx1 and MxN by NxM (with N > M > 1). Although any matrix operation is many 1xN by Nx1 dot products, doing them independently would be inefficient.
“If you do a matrix multiplication the obvious way, this results in dot products of rows and columns (one for each element of the resulting matrix). So it seems to me that improving matrix to matrix multiplication performance comes from improving the performance of dot products.”
True, but the improvement comes not from individual dot products but from the collective of very many dot products. Obviously you do not do it the obvious way, as you would have to load the same data over and over again.
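The data-reuse point above is the idea behind blocked (tiled) matrix multiplication: instead of computing each dot product independently, you load a small tile of the operands once and use it for a whole tile of outputs. A minimal sketch (function name and block size are my own, for illustration):

```python
# Blocked matrix multiplication: each loaded element A[i][p] is reused
# across a whole row-strip of the output tile, rather than being
# re-fetched for every individual dot product.
def matmul_blocked(A, B, bs=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, bs):
        for pp in range(0, k, bs):
            for jj in range(0, m, bs):
                # work on one bs x bs tile of C at a time
                for i in range(ii, min(ii + bs, n)):
                    for p in range(pp, min(pp + bs, k)):
                        a = A[i][p]  # loaded once, reused bs times below
                        for j in range(jj, min(jj + bs, m)):
                            C[i][j] += a * B[p][j]
    return C
```

In Python the loop order makes no speed difference, but on real hardware choosing `bs` so the tiles fit in cache (or in a systolic array's registers) is what turns many independent dot products into an efficient collective computation.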