According to this paper, https://suif.stanford.edu/papers/lam-asplos91.pdf, the number of data elements read is 2N^3 + N^2 for a square matrix multiplication (XY = Z) in th