Can blocked matrix multiplication be optimally reduced to 2*N^3/B^3 + N^2/B^2 cache reads?

不知归路 2021-02-09 08:36

According to this paper (https://suif.stanford.edu/papers/lam-asplos91.pdf), the number of data reads is 2N^3+N^2 for a square matrix multiplication (XY=Z) in th
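
To make the counting concrete, here is a rough sketch (not taken from the paper) that tallies reads for an N x N product. It assumes the loop order i, k, j with X[i][k] held in a register across the inner loop, which is one way to arrive at 2N^3+N^2. The second function is my own guess at what the formula in the title counts: loads of whole B x B tiles, assuming B divides N and a tile of X stays resident across the innermost tile loop. The function names are made up for illustration.

```python
def count_reads_unblocked(n):
    """Element reads for the i, k, j loop nest (assumed loop order)."""
    reads = 0
    for i in range(n):
        for k in range(n):
            reads += 1          # X[i][k], read once per (i, k) and kept in a register
            for j in range(n):
                reads += 2      # Y[k][j] and Z[i][j], read on every inner iteration
    return reads


def count_tile_loads(n, b):
    """Tile loads for the same loop nest applied to B x B tiles (assumes b divides n)."""
    assert n % b == 0, "this sketch assumes the block size divides n"
    t = n // b                  # number of tiles per dimension
    loads = 0
    for ii in range(t):
        for kk in range(t):
            loads += 1          # tile of X, loaded once per (ii, kk)
            for jj in range(t):
                loads += 2      # tiles of Y and Z, loaded on every inner iteration
    return loads


n, b = 8, 4
assert count_reads_unblocked(n) == 2 * n**3 + n**2
assert count_tile_loads(n, b) == 2 * (n // b)**3 + (n // b)**2
```

Note that the second count is in tiles, not elements. Each tile holds B^2 elements, so under the same assumptions the element count would be 2N^3/B + N^2 rather than 2N^3/B^3 + N^2/B^2. Whether the question means tiles or elements is not clear from the text above.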
