OpenMP parallelizing matrix multiplication by a triple for loop (performance issue)


情歌与酒 2021-02-06 08:45

I'm writing a matrix multiplication program with OpenMP that, for cache friendliness, computes A x B(transpose) rows x rows instead of the classic A x

2 Answers

时光取名叫无心 (OP)

2021-02-06 09:15
Try writing to the result matrix less often. Frequent writes to it induce cache-line sharing between cores and prevent the operation from running in parallel. Accumulating into a local variable instead allows most of the writes to take place in each core's L1 cache.

Also, the restrict qualifier may help. Without it, the compiler must assume that writes to C could modify A and B, which limits optimization.

Try:
```
for (i=0; i
```
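A minimal sketch of the two suggestions above (a local accumulator plus restrict), assuming square n x n matrices stored row-major as flat arrays and B already transposed. The function name `matmul_bt` and its parameters are hypothetical and not taken from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes C = A * B, where bt holds B transposed.
   All matrices are n x n, row-major. The restrict qualifiers promise the
   compiler that c does not overlap a or bt. */
void matmul_bt(int n,
               const double *restrict a,
               const double *restrict bt,
               double *restrict c)
{
    assert(n >= 0 && a && bt && c);

    /* Parallelize the outer loop so each thread owns whole rows of c. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            /* Accumulate locally; c is written once per element. */
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * bt[(size_t)j * n + k];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

With GCC or Clang this is typically built with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result.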
Also, I think Elalfer is right about needing reduction if you parallelize the innermost loop.
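To illustrate the reduction point: if the innermost loop is the one being parallelized, all threads would otherwise update the same accumulator at once. A minimal sketch, assuming the inner loop is a plain dot product of one row of A with one row of the transposed B; the name `dot_reduction` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: dot product of x and y, both of length n, with the
   loop itself parallelized. reduction(+:sum) gives each thread a private
   copy of sum and adds the copies together when the loop ends. */
double dot_reduction(int n, const double *x, const double *y)
{
    assert(n >= 0 && x && y);

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int k = 0; k < n; k++)
        sum += x[k] * y[k];
    return sum;
}
```

Parallelizing only this inner loop starts a parallel region for every output element, so parallelizing the outer loop is usually the cheaper option when it is available.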