OpenMP parallelizing matrix multiplication by a triple for loop (performance issue)

情歌与酒 2021-02-06 08:45

I'm writing a program for matrix multiplication with OpenMP that, for cache convenience, implements the multiplication A x B(transpose), rows x rows, instead of the classic A x

2 Answers
  •  时光取名叫无心
    2021-02-06 09:15

    Try writing to the result less often. Updating C on every iteration of the inner loop induces cache-line sharing between cores and prevents the operation from running in parallel. Accumulating into a local variable instead will allow most of the writes to take place in each core's L1 cache.

    Also, using restrict may help. Without it, the compiler can't assume that writes to C leave A and B unchanged, so it has to reload their values after each store.

    Try:

    for (i=0; i
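A minimal sketch of the two suggestions above (a local accumulator and restrict), not the original poster's code. It assumes square n x n matrices stored row-major in flat arrays, with B already transposed into Bt; the function name matmul_bt and its signature are hypothetical.

```c
/* Hypothetical helper: computes C = A * B, where Bt holds B transposed.
   All matrices are n x n, row-major. restrict promises the compiler that
   A, Bt and C do not overlap. */
static void matmul_bt(int n, const double *restrict A,
                      const double *restrict Bt, double *restrict C)
{
    /* Parallelize the outer loop: each thread owns whole rows of C. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;                 /* thread-private accumulator */
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];   /* row of A times row of Bt */
            C[i * n + j] = sum;               /* a single store per element of C */
        }
    }
}
```

Built with an OpenMP flag (for example -fopenmp with GCC), the outer loop is split across threads; without it the pragma is ignored and the function runs serially with the same result.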

    Also, I think Elalfer is right about needing reduction if you parallelize the innermost loop.
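For the case where the innermost loop is the one being parallelized, a reduction clause might look like the sketch below. It assumes the inner loop is a dot product of two rows of length n; the function name dot_reduce is hypothetical.

```c
/* Hypothetical helper: dot product of two rows of length n.
   reduction(+:sum) gives each thread its own copy of sum and adds the
   copies together at the end, so threads never race on a shared variable. */
static double dot_reduce(int n, const double *restrict a,
                         const double *restrict b)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int k = 0; k < n; k++)
        sum += a[k] * b[k];
    return sum;
}
```

Without the reduction clause, every thread would update the same sum concurrently, which is a data race. Starting a parallel region for each element of C also adds overhead, so parallelizing the outer loop is usually the cheaper option.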
