CUDA - why is warp based parallel reduction slower?

温柔的废话 2021-02-14 13:49

I had an idea for a warp-based parallel reduction, since all threads of a warp are in sync by definition.

So the idea was that the input data can be reduced by fact

2 answers
  •  灰色年华
    2021-02-14 14:36

    You should also check the examples in the SDK. I remember one very nice example that implements several reduction strategies. At least one of them also uses a warp-based reduction.

    (I can't look up the name right now, because I only have it installed on my other machine.)
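
For readers without the SDK at hand, here is a minimal sketch of what a warp-based sum reduction can look like. It is not the SDK sample mentioned above, and it does not try to explain why the asker's version was slower. The names `warp_reduce_sum`, `warp_reduce_kernel`, and `reduce_on_device` are hypothetical, and the sketch assumes CUDA 9 or newer, a block size that is a multiple of 32, and `float` input. It uses the explicit `__shfl_down_sync` intrinsic rather than relying on the threads of a warp being implicitly in sync.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sum the values held by the 32 lanes of a warp; lane 0 ends up with the total.
// The full mask assumes every lane of the warp takes part in the call.
__inline__ __device__ float warp_reduce_sum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    return v;
}

// Each thread builds a partial sum with a grid-stride loop, each warp reduces
// its partial sums with shuffles, and lane 0 of every warp adds to *out.
__global__ void warp_reduce_kernel(const float* in, float* out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        sum += in[i];
    }
    sum = warp_reduce_sum(sum);
    if ((threadIdx.x & (warpSize - 1)) == 0) {
        atomicAdd(out, sum);
    }
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel, and returns the sum. Error checking is omitted for brevity.
float reduce_on_device(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    const int threads = 256;  // multiple of 32, so every warp is full
    const int blocks = (n + threads - 1) / threads;
    warp_reduce_kernel<<<blocks, threads>>>(d_in, d_out, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `reduce_on_device(std::vector<float>(1024, 1.0f))` should return the sum of the 1024 ones. The single `atomicAdd` per warp keeps the sketch short. A tuned version would more likely combine the per-warp results in shared memory first, which is one of the trade-offs worth measuring when comparing against a block-based reduction.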
