Reduce multiple blocks of equal length that are arranged in a big vector Using CUDA

点点圈 提交于 2019-12-11 04:25:34

问题


I am looking for a fast way to reduce multiple blocks of equal length that are arranged as a big vector. I have N subarrays(contiguous elements) that are arranged in one big array. each sub array has a fixed size : k. so the size of the whole array is : N*K

What I'm doing is to call the kernel N times. in each time it computes the reduction of the subarray as follow: I will iterate over all the subarrays contained in the big vector :

    for(i=0;i<N;i++){
       thrust::device_vector< float > Vec(subarray, subarray+k);
       float sum = thrust::reduce(Vec.begin(), Vec.end(), (float)0, thrust::plus<float>());
       printf("sum %f\n",sum);
 }

for pure CUDA i will do it like this (pseudo code):

 for(i=0;i<N;i++){
        reduction_kernel(subarray)

         }

do you have another solution to perform the reduction of the contiguous subarrays in once? using pure CUDA or Thrust


回答1:


What you're asking for is a segmented reduction. This can be done in thrust using thrust::reduce_by_key In addition to your data vector of length N*K, we will need a "key" vector that defines each segment -- the segments don't have to be the same size, as long as the key vector differentiates segments like so:

data:  1 3 2 3 1 4 2 3 2 1 4 2 ...
keys:  0 0 0 1 1 1 0 0 0 3 3 3 ...
seg:   0 0 0 1 1 1 2 2 2 3 3 3 ...

The keys delineate a new segment any time the key sequence changes (note that I have two separate segments in the above example that are delineated using the same key - thrust doesn't group such segments together but treats them separately because there are 1 or more intervening key values that are different). You don't actually have this data, but for speed and efficiency, since your segments are of equal length, we can produce the necessary key sequence "on the fly" using a combination of thrust fancy iterators.

The fancy iterators will combine to:

  1. produce a linear sequence 0 1 2 3 ... (via counting_iterator)
  2. divide each member of the linear sequence by K, the segment length (via transform_iterator). I'm using thrust placeholder methodology here so I don't have to write a functor for the transform iterator.

This will produce the necessary segment-key sequence.

Here is a worked example:

$ cat t1282.cu
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/copy.h>
#include <thrust/execution_policy.h>
#include <iostream>

const int N = 1000;  // sequences
const int K = 100;   // length of sequence
typedef int mytype;

using namespace thrust::placeholders;

int main(){

  thrust::device_vector<mytype> data(N*K, 1);
  thrust::device_vector<mytype> sums(N);
  thrust::reduce_by_key(thrust::device, thrust::make_transform_iterator(thrust::counting_iterator<int>(0), _1/K), thrust::make_transform_iterator(thrust::counting_iterator<int>(N*K), _1/K), data.begin(), thrust::discard_iterator<int>(), sums.begin());
  // just display the first 10 results
  thrust::copy_n(sums.begin(), 10, std::ostream_iterator<mytype>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -arch=sm_35 -o t1282 t1282.cu
$ ./t1282
100,100,100,100,100,100,100,100,100,100,
$


来源:https://stackoverflow.com/questions/42260493/reduce-multiple-blocks-of-equal-length-that-are-arranged-in-a-big-vector-using-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!