Parallelize a function that counts all vectors whose elements sum to a given value and are each no larger than k

旧巷少年郎 2021-01-28 18:16

I want to parallelize a function in CUDA C that counts all vectors whose elements sum to a given value and are each no larger than k. For example if the number of vector ele

3 answers
  •  离开以前
    2021-01-28 18:30

    As Robert said in the comments, if you want to generate all (k+1)^n candidate vectors on the GPU and test each one, you could use a kernel like this, where each thread decodes its index into one vector:

    __device__ int count;  // global counter; must be set to zero before the kernel launch
    __global__ void perm_generator(int k, int n, int sum) {
       // Each thread index is read as an n-digit number in base (k+1);
       // every digit is one vector element in the range 0..k.
       int tid = blockIdx.x*blockDim.x+threadIdx.x;
       int id = tid;
       int mysum = 0;
       for ( int i = n; i > 1; i-- ) { // first n-1 vector elements
         mysum += (id % (k+1));
         id /= (k+1);
       }
       mysum += id; // last element (in 0..k only if tid < (k+1)^n)
       if ( mysum == sum ) atomicAdd( &count, 1 ); // count this vector
    }
    

    The kernel should be launched with exactly (k+1)^n threads. If you launch more threads than that (for example, because the block size is rounded up to a multiple of 32 as a rule of thumb), check the value of tid at the start of the kernel and return early for any thread with tid >= (k+1)^n. Also, cudaThreadSynchronize() is deprecated; use cudaDeviceSynchronize() instead.
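The launch and the tid check described above can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's code: the names perm_generator_guarded, d_count and count_vectors, the extra total parameter, the block size of 256 and the 2^30 cap on the number of candidates are all hypothetical choices made for this example.

```cuda
#include <cuda_runtime.h>

// Device-side counter (hypothetical name); reset from the host before each launch.
__device__ unsigned long long d_count;

// One thread per candidate vector. `total` is (k+1)^n and is passed in so that
// surplus threads in the last block can return early.
__global__ void perm_generator_guarded(int k, int n, int sum,
                                       unsigned long long total) {
    unsigned long long tid =
        (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= total) return;               // guard against extra threads

    unsigned long long base = (unsigned long long)k + 1;
    unsigned long long id = tid;
    int mysum = 0;
    for (int i = 0; i < n; ++i) {           // decode n digits in base (k+1)
        mysum += (int)(id % base);
        id /= base;
    }
    if (mysum == sum) atomicAdd(&d_count, 1ULL);
}

// Hypothetical host wrapper: returns the count, or -1 on invalid input,
// on too many candidates for this sketch, or on a CUDA error.
long long count_vectors(int k, int n, int sum) {
    if (k < 0 || n < 1) return -1;

    unsigned long long total = 1;           // (k+1)^n, capped at 2^30 here
    for (int i = 0; i < n; ++i) {
        total *= (unsigned long long)k + 1;
        if (total > (1ULL << 30)) return -1;
    }

    const int threads = 256;                // assumed block size, multiple of 32
    unsigned int blocks = (unsigned int)((total + threads - 1) / threads);

    unsigned long long zero = 0, result = 0;
    if (cudaMemcpyToSymbol(d_count, &zero, sizeof(zero)) != cudaSuccess) return -1;

    perm_generator_guarded<<<blocks, threads>>>(k, n, sum, total);
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;  // not cudaThreadSynchronize()

    if (cudaMemcpyFromSymbol(&result, d_count, sizeof(result)) != cudaSuccess) return -1;
    return (long long)result;
}
```

For instance, count_vectors(2, 3, 3) would launch one block of 256 threads for the 27 candidates, and the guard makes the 229 surplus threads exit without touching the counter. Passing total from the host is one possible design; computing (k+1)^n inside the kernel would also work but repeats the same arithmetic in every thread.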
