Parallelize a function that counts all vectors whose elements sum to a given value and are no bigger than k

旧巷少年郎 2021-01-28 18:16

I want to parallelize a function in CUDA C that counts all vectors whose elements add up to a given sum and are each no bigger than k. For example if the number of vector ele

3 Answers

离开以前 (original poster)

2021-01-28 18:30
As Robert said in the comments, if you want to generate all (k+1)^n candidate vectors on the GPU (each of the n elements takes a value from 0 to k) and test every one of them, you can use a kernel along these lines:
```
__device__ int count;  // result counter in global memory; must be reset to zero before each kernel launch
__global__ void perm_generator(int k, int n, int sum) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per candidate vector
    int id = tid;       // decoded below as an n-digit number in base (k+1)
    int mysum = 0;
    for (int i = n; i > 1; i--) {  // first n-1 vector elements
        mysum += id % (k + 1);     // current element, a value in 0..k
        id /= (k + 1);
    }
    mysum += id;  // last element
    if (mysum == sum) atomicAdd(&count, 1);  // count vectors whose elements add up to sum
}
```
The kernel should be launched with exactly (k+1)^n threads. If you launch more threads than that (for example because, as a rule of thumb, the block size should be a multiple of 32), check the value of tid at the start of the kernel so that the surplus threads return without counting anything. Also, cudaThreadSynchronize() is deprecated; use cudaDeviceSynchronize() instead.
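The launch details above can be sketched as follows. This is only an illustration under some assumptions, not the code from the answer: the names `d_count`, `count_vectors_kernel` and `count_vectors` are hypothetical, the block size of 256 is an arbitrary multiple of 32, and (k+1)^n is assumed to be small enough to fit in a 64-bit integer and in a one-dimensional grid. The kernel receives the number of candidates as an extra `total` argument so that surplus threads in the last block can exit early, and it decodes all n digits in one loop, which gives the same result as the original for every valid thread index.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Result counter in device memory; reset from the host before every launch.
__device__ unsigned long long d_count;

// One thread per candidate vector: thread tid reads its index as an
// n-digit number in base (k+1) and adds up the digits.
__global__ void count_vectors_kernel(int k, int n, int sum, unsigned long long total) {
    unsigned long long tid = (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= total) return;  // surplus threads in the last block do nothing
    unsigned long long id = tid;
    int mysum = 0;
    for (int i = 0; i < n; ++i) {
        mysum += (int)(id % (k + 1));  // current element, a value in 0..k
        id /= (k + 1);
    }
    if (mysum == sum) atomicAdd(&d_count, 1ULL);
}

// Hypothetical host wrapper: returns the number of matching vectors,
// or -1 on a CUDA error or if the grid would be too large.
// Assumption: (k+1)^n does not overflow unsigned long long.
long long count_vectors(int k, int n, int sum) {
    unsigned long long total = 1;
    for (int i = 0; i < n; ++i) total *= (unsigned long long)(k + 1);  // (k+1)^n

    const int threads = 256;  // multiple of the warp size (32)
    unsigned long long blocks = (total + threads - 1) / threads;  // round up
    if (blocks > 2147483647ULL) return -1;  // beyond the 1D grid limit

    unsigned long long zero = 0, result = 0;
    if (cudaMemcpyToSymbol(d_count, &zero, sizeof(zero)) != cudaSuccess) return -1;
    count_vectors_kernel<<<(unsigned int)blocks, threads>>>(k, n, sum, total);
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;  // not cudaThreadSynchronize()
    if (cudaMemcpyFromSymbol(&result, d_count, sizeof(result)) != cudaSuccess) return -1;
    return (long long)result;
}
```

For instance, `count_vectors(2, 3, 3)` launches one block of 256 threads, of which only the first 27 pass the `tid` check, and should return 7: the six orderings of (0, 1, 2) plus (1, 1, 1).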