CUDA kernel - nested for loop

[愿得一人] 2020-12-08 09:01

Hello, I'm trying to write a CUDA kernel that performs the following piece of code.

for (n = 0; n < (total-1); n++)
{
  a = values[n];

  for ( i = n+1; i &         


        
3 Answers
  • 2020-12-08 09:41

    I may well be wrong, but the n < (total-1) check in

    for (int n = idx; n < (total-1); n += blockDim.x*gridDim.x)
    

    seems different from the original version.
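For comparison, here is a minimal sketch of a grid-stride outer loop that keeps the serial bound n < total - 1. It rests on assumptions, because the question's code is cut off: the inner loop is taken to run up to i < total, and the results go into a hypothetical total*total buffer called pairs so that no two threads write the same element. The names calcStride and runCalcStride are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Grid-stride version of the outer loop: each thread handles
// n = idx, idx + stride, ... and stops at n < total - 1, like the serial loop.
// Assumption: the inner loop bound is i < total (truncated in the question).
// Assumption: `pairs` is a total*total buffer, one slot per (n, i) pair.
__global__ void calcStride(const float *values, float *pairs, int total)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    for (int n = idx; n < total - 1; n += stride) {
        float a = values[n];
        for (int i = n + 1; i < total; i++) {
            float b = values[i] - a;
            float c = b * b;
            if (c < 10.0f)
                pairs[n * total + i] = c;
        }
    }
}

// Hypothetical host helper: runs the kernel and returns the pairs buffer,
// with -1 left in every slot the kernel did not write.
std::vector<float> runCalcStride(const std::vector<float> &h_values,
                                 int blocks, int threads)
{
    assert(blocks > 0 && threads > 0 && !h_values.empty());
    int total = (int)h_values.size();
    std::vector<float> h_pairs((size_t)total * total, -1.0f);
    size_t valueBytes = total * sizeof(float);
    size_t pairBytes = h_pairs.size() * sizeof(float);

    float *d_values = nullptr, *d_pairs = nullptr;
    cudaMalloc(&d_values, valueBytes);
    cudaMalloc(&d_pairs, pairBytes);
    cudaMemcpy(d_values, h_values.data(), valueBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_pairs, h_pairs.data(), pairBytes, cudaMemcpyHostToDevice);

    calcStride<<<blocks, threads>>>(d_values, d_pairs, total);

    // This blocking copy also waits for the kernel to finish
    cudaMemcpy(h_pairs.data(), d_pairs, pairBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    cudaFree(d_pairs);
    return h_pairs;
}
```

Launching with fewer threads than total - 1, for example runCalcStride(values, 1, 2), makes every thread take several trips through the stride loop, which is the case the bound check has to get right.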

  • 2020-12-08 09:47

    Treat this as a 2D problem and launch your kernel with 2D thread blocks. The total number of threads in each of the x and y dimensions should be at least total. The kernel code should look like this:

    __global__ void calc(float *values, float *newvalues, int total)
    {
        float a, b, c;

        // n indexes the outer loop (y dimension), i the inner loop (x dimension)
        int n = blockIdx.y * blockDim.y + threadIdx.y;
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        // Skip threads that fall outside the data
        if (n >= total || i >= total)
            return;

        a = values[n];
        b = values[i] - a;
        c = b * b;
        if (c < 10)
            newvalues[i] = c;

        // I don't know your problem statement, but I think it should be: newvalues[n*total+i] = c;
    }
    

    Update:

    This is how you should call the kernel

    dim3 block(16,16);
    dim3 grid (  (total+15)/16,  (total+15)/16  );
    calc<<<grid, block>>>(val, newval, T);
    

    Also make sure you add this check to the kernel (see the updated kernel):

    if (n>=total || i>=total)
    return;
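Putting the pieces of this answer together, a self-contained sketch might look like the following. It makes two assumptions that the answer only hints at: threads with i <= n are skipped, because the serial inner loop starts at n+1, and each pair gets its own slot newvalues[n*total+i] (called pairs here), as the comment in the kernel suggests. The names calcPairs and runCalcPairs are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// One thread per (n, i) pair. Only the upper triangle i > n does work,
// mirroring a serial inner loop that starts at i = n + 1.
// Assumption: `pairs` is a total*total buffer, one slot per pair.
__global__ void calcPairs(const float *values, float *pairs, int total)
{
    int n = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Skip threads outside the data and pairs the serial loop never visits
    if (n >= total || i >= total || i <= n)
        return;

    float b = values[i] - values[n];
    float c = b * b;
    if (c < 10.0f)
        pairs[n * total + i] = c;
}

// Hypothetical host helper: returns the pairs buffer, with -1 left in
// every slot the kernel did not write.
std::vector<float> runCalcPairs(const std::vector<float> &h_values)
{
    assert(!h_values.empty());
    int total = (int)h_values.size();
    std::vector<float> h_pairs((size_t)total * total, -1.0f);
    size_t valueBytes = total * sizeof(float);
    size_t pairBytes = h_pairs.size() * sizeof(float);

    float *d_values = nullptr, *d_pairs = nullptr;
    cudaMalloc(&d_values, valueBytes);
    cudaMalloc(&d_pairs, pairBytes);
    cudaMemcpy(d_values, h_values.data(), valueBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_pairs, h_pairs.data(), pairBytes, cudaMemcpyHostToDevice);

    // Same launch shape as above: 16x16 blocks, grid rounded up to cover total
    dim3 block(16, 16);
    dim3 grid((total + 15) / 16, (total + 15) / 16);
    calcPairs<<<grid, block>>>(d_values, d_pairs, total);

    // This blocking copy also waits for the kernel to finish
    cudaMemcpy(h_pairs.data(), d_pairs, pairBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    cudaFree(d_pairs);
    return h_pairs;
}
```

The trade-off is memory: the output grows with total squared, and roughly half of the launched threads exit immediately because they fall in the lower triangle.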
    
  • 2020-12-08 09:51

    Why don't you just remove the outer loop and launch the kernel with as many threads as that loop has iterations? It's a bit odd to have a loop that depends on your block index; normally you try to avoid such loops. Secondly, it seems to me that newvalues[i] can be overwritten by different threads.
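One possible way around the overwrite problem, sketched here under assumptions, is to give each output element exactly one writer: thread i walks n = 0 .. i-1 in order and stores only the last value that passed the c < 10 test. If the serial inner loop runs from n+1 up to total (its bound is cut off in the question), this should give the same final newvalues as the serial code. The names calcPerOutput and runCalcPerOutput are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// One thread per output element i, so newvalues[i] has a single writer.
// The thread visits n = 0 .. i-1 in increasing order, so the value it keeps
// is the one a serial nested loop would have written last.
// Assumption: the serial inner loop is i = n+1 .. total-1.
__global__ void calcPerOutput(const float *values, float *newvalues, int total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= total)
        return;

    float vi = values[i];
    float last = 0.0f;
    bool hit = false;

    for (int n = 0; n < i; n++) {
        float b = vi - values[n];
        float c = b * b;
        if (c < 10.0f) {
            last = c;
            hit = true;
        }
    }

    // Leave newvalues[i] untouched when no pair passed the test
    if (hit)
        newvalues[i] = last;
}

// Hypothetical host helper: newvalues starts out filled with `fill`.
std::vector<float> runCalcPerOutput(const std::vector<float> &h_values, float fill)
{
    assert(!h_values.empty());
    int total = (int)h_values.size();
    std::vector<float> h_new(total, fill);
    size_t bytes = total * sizeof(float);

    float *d_values = nullptr, *d_new = nullptr;
    cudaMalloc(&d_values, bytes);
    cudaMalloc(&d_new, bytes);
    cudaMemcpy(d_values, h_values.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_new, h_new.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (total + threads - 1) / threads;
    calcPerOutput<<<blocks, threads>>>(d_values, d_new, total);

    // This blocking copy also waits for the kernel to finish
    cudaMemcpy(h_new.data(), d_new, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    cudaFree(d_new);
    return h_new;
}
```

The work per thread is uneven (thread i does i iterations), so this is about correctness rather than speed. Whether it fits depends on what newvalues is actually meant to hold.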
