Is CUDA warp scheduling deterministic?

后端 未结 1 1395
执笔经年
执笔经年 2021-02-10 04:53

I am wondering if the warp scheduling order of a CUDA application is deterministic.

Specifically I am wondering if the ordering of warp execution will stay the same wit

相关标签:
1条回答
  • 2021-02-10 05:23

    The precise behavior of CUDA warp scheduling is not defined. Therefore you cannot depend on it being deterministic. In particular, if multiple warps are ready to be executed in a given issue slot, there is no description of which warp will be selected by the warp scheduler(s).

    There is no external method to precisely control the order of warp execution.

    It's certainly possible to build code that determines warp ID, and forces warps to execute in a particular order. Something like this:

    #include <stdio.h>
    
    #define N_WARPS 16
    #define nTPB (32*N_WARPS)
    
    __device__ volatile int my_next = 0;
    __device__ int warp_order[N_WARPS];
    
    __global__ void my_kernel(){
    
      __shared__ volatile int warp_num;
      unsigned my_warpid = (threadIdx.x & 0x0FE0U)>>5;
      if (!threadIdx.x) warp_num = 0;
      __syncthreads();  // don't use syncthreads() after this point
      while (warp_num != my_warpid);
      // warp specific code here
      if ((threadIdx.x & 0x01F) == 0){
        warp_order[my_next++] = my_warpid;
        __threadfence();
        warp_num++; // release next warp
        } // could use syncthreads() after this point, if more code follows
    }
    
    
    int main(){
    
      int h_warp_order[N_WARPS];
      for (int i = 0; i < N_WARPS; i++) h_warp_order[i] = -1;
      cudaMemcpyToSymbol(warp_order, h_warp_order, N_WARPS*sizeof(int));
      my_kernel<<<1,nTPB>>>();
      cudaDeviceSynchronize();
      cudaMemcpyFromSymbol(h_warp_order, warp_order, N_WARPS*sizeof(int));
      for (int i = 0; i < N_WARPS; i++) printf("index: %d, warp_id: %d\n", i, h_warp_order[i]);
      return 0;
    }
    

    allowing only one warp to execute at a time will be very inefficient, of course.

    In general, the best parallelizable algorithms have little or no order dependence.

    0 讨论(0)
提交回复
热议问题