Should CUDA Constant Memory be accessed warp-uniformly?

情深已故 2021-01-27 11:45

My CUDA application has constant memory of less than 8KB. Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

1 answer
  •  慢半拍i (original poster)
    2021-01-27 11:53

    Since it will all be cached, do I need to worry about every thread accessing the same address for optimization?

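    Yes. The constant cache itself can only serve up one 32-bit word per cycle, so when the threads of a warp read different addresses, the request is split into one access per distinct address and those accesses are serialized.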
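As a rough illustration of that serialization, the sketch below counts how many distinct constant-memory indices one warp would touch for a given indexing expression. It is host-only code (it compiles with nvcc or any C++11 compiler) and is only a simplified model, not a measurement: the helper `distinctAddressesInWarp`, the constant `kWarpSize`, and the three index lambdas are hypothetical names introduced for this example, and a 1D thread block is assumed.

```cuda
#include <cstdio>
#include <set>

constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Hypothetical helper: number of distinct indices read by the threads of one
// warp, used here as a simple model of how many passes the load would need.
template <typename IndexFn>
int distinctAddressesInWarp(int warp, IndexFn indexOf) {
    std::set<int> seen;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int threadIdxX = warp * kWarpSize + lane;  // 1D block assumed
        seen.insert(indexOf(threadIdxX));
    }
    return static_cast<int>(seen.size());
}

int main() {
    auto fixedIndex = [](int) { return 12; };      // data[12]
    auto perThread  = [](int t) { return t; };     // data[threadIdx.x]
    auto perWarp    = [](int t) { return t / 32; };  // data[threadIdx.x / 32]

    printf("data[12]:     %d\n", distinctAddressesInWarp(0, fixedIndex));  // 1
    printf("data[idx]:    %d\n", distinctAddressesInWarp(0, perThread));   // 32
    printf("data[idx/32]: %d\n", distinctAddressesInWarp(0, perWarp));     // 1
    return 0;
}
```

A count of 1 corresponds to the broadcast case, while a count of 32 is the worst case in which every thread of the warp asks for a different word.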

    If yes, how do I assure all threads are accessing the same address at the same time?

    Ensure that the index or address you use to reference an element in constant memory does not depend on any of the built-in thread variables, e.g. threadIdx.x, threadIdx.y, or threadIdx.z. The actual requirement is less stringent than this: it is enough that the index evaluates to the same value for every thread in a given warp. Here are a few examples:

    __constant__ int data[1024];
    ...
    // assume 1D threadblock
    int idx = threadIdx.x;
    int bidx = blockIdx.x;
    int a = data[idx];      // bad - every thread accesses a different element
    int b = data[12];       // ok  - every thread accesses the same element
    int c = data[b];        // ok  - b is a constant w.r.t. threads
    int d = data[b + idx];  // bad - idx differs for every thread in the warp
    int e = data[b + bidx]; // ok  - bidx is the same for every thread in the block
    int f = data[idx/32];   // ok  - the same element is being accessed per warp
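For anyone who wants to try these patterns, here is a minimal, self-contained sketch that wraps one warp-uniform read and one per-thread read in kernels and checks the results on the host. The kernel names `uniformRead` and `perThreadRead`, the helper `runAndCheck`, the launch configuration, and the fill values are all assumptions made for this example. Both kernels return correct data; the difference is only in how many passes the constant cache needs per warp.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__constant__ int data[1024];

// Warp-uniform: with a 1D block, threadIdx.x / 32 is the same for all
// 32 threads of a warp, so each warp reads a single word.
__global__ void uniformRead(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = data[threadIdx.x / 32];
}

// Not warp-uniform: every thread of a warp reads a different word,
// so the constant-memory accesses of the warp are serialized.
__global__ void perThreadRead(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = data[threadIdx.x];
}

// Returns the number of mismatching results, or -1 on a CUDA error.
int runAndCheck() {
    const int threads = 256, blocks = 4, n = threads * blocks;

    std::vector<int> host(1024);
    for (int i = 0; i < 1024; ++i) host[i] = 2 * i;  // arbitrary test values
    if (cudaMemcpyToSymbol(data, host.data(), 1024 * sizeof(int)) != cudaSuccess)
        return -1;

    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return -1;

    std::vector<int> result(n);
    int mismatches = 0;

    uniformRead<<<blocks, threads>>>(dOut);
    // cudaMemcpy blocks until the kernel has finished and reports its errors.
    if (cudaMemcpy(result.data(), dOut, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(dOut);
        return -1;
    }
    for (int t = 0; t < n; ++t)
        mismatches += (result[t] != host[(t % threads) / 32]);

    perThreadRead<<<blocks, threads>>>(dOut);
    if (cudaMemcpy(result.data(), dOut, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(dOut);
        return -1;
    }
    for (int t = 0; t < n; ++t)
        mismatches += (result[t] != host[t % threads]);

    cudaFree(dOut);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", runAndCheck());
    return 0;
}
```

To see the performance difference rather than just correctness, the two kernels could be timed or profiled separately; a single load per thread as used here is probably too little work to show a clear gap.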
    
