Is there a way of setting a default value for a shared memory array?

悲&欢浪女 2020-12-06 02:06

Consider the following code:

__global__ void kernel(int *something) {
    extern __shared__ int shared_array[];

    // Some operations on shared_array here.
}

3 Answers
  • 2020-12-06 02:15

    No. Shared memory is uninitialised. You have to somehow initialise it yourself, one way or another...

    From CUDA C Programming Guide 3.2, Section B.2.4.2, paragraph 2:

    __shared__ variables cannot have an initialization as part of their declaration.

    This also rules out nontrivial default constructors for shared variables.

  • 2020-12-06 02:17

    You can efficiently initialize shared arrays in parallel like this:

    // if SHARED_SIZE == blockDim.x, eliminate this loop
    for (int i = threadIdx.x; i < SHARED_SIZE; i += blockDim.x) 
        shared_array[i] = INITIAL_VALUE;
    __syncthreads();
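
    For context, here is a minimal sketch of how that loop might sit inside a complete program. The kernel name fill_shared, the host helper run_fill_check, and the choice to pass the size and initial value as kernel arguments (instead of the SHARED_SIZE and INITIAL_VALUE constants above) are assumptions made for illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: fills a dynamically sized shared array with a default
// value, then copies it to global memory so the host can inspect it.
__global__ void fill_shared(int *out, int shared_size, int initial_value) {
    extern __shared__ int shared_array[];

    // Thread t initializes elements t, t + blockDim.x, t + 2 * blockDim.x, ...
    for (int i = threadIdx.x; i < shared_size; i += blockDim.x)
        shared_array[i] = initial_value;
    __syncthreads();  // every element is written before any thread reads it

    for (int i = threadIdx.x; i < shared_size; i += blockDim.x)
        out[blockIdx.x * shared_size + i] = shared_array[i];
}

// Hypothetical host helper: true if every copied element equals initial_value.
bool run_fill_check(int shared_size, int initial_value) {
    const int blocks = 2, threads = 64;
    const size_t count = static_cast<size_t>(blocks) * shared_size;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, count * sizeof(int)) != cudaSuccess) return false;

    // Third launch parameter: bytes of dynamic shared memory per block.
    fill_shared<<<blocks, threads, shared_size * sizeof(int)>>>(
        d_out, shared_size, initial_value);

    std::vector<int> host(count);
    cudaError_t err = cudaMemcpy(host.data(), d_out, count * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int v : host)
        if (v != initial_value) return false;
    return true;
}

int main() {
    bool ok = run_fill_check(200, 7);
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

    With 64 threads and 200 elements, each thread writes three or four elements, so the loop is needed here; it could only be dropped if the array size equalled blockDim.x.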
    
  • 2020-12-06 02:18

    Yes, you can. You can have the first thread in the block set it while the others don't, e.g.:

    extern __shared__ unsigned int local_bin[]; // Size specified in kernel call
    
    // Wipe on the first thread only. Add " && threadIdx.y == 0" and
    // " && threadIdx.z == 0" if the thread block has 2 or 3 dimensions instead of 1.
    if (threadIdx.x == 0)
    {
        // For-loop to set all local_bin elements to the specified value here.
        // Note that cudaMemset cannot be used on shared memory.
    }
    
    // Do stuff unrelated to local_bin here    
    
    __syncthreads(); // To make sure the memset above has completed before other threads start writing values to local_bin.
    
    // Do stuff to local_bin here
    

    Ideally, do as much work as possible before the __syncthreads() call, since the other threads can get on with that work while thread 0 is still running the initialization loop. This only matters if the threads' completion times can differ significantly, for example when there is conditional branching. Note that for the initialization loop on thread 0, the size of the local_bin array must be passed to the kernel as a parameter so that the loop knows how many elements to iterate over.
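
    As a rough sketch of the pattern described above, the following program zeroes local_bin on thread 0 and then uses it for a per-block histogram. The kernel name block_histogram, the host helper histogram, the modulo binning, and the one-dimensional block are all assumptions for illustration; error checking is omitted to keep the sketch short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: counts data[i] % bin_count per block, then merges
// the per-block counts into bins_out. Assumes a 1D block and bin_count > 0.
__global__ void block_histogram(const unsigned int *data, int n,
                                unsigned int *bins_out, int bin_count) {
    extern __shared__ unsigned int local_bin[];  // size specified at launch

    if (threadIdx.x == 0) {
        // bin_count is passed in because the kernel cannot query the array size.
        for (int i = 0; i < bin_count; ++i)
            local_bin[i] = 0;
    }

    // Work that does not touch local_bin; other warps can proceed with it
    // while thread 0 is still in the loop above.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int bin = (idx < n) ? data[idx] % bin_count : 0;

    __syncthreads();  // zeroing is complete before anyone updates local_bin

    if (idx < n)
        atomicAdd(&local_bin[bin], 1u);
    __syncthreads();  // all updates are in before the merge

    for (int i = threadIdx.x; i < bin_count; i += blockDim.x)
        atomicAdd(&bins_out[i], local_bin[i]);
}

// Hypothetical host helper: returns the histogram of values % bin_count.
std::vector<unsigned int> histogram(const std::vector<unsigned int> &values,
                                    int bin_count) {
    const int n = static_cast<int>(values.size());
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    const size_t bin_bytes = bin_count * sizeof(unsigned int);

    unsigned int *d_data = nullptr, *d_bins = nullptr;
    cudaMalloc(&d_data, n * sizeof(unsigned int));
    cudaMalloc(&d_bins, bin_bytes);
    cudaMemcpy(d_data, values.data(), n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, bin_bytes);  // host call on global memory

    block_histogram<<<blocks, threads, bin_bytes>>>(d_data, n, d_bins, bin_count);

    std::vector<unsigned int> bins(bin_count);
    cudaMemcpy(bins.data(), d_bins, bin_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    return bins;
}

int main() {
    std::vector<unsigned int> values(1000);
    for (size_t i = 0; i < values.size(); ++i)
        values[i] = static_cast<unsigned int>(i);

    std::vector<unsigned int> bins = histogram(values, 10);
    for (unsigned int count : bins)
        std::printf("%u ", count);
    std::printf("\n");
    return 0;
}
```

    The host side can use cudaMemset for bins_out because that buffer lives in global memory; the shared array has to be cleared from inside the kernel.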

    Original concept source
