Is there a way of setting default value for shared memory array?

后端未结

关注

 3  1026

Consider the following code:

__global__ void kernel(int *something) {
extern __shared__ int shared_array[];     

// Some operations on shared_array here.

}


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  北荒        
                
              
                            
                2020-12-06 02:15
              
            
            
                                                                       
No. Shared memory is uninitialised. You have to somehow initialise it yourself, one way or another...

From CUDA C Programming Guide 3.2, Section B.2.4.2, paragraph 2:


  __shared__ variables cannot have an initialization as part of their declaration.


This also discards nontrivial default constructors for shared variables.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  春和景丽        
                
              
                            
                2020-12-06 02:17
              
            
            
                                                                       
You can efficiently initialize shared arrays in parallel like this

// if SHARED_SIZE == blockDim.x, eliminate this loop
for (int i = threadIdx.x; i < SHARED_SIZE; i += blockDim.x) 
    shared_array[i] = INITIAL_VALUE;
__syncthreads();

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一个人的身影        
                
              
                            
                2020-12-06 02:18
              
            
            
                                                                       
Yes, you can. You can specify that the first thread in the block sets it, while the other's don't eg.:

extern __shared__ unsigned int local_bin[]; // Size specified in kernel call

if (threadIdx.x == 0) // Wipe on first thread - include " && threadIdx.y == 0" and " && threadIdx.z == 0"  if threadblock has 2 or 3 dimensions instead of 1.
{
    // For-loop to set all local_bin array indexes to specified value here - note you cannot use cudaMemset as it translates to a kernel call itself
}

// Do stuff unrelated to local_bin here    

__syncthreads(); // To make sure the memset above has completed before other threads start writing values to local_bin.

// Do stuff to local_bin here


Ideally you should do as much work as possible before the syncthreads call, as this allows for all the other threads to do their work before the memset is complete - obviously this only matters if the work has the potential to have quite different thread completion times, for example if there is conditional branching.
Note that for the thread 0 "setting" for-loop, you need to have passed the size of the local_bin array as a parameter to the kernel so you know the size of the array you are iterating.

Original concept source
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复