I'm wondering what the overhead of performing a CUDA kernel call is in C/C++, such as the following:

    somekernel1<<<blocks, threads>>>(args);
    somekernel2<<<blocks, threads>>>(args);
The host-side overhead of a kernel launch using the runtime API is only about 15-30 microseconds on non-WDDM Windows platforms. On WDDM platforms (which I don't use), I understand it can be much, much higher, plus there is some sort of batching mechanism in the driver which tries to amortise the cost by doing multiple operations in a single driver-side operation.
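If you want a rough number for your own machine, one way is to time a large batch of launches of an empty kernel from the host. This is only a minimal sketch, assuming the runtime API and an nvcc-compiled .cu file; the kernel name, launch count and configuration are arbitrary choices for illustration:

    // Rough estimate of host-side launch overhead: time many asynchronous
    // launches of a kernel that does no work, then average.
    #include <chrono>
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void empty_kernel() {}

    int main()
    {
        const int launches = 10000;

        // Warm up: the first launch absorbs one-off context initialisation costs.
        empty_kernel<<<1, 1>>>();
        cudaDeviceSynchronize();

        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < launches; ++i)
            empty_kernel<<<1, 1>>>();   // asynchronous, so mostly host-side cost is measured
        auto stop = std::chrono::high_resolution_clock::now();
        cudaDeviceSynchronize();

        double us = std::chrono::duration<double, std::micro>(stop - start).count();
        std::printf("average host-side launch overhead: %.2f us\n", us / launches);
        return 0;
    }

The number you get depends on driver, OS and hardware, so treat it as an order-of-magnitude check rather than a precise figure.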
Generally, there will be a performance increase from "fusing" multiple data operations, which would otherwise be done in separate kernels, into a single kernel where the algorithms allow it. The GPU has much higher peak arithmetic throughput than peak memory bandwidth, so the more FLOPs that can be executed per memory transaction (and per kernel "setup code"), the better the kernel's performance will be. On the other hand, trying to write a "Swiss army knife" style kernel which crams completely disparate operations into a single piece of code is never a particularly good idea, because it increases register pressure and reduces the efficiency of things like the L1, constant memory and texture caches.
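As a rough illustration of what fusing means here (a sketch with hypothetical kernel and argument names, not code from any particular application): two separate element-wise passes over the same array versus one fused pass:

    // Unfused: two kernels, each doing one read and one write per element.
    __global__ void scale_kernel(float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = a * x[i];
    }

    __global__ void add_kernel(float *x, float b, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i] + b;
    }

    // Fused: same arithmetic, but half the global memory traffic and a
    // single launch overhead instead of two.
    __global__ void scale_add_kernel(float *x, float a, float b, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = a * x[i] + b;
    }

The fused kernel performs the same FLOPs but touches global memory once per element rather than twice, which is usually where the win comes from, since bandwidth rather than arithmetic is the typical bottleneck for this kind of code.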
Which way you choose to go should really be guided by the nature of the code/algorithms. I don't believe there is a single "correct" answer to this question that can be applied in all circumstances.