GLSL branching behaviour


I have a rather simple fragment shader with a branch, and I'm unsure how the GLSL compiler handles it and how it affects performance.

unifor


                      
2 Answers

执念已碎, 2021-01-02 00:18

Here you have it:

il_ps_2_0
dcl_input_generic_interp(linear) v1
dcl_resource_id(0)_type(2d)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
eq r2.xy__, c1.xyyy, c0.xyyy
imul r5.x___, r2.x, r2.y
mov r1.x___, r5.x
if_logicalnz r1.x
    sample_resource(0)_sampler(0) r6, v1.xyyy
    mov r7, r6
else
    sample_resource(0)_sampler(0) r8, v1.xyyy
    mov r7, r8
endif
mov r9, r7
mov oC0, r9
endmain


To rephrase what Kos said: what matters is whether the guard condition can be known before execution. That is the case here, because registers c1 and c0 are constants (constant registers start with the letter 'c'), and the value of r1.x is computed only from them, so it is constant as well.

This means the value is the same for every fragment shader invocation, so no thread divergence can occur.
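
The reasoning above can be illustrated with a toy model in Python. This is only a sketch, not GPU code: the names `guard_uniform`, `guard_varying` and `batch_diverges` are hypothetical, the tuples stand in for the constant registers c0 and c1, and the batch of 32 fragments is an assumed size. A batch diverges only if its fragments disagree on the guard.

```python
# Toy model (an assumption for illustration, not a GPU API): a batch of
# fragments diverges only when the guard evaluates differently within it.

def guard_uniform(c0, c1, frag_coord):
    # Depends only on constants shared by every fragment (like c0 and c1),
    # so every fragment in the batch gets the same answer.
    return c0 == c1

def guard_varying(c0, c1, frag_coord):
    # Depends on a per-fragment input, so fragments may disagree.
    return frag_coord[0] < 0.5

def batch_diverges(guard, c0, c1, frag_coords):
    # Collect the distinct guard outcomes across the batch.
    outcomes = {guard(c0, c1, fc) for fc in frag_coords}
    return len(outcomes) > 1

# Hypothetical batch of 32 fragments with x in [0, 1).
coords = [(i / 32.0, 0.0) for i in range(32)]
c0, c1 = (1.0, 2.0), (1.0, 2.0)

uniform_diverges = batch_diverges(guard_uniform, c0, c1, coords)
varying_diverges = batch_diverges(guard_varying, c0, c1, coords)
print(uniform_diverges, varying_diverges)
```

A guard built only from constants never splits the batch, whatever the constants are; a guard that reads a per-fragment input can.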

By the way, I used AMD GPU ShaderAnalyser to translate the GLSL into IL.
It can also generate native GPU assembly for a specific hardware generation (ranging from HD29xx to HD58xx). It is a really good tool!
                                                                        
                                                        
            
            
              
                
说谎, 2021-01-02 00:31

Yes. IIRC you won't incur a performance penalty, because all the threads in a single batch (warp) on a single GPU processor take the same branch. By 'thread' I mean a single invocation of the shader.

The efficiency problem arises when the threads that a given processor executes at the same time (up to about 32 AFAIK; this depends on the hardware, and I'm quoting the figure for the G80 architecture) take different branches. One processor cannot execute two different instructions at once, so the "if" branch is executed first by some of the threads while the rest wait, and then the "else" branch is executed by the remaining threads.

That's not the case with your code, so I believe you're safe.
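
The serialization described in this answer can be sketched as a toy cost model in Python. It is an assumption for illustration, not a measurement: `batch_cost` and the cost values are hypothetical, and the batch width of 32 is the G80 figure quoted above.

```python
# Toy cost model (an assumption, not a measurement): a processor runs one
# instruction stream at a time, so a batch that splits across both sides of
# a branch pays for each side in turn while the other threads wait.

def batch_cost(predicates, if_cost, else_cost):
    """Return the modelled cost of one batch executing an if/else."""
    cost = 0
    if any(predicates):          # at least one thread takes the "if" side
        cost += if_cost
    if not all(predicates):      # at least one thread takes the "else" side
        cost += else_cost
    return cost

BATCH_SIZE = 32  # batch width quoted for G80; hardware dependent

coherent = [True] * BATCH_SIZE                        # every thread agrees
divergent = [i % 2 == 0 for i in range(BATCH_SIZE)]   # threads disagree

coherent_cost = batch_cost(coherent, if_cost=10, else_cost=10)
divergent_cost = batch_cost(divergent, if_cost=10, else_cost=10)
print(coherent_cost, divergent_cost)
```

In this model a coherent batch pays for one side only, while a divergent batch pays for both, regardless of how many threads take each side.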
                                                                        
                                                        
            
            
              
                