Understanding the strcmp function of gnu libc

前端未结

关注

 2  1997

Here is the strcmp function that i found in the glibc:

int
STRCMP (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2021-01-06 06:21
              
            
            
                                                                       
You're correct of course. One of the casts should be sufficient. Especially if the pointer is cast, casting the retrieved value is a no-op.



Here is the x86-64 compiled with gcc -O3 for the one with unnecessary cast:

STRCMP:
.L4:
        addq    $1, %rdi
        movzbl  -1(%rdi), %eax
        addq    $1, %rsi
        movzbl  -1(%rsi), %edx
        testb   %al, %al
        je      .L7
        cmpb    %dl, %al
        je      .L4
        subl    %edx, %eax
        ret
.L7:
        movzbl  %dl, %eax
        negl    %eax
        ret


and here's the one without the unnecessary cast:

STRCMP:
.L4:
        addq    $1, %rdi
        movzbl  -1(%rdi), %eax
        addq    $1, %rsi
        movzbl  -1(%rsi), %edx
        testb   %al, %al
        je      .L7
        cmpb    %dl, %al
        je      .L4
        subl    %edx, %eax
        ret
.L7:
        movzbl  %dl, %eax
        negl    %eax
        ret


They're identical



However there is one gotcha, that is now mostly of historical interest. If char is signed and the signed representation is not two's complement, 

*(const unsigned char *)p1


and

(unsigned char)*p1


are not equivalent. The former reinterprets the bit pattern, while the latter converts the value using modulo arithmetic. This is of only historical interest since not even GCC supports any architecture that doesn't have 2's complement signed representation. And it is the compiler with most ports.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  無奈伤痛        
                
              
                            
                2021-01-06 06:29
              
            
            
                                                                       

  What i didn't understand is the use of s1 and s2 variables. I mean other than the fact that they are unsigned char they are also const like the 2 arguments p1 and p2, so why not just use the p1 and p2 inside the body of while and cast them ?


For readability; to make it easier for us humans to maintain the code.

If you look at glibc sources, the code tends to readability rather than concise expressions.  It seems to be a good policy, because it has kept it relevant and vibrant (actively maintained) for over 30 years now.


  Does in this case using those 2 extra variables make the function somehow more optimized?


No, not at all.


  I would also like to know if there are any particular reason why glibc used those extra variables instead of casting the parameters p1 and p2 directly inside while.


For readability only.

The authors know that the C compiler used should be able to optimize this code just fine. (And it is easy to prove that is the case, just by looking at the code compiler generateds. For GCC, you can use the -S option, or you can use the binutils' objdump -d to examine an object file or a binary executable.)

Note that the casts to unsigned char are required for the exact same reasons as they are for isspace(), isalpha() et cetera: the character codes compared must be treated as unsigned char for correct results. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复