Faster approach to checking for an all-zero buffer in C?

后端未结


I am searching for a faster method of accomplishing this:

int is_empty(char * buf, int size) 
{
    int i;
    for(i = 0; i < size; i++) {
        if(buf[


                      
20 Answers

                  无人及你        
                
              
                            
                2020-12-03 05:38
              
            
            
                                                                       
I think I have a good solution for this.
Create a zero-filled dummy array and compare the buffer against it with memcmp(). That's what I do.
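
A minimal sketch of this idea, assuming a fixed-size static zero block that is compared against the buffer chunk by chunk so that any buffer size works. The name is_all_zero_memcmp and the 4096-byte chunk size are illustrative choices, not something from the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ZERO_CHUNK 4096  /* assumed chunk size; tune for your platform */

/* Hypothetical helper: returns 1 if all `size` bytes of buf are zero, else 0. */
int is_all_zero_memcmp(const void *buf, size_t size)
{
    /* Objects with static storage duration are zero-initialized. */
    static const unsigned char zero[ZERO_CHUNK];
    const unsigned char *p = buf;

    assert(buf != NULL || size == 0);

    while (size > 0) {
        size_t n = size < ZERO_CHUNK ? size : ZERO_CHUNK;
        if (memcmp(p, zero, n) != 0)
            return 0;           /* this chunk contains a non-zero byte */
        p += n;
        size -= n;
    }
    return 1;
}
```

The zero block is shared and read-only, so no allocation happens per call; the cost is one extra memory stream that memcmp() has to read.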
                                                                        
                                                        
            
            
              
                
                  一向        
                
              
                            
                2020-12-03 05:43
              
            
            
                                                                       
You stated in your question that you are looking for a micro-optimization that is most likely unnecessary. In 'normal' cases the ASM approach by Thomas and others should give you the fastest results.

Still, this overlooks the big picture. If your buffer is really large, then starting at the beginning and essentially doing a linear search is definitely not the fastest way to do this. Assume your cp replacement is quite good at finding large consecutive empty regions but the array has a few non-empty bytes at the end. Every linear search would have to read the whole array. A quicksort-inspired algorithm, on the other hand, could probe for non-zero elements and abort much sooner on a large enough dataset.

So before doing any kind of micro-optimization I would look closely at the data in your buffer and see whether it shows any patterns. For a single '1' randomly distributed in the buffer, a linear search (disregarding threading/parallelization) will be the fastest approach; in other cases, not necessarily.
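
One way to act on this, sketched under the assumption that non-zero bytes tend to sit at the end of the buffer or come in long runs: probe a few cheap positions first and fall back to the full linear scan only when the probes find nothing. The function name and the stride value are hypothetical; a buffer that really is all zero still has to be read completely.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_STRIDE 4096  /* assumed sampling stride; pick it from your data */

/* Hypothetical helper: returns 1 if all `size` bytes of buf are zero, else 0. */
int is_all_zero_probed(const unsigned char *buf, size_t size)
{
    size_t i;

    assert(buf != NULL || size == 0);
    if (size == 0)
        return 1;

    /* Probe 1: the last byte, for data that tends to have cruft at the end. */
    if (buf[size - 1] != 0)
        return 0;

    /* Probe 2: one byte per stride, which catches long non-zero runs early. */
    for (i = 0; i < size; i += PROBE_STRIDE)
        if (buf[i] != 0)
            return 0;

    /* Full scan: only this step can prove that the buffer is all zero. */
    for (i = 0; i < size; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

The probes only speed up the "not empty" answer. Whether they pay off depends entirely on where non-zero bytes usually appear, which is why measuring your real data comes first.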
                                                                        
                                                        
            
            
              
                
                  醉酒成梦        
                
              
                            
                2020-12-03 05:44
              
            
            
                                                                       
What about looping from size down to zero (cheaper checks):

int is_empty(char * buf, int size) 
{
    /* size is decremented before the body runs, so it is a valid index there */
    while(size-- > 0) {
        if(buf[size] != 0) return 0;
    }
    return 1;
}


Note that we probably cannot outperform the compiler, so enable your compiler's most aggressive speed optimization and assume that you are unlikely to go any faster.

Or handle everything using pointers (not tested, but likely to perform quite well). Note that this version temporarily writes a sentinel byte into the buffer, so the buffer must be writable:

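int is_empty(char* buf, int size)
{
    char* last;
    char* p = buf;

    if (size <= 0)
        return 1;

    last = buf + size - 1;
    if (*last != 0)        /* any non-zero last byte means "not empty" */
        return 0;

    *last = 1;             /* temporary sentinel so the scan always stops */
    while (!*p)
        p++;
    *last = 0;             /* restore the original byte */

    return p == last;      /* stopped at the sentinel: all earlier bytes were zero */
}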

                                                                        
                                                        
            
            
              
                
                  北海茫月        
                
              
                            
                2020-12-03 05:44
              
            
            
                                                                       
Inline assembly version of the initial C code. There is no error checking: if uiSize is 0 and/or the array is not allocated, exceptions will be generated. Perhaps use try {} catch(), as this might be faster than adding a lot of checks to the code. Or do as I do and try not to call functions with invalid values (this usually does not work). At the very least add a NULL pointer check and a size != 0 check; that is very easy.

 unsigned int IsEmpty(char* pchBuffer, unsigned int uiSize)
 {
    unsigned int uiResult = 0;   // becomes 1 if every byte is zero

    asm {                        // spelled __asm in MSVC
      push edi
      push ecx

      mov edi, [pchBuffer]       // scasb reads through EDI, not ESI
      mov ecx, [uiSize]

      // add NULL ptr and size check here

      xor eax, eax               // AL = 0, the value each byte is compared with

      repe scasb                 // compare AL with BYTE PTR es:[EDI], increment EDI, decrement ECX;
                                 // repeat while the bytes are equal and ECX != 0
      jne char_not_zero          // ZF clear: the last byte compared was not 0
                                 // (testing ECX alone is not enough, it is also 0
                                 // when only the final byte is non-zero)

      mov eax, 1
      mov [uiResult], eax        // every byte compared equal to 0

    char_not_zero:
      pop ecx
      pop edi
    }
    return uiResult;
 }


That was written on the fly, but it looks correct enough; corrections are welcome. And if someone knows how to set the return value directly from inline asm, please do tell.
                                                                        
                                                        
            
            
              
                
                  半阙折子戏        
                
              
                            
                2020-12-03 05:46
              
            
            
                                                                       
Four functions for testing zeroness of a buffer with simple benchmarking:

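#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <inttypes.h>

#define SIZE (8*1024)
char zero[SIZE] __attribute__(( aligned(8) ));

/* Read the x86 time-stamp counter. RDTSC returns the low half in EAX and the
   high half in EDX. The "=A" constraint only pairs them on 32-bit x86, so
   both halves are read explicitly here. */
#define RDTSC(var) do { \
  uint32_t lo_, hi_; \
  __asm__ __volatile__ ( "rdtsc" : "=a" (lo_), "=d" (hi_) ); \
  (var) = ((uint64_t)hi_ << 32) | lo_; \
} while (0)

/* Time one call of func on the zero buffer and print the cycle count. */
#define MEASURE( func ) { \
  uint64_t start, stop; \
  RDTSC( start ); \
  int ret = func( zero, SIZE ); \
  RDTSC( stop ); \
  printf( #func ": %s   %12" PRIu64 "\n", ret ? "non zero" : "zero", stop-start ); \
}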


int func1( char *buff, size_t size ){
  while(size--) if(*buff++) return 1;
  return 0;
}

int func2( char *buff, size_t size ){
  return *buff || memcmp(buff, buff+1, size-1);
}

int func3( char *buff, size_t size ){
  return *(uint64_t*)buff || memcmp(buff, buff+sizeof(uint64_t), size-sizeof(uint64_t));
}

int func4( char *buff, size_t size ){
  return *(wchar_t*)buff || wmemcmp((wchar_t*)buff, (wchar_t*)buff+1, size/sizeof(wchar_t)-1);
}

int main(){
  MEASURE( func1 );
  MEASURE( func2 );
  MEASURE( func3 );
  MEASURE( func4 );
}


Result on my old PC:

func1: zero         108668
func2: zero          38680
func3: zero           8504
func4: zero          24768
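
The gap between func1 and func3 suggests that reading more than one byte per step matters. As a hedged sketch of the same idea without memcmp(), the loop below reads the buffer eight bytes at a time. The name is_all_zero_words is hypothetical, and the loads go through memcpy() so the code does not assume that the buffer is aligned; no timing for it is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if all `size` bytes of buf are zero, else 0. */
int is_all_zero_words(const void *buf, size_t size)
{
    const unsigned char *p = buf;

    assert(buf != NULL || size == 0);

    /* Main loop: test eight bytes per iteration. */
    while (size >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* alignment-safe load; compilers usually emit a plain load */
        if (w != 0)
            return 0;
        p += sizeof w;
        size -= sizeof w;
    }

    /* Tail: the remaining 0 to 7 bytes, one at a time. */
    while (size > 0) {
        if (*p != 0)
            return 0;
        p++;
        size--;
    }
    return 1;
}
```

Unlike the functions in the benchmark, this one returns 1 for an all-zero buffer, matching is_empty() from the question.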
                                                                        
                                                        
            
            
              
                
                  再見小時候        
                
              
                            
                2020-12-03 05:48
              
            
            
                                                                       
One potential way, inspired by Kieveli's dismissed idea:

int is_empty(char *buf, size_t size)
{
    static const char zero[999] = { 0 };
    return !memcmp(zero, buf, size > 999 ? 999 : size);
}


Note that you can't make this solution work for arbitrary sizes: a buffer larger than the zero array is only checked up to its first 999 bytes. You could do this:

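int is_empty(char *buf, size_t size)
{
    /* calloc takes (count, element size) and needs <stdlib.h>; ask for at
       least one byte so that size == 0 is not mistaken for a failure */
    char *zero = calloc(size ? size : 1, 1);
    int i;

    if (zero == NULL)
        return 0;          /* allocation failed: cannot confirm the buffer is empty */
    i = memcmp(zero, buf, size);
    free(zero);
    return i == 0;         /* memcmp returns 0 when the blocks are equal */
}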


But any dynamic memory allocation is going to be slower than what you have. The only reason the first solution is faster is that it can use memcmp(), which is usually hand-optimized in assembly language by the library writers and will likely be much faster than a byte-by-byte loop you could code in C.

EDIT: An optimization no one else has mentioned, based on earlier observations about the likelihood of the buffer being in state X: if a buffer isn't empty, is it more likely to be non-empty at the beginning or at the end? If it's more likely to have cruft at the end, you could start your check at the end and probably see a nice little performance boost.

EDIT 2: Thanks to Accipitridae in the comments:

int is_empty(char *buf, size_t size)
{
    return buf[0] == 0 && !memcmp(buf, buf + 1, size - 1);
}


This basically compares the buffer to itself shifted by one byte, with an initial check that the first element is zero. The memcmp() succeeds only if every byte equals its neighbor, so together with the first check any non-zero element makes the test fail. I don't know how this would compare to the other versions, but I do know that it will fail quickly (before we even loop) if the first element is non-zero. If you're more likely to have cruft at the end, change buf[0] to buf[size - 1] to get the same effect. Note that size must be at least 1 for either version.
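
A minimal sketch of the end-first variant with the size == 0 case handled. The name is_empty_selfcmp is hypothetical. It relies on the same reasoning: if every byte equals its neighbor and one byte (here the last) is zero, then all bytes are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if all `size` bytes of buf are zero, else 0. */
int is_empty_selfcmp(const char *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    if (size == 0)
        return 1;              /* nothing to check */
    if (buf[size - 1] != 0)
        return 0;              /* cheap early-out for cruft at the end */

    /* Compare the buffer with itself shifted by one byte. memcmp only reads,
       so the overlap is fine; with size == 1 it compares zero bytes. */
    return memcmp(buf, buf + 1, size - 1) == 0;
}
```

memcmp() still walks the buffer from the front, so only the very first test looks at the end.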
                                                                        
                                                        
            
            
              
                