Is there a more efficient way of expanding a char to an uint64_t?

前端未结

关注

 8  619

星月不相逢 2021-01-18 06:32

I want to inflate an unsigned char to an uint64_t by repeating each bit 8 times. E.g.

char -> uint64_t
0x00 -> 0x00
0x01 ->


      
      
        
          8条回答        

        
                    
            
            
                         
                
              
              
                
                   一整个雨季
                                             
                
                
                (楼主)
            
              
              
                2021-01-18 06:51
              

            
            
                        
In selected architectures (SSE,Neon) there are fast vector operations that can speed up this task or are designed to do this. Without special instructions the suggested look up table approach is both the fastest and most portable.

If the 2k size is an issue, parallel vector arithmetic operations can be simulated:

static uint64_t inflate_parallel(unsigned char a) {
  uint64_t vector = a * 0x0101010101010101ULL;
  // replicate the word all over qword
  // A5 becomes A5 A5 A5 A5 A5 A5 A5 A5
  vector &= 0x8040201008040201;  // becomes 80 00 20 00 00 04 00 01 <-- 
  vector += 0x00406070787c7e7f;  // becomes 80 40 80 70 78 80 7e 80
                                 // MSB is correct
  vector = (vector >> 7) & 0x0101010101010101ULL;  // LSB is correct
  return vector * 255;                             // all bits correct
}


EDIT:  2^31 iterations, (four time unroll to mitigate loop evaluation)

time ./parallel            time ./original            time ./lookup
real        0m2.038s       real       0m14.161s       real      0m1.436s
user        0m2.030s       user       0m14.120s       user      0m1.430s
sys         0m0.000s       sys        0m0.000s        sys       0m0.000s


That's about 7x speedup, while the lookup table gives ~10x speedup
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它8个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复