Faster implementation of sum (for Codility test)


How can the following simple implementation of sum be faster?

private long sum( int [] a, int begin, int end ) {
    if( a == null   ) {
        ret


                      
22 answers

北恋
2021-02-04 12:44

Probably the fastest you could get would be to have your int array 16-byte aligned, load 32 bytes at a time into two __m128i variables, and add them with _mm_add_epi32 (an SSE2 intrinsic, available in VC++). Keep accumulating into one of the two variables, and after the final chunk extract its four ints and add them the old-fashioned way.

The bigger question is why simple addition is a worthy candidate for optimization.

Edit: I see it's mostly an academic exercise.  Perhaps I'll give it a go tomorrow and post some results...
                                                                        
                                                        
            
            
              
                
小鲜肉
2021-02-04 12:44

This won't help you with an O(n^2) algorithm, but you can optimize your sum.

At a previous company, we had Intel come by and give us optimization tips.  They had one non-obvious and somewhat cool trick.  Replace:

long r = 0;
for (int i = begin; i < end; i++) {
    r += a[i];
}


with

long r1 = 0, r2 = 0, r3 = 0, r4 = 0;
for (int i = begin; i < end; i += 4) {
    r1 += a[i];
    r2 += a[i + 1];
    r3 += a[i + 2];
    r4 += a[i + 3];
}
long r = r1 + r2 + r3 + r4;
// Note: need to be clever if array isn't divisible by 4


Why this is faster:
  In the original implementation, the variable r is a bottleneck. Each iteration has to load a value from the array a (which takes a few cycles), but the processor cannot overlap the work of successive iterations, because the value of r in the next iteration depends on the value of r in this one. In the second version, r1, r2, r3, and r4 are independent, so the processor can overlap their additions through instruction-level parallelism (this is pipelining within one core, not hyperthreading). Only at the very end do they come together.
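The note about ranges whose length is not divisible by 4 can be handled with a short tail loop. The following is a minimal Java sketch of that idea, not the answerer's own code: the class name `UnrolledSum`, the `limit` variable, and the choice to return 0 for a null or empty range are assumptions made for illustration.

```java
class UnrolledSum {
    // Sums a[begin..end) using four independent accumulators,
    // then adds the 0 to 3 leftover elements one at a time.
    static long sum(int[] a, int begin, int end) {
        if (a == null || begin >= end) {
            return 0L; // assumption: a null or empty range sums to 0
        }
        long r1 = 0, r2 = 0, r3 = 0, r4 = 0;
        int i = begin;
        int limit = end - 3; // a full group of four fits only while i < limit
        for (; i < limit; i += 4) {
            r1 += a[i];
            r2 += a[i + 1];
            r3 += a[i + 2];
            r4 += a[i + 3];
        }
        long r = r1 + r2 + r3 + r4;
        // Tail loop: handles range lengths that are not a multiple of 4.
        for (; i < end; i++) {
            r += a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sum(data, 0, data.length)); // prints 28
    }
}
```

Whether this beats the plain loop depends on the JIT compiler and the hardware, so it is worth measuring before relying on it.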
                                                                        
                                                        
            
            
              
                
借酒劲吻你
2021-02-04 12:47

This code is simple enough that unless a is quite small, it's probably going to be limited primarily by memory bandwidth. As such, you probably can't hope for any significant gain by working on the summing part itself (e.g., unrolling the loop, counting down instead of up, executing sums in parallel -- unless they're on separate CPUs, each with its own access to memory). The biggest gain will probably come from issuing some preload instructions so most of the data will already be in the cache by the time you need it. The rest will just (at best) get the CPU to hurry up more, so it waits longer.

Edit: It appears that most of what's above has little to do with the real question. It's kind of small, so it may be difficult to read, but, I tried just using std::accumulate() for the initial addition, and it seemed to think that was all right:


                                                                        
                                                        
            
            
              
                
失恋的感觉
2021-02-04 12:49

Here is a thought:

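// C#; requires: using System.Collections;
private static ArrayList equi(int[] A)
{
    ArrayList answer = new ArrayList();

    // A null array has no equilibrium index; report -1.
    if (A == null)
    {
        answer.Add(-1);
        return answer;
    }

    // sum0: sum of the elements to the right of i.
    // sum1: sum of the elements to the left of i.
    long sum0 = 0, sum1 = 0;
    for (int i = 0; i < A.Length; i++) sum0 += A[i];
    for (int i = 0; i < A.Length; i++)
    {
        sum0 -= A[i];
        if (i > 0) { sum1 += A[i - 1]; }
        // Collect every equilibrium index; to get only the first one, return i here.
        if (sum1 == sum0) answer.Add(i);
    }
    return answer;
}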
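The same thought can be restated in Java, the language of the question. The sketch below keeps running left and right totals instead of recomputing a sum for every index, so it makes a single extra pass over the array. The class name `EquiSketch` and the decision to return only the first equilibrium index (or -1 when there is none) are assumptions for illustration, not part of the answer above.

```java
class EquiSketch {
    // Returns the first index i such that the sum of the elements before i
    // equals the sum of the elements after i, or -1 if there is none.
    static int equi(int[] a) {
        if (a == null) {
            return -1;
        }
        long right = 0; // sum of the elements not yet passed
        for (int v : a) {
            right += v;
        }
        long left = 0; // sum of the elements strictly before i
        for (int i = 0; i < a.length; i++) {
            right -= a[i]; // right is now the sum of the elements after i
            if (left == right) {
                return i;
            }
            left += a[i];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(equi(new int[]{-7, 1, 5, 2, -4, 3, 0})); // prints 3
    }
}
```

Using `long` for both totals avoids overflow when many large `int` values are added together.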

                                                                        
                                                        
            
            
              
                