C: memcpy speed on dynamically allocated arrays

后端未结

关注

 3  1070

I need help with the performance of the following code. It does a memcpy on two dynamically allocated arrays of arbitrary size:

int main()
{
  double *a, *b;
  u


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  忘掉有多难        
                
              
                            
                2021-02-13 17:28
              
            
            
                                                                       
The first bzero runs longer because of (1) lazy page allocation and (2) lazy page zero-initialization by kernel. While second reason is unavoidable because of security reasons, lazy page allocation may be optimized by using larger ("huge") pages.

There are at least two ways to use huge pages on Linux. Hard way is hugetlbfs. Easy way is Transparent huge pages.

Search khugepaged in the list of processes on your system. If such process exists, transparent huge pages are supported, you can use them in your application if you change malloc to this:

posix_memalign((void **)&b, 2*1024*1024, n*sizeof(double));
madvise((void *)b, n*sizeof(double), MADV_HUGEPAGE);

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2021-02-13 17:32
              
            
            
                                                                       
Surely if you are comparing the speed of initialise and copy to the speed of just copy, then the initialisation should be included in timed section. It appears to me you should actually be comparing this:

// Version 1
for(i=0; i<n; i++)
    a[i] = 1.0;

tic();

memcpy(b, a, n*sizeof(double));

toc();


To this:

// Version 2
for(i=0; i<n; i++)
    a[i] = 1.0;

tic();

for(i=0; i<n; i++)
    b[i] = 0.0;
memcpy(b, a, n*sizeof(double));

toc();


I expect this will see your 3x speed improvement drop sharply.

EDIT: As suggested by Steve Jessop, you may also want to test a third strategy of only touching one entry per page:

// Version 3
for(i=0; i<n; i++)
    a[i] = 1.0;

tic();

for(i=0; i<n; i+=DOUBLES_PER_PAGE)
    b[i] = 0.0;
memcpy(b, a, n*sizeof(double));

toc();

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2021-02-13 17:37
              
            
            
                                                                       
It's probably lazy page allocation, Linux only mapping the pages on first access. IIRC each page in a new block in Linux is a copy-on-write of a blank page, and your allocations are big enough to demand new blocks.

If you want to work around it, you could write one byte or word, at 4k intervals. That might get the virtual addresses mapped to RAM slightly faster than writing the whole of each page.

I wouldn't expect (most efficient workaround to force the lazy memory mapping to happen) plus (copy) to be significantly faster than just (copy) without the initialization of b, though. So unless there's a specific reason why you care about the performance just of the copy, not of the whole operation, then it's fairly futile. It's "pay now or pay later", Linux pays later, and you're only measuring the time for later.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复