I'm trying to compare different methods for matrix multiplication. The first one is the normal (naive triple-loop) method:
/* naive triple loop: C += A * B (C assumed zeroed), all n x n, row-major */
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        for (k = 0; k < n; k++)
            C[i*n + j] += A[i*n + k] * B[k*n + j];
Getting this right can be non-trivial. One optimization that is of particular importance for large matrices is tiling the multiplication to keep the working set in the cache. I once measured a 12x performance difference doing so, but I specifically picked a matrix size that consumed multiples of my cache (circa '97, so the cache was small).
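For concreteness, here is a minimal sketch of a tiled (blocked) version of the triple loop above, assuming flat row-major n x n matrices; the block size BS is a tunable assumption (roughly, pick it so three BS x BS tiles fit in your cache):

#include <stddef.h>

#define BS 64  /* tile size: assumed value, tune so ~3 tiles fit in cache */

/* Blocked matrix multiply: C += A * B, all n x n, row-major. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* one BS x BS tile; the "&& < n" tests guard the edges */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i*n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i*n + j] += a * B[k*n + j];
                    }
}

The payoff is that the innermost loop streams through contiguous rows of B and C, and each tile is reused many times while it is still cache-resident instead of being reloaded from memory on every pass.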
There's a lot of literature on the subject. A starting point is:
http://en.wikipedia.org/wiki/Loop_tiling
For a more in-depth study, the following references, especially the Banerjee books, may be helpful:
[Ban93] Banerjee, Utpal, Loop Transformations for Restructuring Compilers: the Foundations, Kluwer Academic Publishers, Norwell, MA, 1993.
[Ban94] Banerjee, Utpal, Loop Parallelization, Kluwer Academic Publishers, Norwell, MA, 1994.
[BGS93] Bacon, David F., Susan L. Graham, and Oliver J. Sharp, Compiler Transformations for High-Performance Computing, Computer Science Division, University of California, Berkeley, Calif., Technical Report No. UCB/CSD-93-781.
[LRW91] Lam, Monica S., Edward E. Rothberg, and Michael E. Wolf, The Cache Performance and Optimizations of Blocked Algorithms, In 4th International Conference on Architectural Support for Programming Languages and Operating Systems, held in Santa Clara, Calif., April 1991, 63-74.
[LW91] Lam, Monica S., and Michael E. Wolf, A Loop Transformation Theory and an Algorithm to Maximize Parallelism, In IEEE Transactions on Parallel and Distributed Systems, 1991, 2(4):452-471.
[PW86] Padua, David A., and Michael J. Wolfe, Advanced Compiler Optimizations for Supercomputers, In Communications of the ACM, 29(12):1184-1201, 1986.
[Wolfe89] Wolfe, Michael J., Optimizing Supercompilers for Supercomputers, The MIT Press, Cambridge, MA, 1989.
[Wolfe96] Wolfe, Michael J., High Performance Compilers for Parallel Computing, Addison-Wesley, CA, 1996.