Optimized matrix multiplication in C

后端 未结 13 2194
一整个雨季
一整个雨季 2020-11-30 01:44

I\'m trying to compare different methods for matrix multiplication. The first one is normal method:

do
{
    for (j = 0; j < i; j++)
    {
        for (k          


        
相关标签:
13条回答
  • 2020-11-30 02:50

    Getting this right can be non-trivial. One optimization that is of particular importance for large matrices, is tiling the multiplication to keep stuff in the cache. I once measured a 12x performance difference doing so, but I specifically picked a matrix size that consumed multiples of my cache (circa '97 so the cache was small).

    There's a lot of literature on the subject. A starting point is:

    http://en.wikipedia.org/wiki/Loop_tiling

    For a more in depth study, the following references, especially the Banerjee books, may be helpful:

    [Ban93] Banerjee, Utpal, Loop Transformations for Restructuring Compilers: the Foundations, Kluwer Academic Publishers, Norwell, MA, 1993.

    [Ban94] Banerjee, Utpal, Loop Parallelization, Kluwer Academic Publishers, Norwell, MA, 1994.

    [BGS93] Bacon, David F., Susan L. Graham, and Oliver Sharp, Compiler Transformations for High-Performance Computing, Computer Science Division, University of California, Berkeley, Calif., Technical Report No UCB/CSD-93-781.

    [LRW91] Lam, Monica S., Edward E. Rothberg, and Michael E Wolf. The Cache Performance and Optimizations of Blocked Algorithms, In 4th International Conference on Architectural Support for Programming Languages, held in Santa Clara, Calif., April, 1991, 63-74.

    [LW91] Lam, Monica S., and Michael E Wolf. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, In IEEE Transactions on Parallel and Distributed Systems, 1991, 2(4):452-471.

    [PW86] Padua, David A., and Michael J. Wolfe, Advanced Compiler Optimizations for Supercomputers, In Communications of the ACM, 29(12):1184-1201, 1986.

    [Wolfe89] Wolfe, Michael J. Optimizing Supercompilers for Supercomputers, The MIT Press, Cambridge, MA, 1989.

    [Wolfe96] Wolfe, Michael J., High Performance Compilers for Parallel Computing, Addison-Wesley, CA, 1996.

    0 讨论(0)
提交回复
热议问题