Cache friendly method to multiply two matrices

暗喜 2020-12-18 03:15

I intend to multiply two matrices using a cache-friendly method (one that leads to fewer cache misses).

I found out that this can be done with a cache friendly t

2 Answers
  • 2020-12-18 03:49

    The term you are looking for is thrashing. Searching Google for "thrashing matrix multiplication" turns up more results.

    A standard multiplication algorithm for c = a*b looks like this:

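    void multiply(double[,] a, double[,] b, double[,] c)
    {
        int n = a.GetLength(0);  // square n x n matrices assumed
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
            {
                double s = 0;
                // b[k, j] jumps a whole row of b on every step of k
                for (int k = 0; k < n; k++)
                    s += a[i, k] * b[k, j];
                c[i, j] = s;
            }
    }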
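    For comparison, here is a minimal C++ sketch of the same i-j-k loop. It assumes square n x n matrices stored row-major in a flat std::vector, so element (r, col) sits at index r * n + col; the function name multiply_naive and the flat layout are illustrative assumptions, not part of this answer. The comments mark which operand is read sequentially and which is strided.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive i-j-k product c = a * b for n x n matrices stored row-major in
// flat vectors: element (r, col) lives at index r * n + col.
std::vector<double> multiply_naive(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            // a[i * n + k] moves 1 element per step (sequential),
            // b[k * n + j] moves n elements per step (strided).
            for (std::size_t k = 0; k < n; ++k)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    return c;
}
```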
    

    Basically, striding through memory in large steps is detrimental to performance, and the access b[k, j] does exactly that: each increment of k moves down a column of b, skipping a whole row of n elements. So instead of jumping around in memory, we can rearrange the operations so that the innermost loop varies only the second index of each matrix:

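    void multiply(double[,] a, double[,] b, double[,] c)
    {
        int n = a.GetLength(0);  // square n x n matrices assumed
        for (int i = 0; i < n; i++)
        {
            // row i of c starts as a[i, 0] times row 0 of b
            double t = a[i, 0];
            for (int j = 0; j < n; j++)
                c[i, j] = t * b[0, j];

            // add a[i, k] times row k of b; j walks rows of b and c sequentially
            for (int k = 1; k < n; k++)
            {
                t = a[i, k];
                for (int j = 0; j < n; j++)
                    c[i, j] += t * b[k, j];
            }
        }
    }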
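    The same i-k-j reordering as a minimal, self-contained C++ sketch, assuming square n x n matrices stored row-major in a flat std::vector (multiply_ikj and the flat layout are illustrative assumptions). Starting from a zeroed c removes the need for a separate k = 0 pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// i-k-j ordering: for each row i of c, add a[i][k] times row k of b.
// The innermost loop walks row k of b and row i of c, both contiguous
// in row-major storage (index = row * n + column).
std::vector<double> multiply_ikj(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);  // zero-initialised accumulator
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double t = a[i * n + k];  // reused for the whole j loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += t * b[k * n + j];
        }
    return c;
}
```

    Every access in the innermost loop is now unit-stride, which is the whole point of the reordering; the arithmetic performed is the same as in the naive version.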
    

    The reordered version is the example given on one of those pages. Another option is to copy column j of b (that is, b[*, j]) into a one-dimensional array beforehand and use that array in the inner loop. This approach is often noticeably faster than the alternatives, even though it involves copying data around. It may seem counter-intuitive, so feel free to try it yourself.

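    void multiply(double[,] a, double[,] b, double[,] c)
    {
        int n = a.GetLength(0);  // square n x n matrices assumed
        double[] Bcolj = new double[n];
        for (int j = 0; j < n; j++)
        {
            // copy column j of b into a contiguous buffer
            for (int k = 0; k < n; k++)
                Bcolj[k] = b[k, j];

            // row i of a and Bcolj are both read sequentially
            for (int i = 0; i < n; i++)
            {
                double s = 0;
                for (int k = 0; k < n; k++)
                    s += a[i, k] * Bcolj[k];
                c[i, j] = s;
            }
        }
    }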
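    A minimal C++ sketch of the column-buffer idea, under the same assumptions as a flat row-major layout for square n x n matrices (multiply_colbuf is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy column j of b into a contiguous buffer once, then reuse it for every
// row i of a, so both operands of the inner dot product are read
// sequentially. Row-major flat storage (index = row * n + column) assumed.
std::vector<double> multiply_colbuf(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n);
    std::vector<double> bcol(n);             // holds b[0][j], b[1][j], ...
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < n; ++k)  // one strided pass per column
            bcol[k] = b[k * n + j];
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                s += a[i * n + k] * bcol[k];
            c[i * n + j] = s;                // result goes to (i, j)
        }
    }
    return c;
}
```

    The write to c[i * n + j] is still strided, but it happens once per dot product of length n, so there are n^2 strided writes instead of n^3 strided reads.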
    
  • 2020-12-18 04:11

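    @Cesar's answer is not correct if the matrices are stored column-major, i.e. with consecutive values of the first index adjacent in memory. For example, the inner loop

    for (int k = 0; k < n; k++)
       s += a[i,k] * Bcolj[k];
    

    then steps through a[i,0], a[i,1], ..., which are n elements apart rather than adjacent.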
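    Whether that loop is strided depends only on the storage order. The two standard offset formulas, written as a small C++ sketch (the function names are hypothetical), give the stride of the loop over k that reads a[i, k] directly:

```cpp
#include <cstddef>

// Flat offset of element (i, k) in a matrix with I rows and K columns.
// Row-major (C, C++, C#): consecutive k are adjacent in memory.
constexpr std::size_t row_major(std::size_t i, std::size_t k, std::size_t K)
{
    return i * K + k;
}

// Column-major (Fortran, MATLAB): consecutive i are adjacent in memory.
constexpr std::size_t col_major(std::size_t i, std::size_t k, std::size_t I)
{
    return i + k * I;
}

// Elements skipped per step of the loop "for k: read a[i, k]".
constexpr std::size_t stride_row_major(std::size_t K)
{
    return row_major(0, 1, K) - row_major(0, 0, K);  // always 1
}
constexpr std::size_t stride_col_major(std::size_t I)
{
    return col_major(0, 1, I) - col_major(0, 0, I);  // always I
}

static_assert(stride_row_major(7) == 1, "row-major: unit stride over k");
static_assert(stride_col_major(5) == 5, "column-major: stride of I over k");
```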

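    The following code keeps b[k][j] in a local variable and moves along the first index of a and c in the innermost loop, so it visits data sequentially when the first index is the contiguous one (column-major storage). Note that C++ built-in arrays, like the ones in this signature, are row-major, and there that loop is strided instead.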

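    #include <cstddef>

    // c = a * b; the dimensions I, K, J are deduced from the array types.
    template <std::size_t I, std::size_t K, std::size_t J>
    void multiply(const double (&a)[I][K],
                  const double (&b)[K][J],
                  double (&c)[I][J])
    {
        for (std::size_t j = 0; j < J; ++j) {
            // clear c[*][j]
            for (std::size_t i = 0; i < I; ++i) {
                c[i][j] = 0;
            }

            // walk b[*][j], holding one element at a time in t
            for (std::size_t k = 0; k < K; ++k) {
                const double t = b[k][j];
                // walk c[*][j] and a[*][k] together (first index varies)
                for (std::size_t i = 0; i < I; ++i) {
                    c[i][j] += a[i][k] * t;
                }
            }
        }
    }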
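    The same j-k-i loop order on flat column-major storage, the layout under which its innermost loop really is sequential. This is a sketch under that assumption; the helper cm and the name multiply_jki are hypothetical, and I, K, J are the row/column counts as in the signature above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major flat storage (hypothetical helper): element (r, col) of a
// matrix with `rows` rows is at index r + col * rows.
inline std::size_t cm(std::size_t r, std::size_t col, std::size_t rows)
{
    return r + col * rows;
}

// j-k-i ordering, c (I x J) = a (I x K) * b (K x J), all column-major.
// The innermost loop walks column k of a and column j of c, which are
// contiguous in this layout; b(k, j) stays in a local variable.
std::vector<double> multiply_jki(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t I, std::size_t K, std::size_t J)
{
    assert(a.size() == I * K && b.size() == K * J);
    std::vector<double> c(I * J, 0.0);
    for (std::size_t j = 0; j < J; ++j)
        for (std::size_t k = 0; k < K; ++k) {
            const double t = b[cm(k, j, K)];
            for (std::size_t i = 0; i < I; ++i)
                c[cm(i, j, I)] += a[cm(i, k, I)] * t;
        }
    return c;
}
```

    For example, the 2 x 2 matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] are passed as {1, 3, 2, 4} and {5, 7, 6, 8}, and the product comes back column by column as {19, 43, 22, 50}.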
    