Cache friendly method to multiply two matrices

家住魔仙堡 提交于 2019-11-28 00:56:32

问题


I intend to multiply 2 matrices using the cache-friendly method ( that would lead to less number of misses)

I found out that this can be done with a cache friendly transpose function.

But I am not able to find this algorithm. Can I know how to achieve this?


回答1:


The word you are looking for is thrashing. Searching for thrashing matrix multiplication in Google yields more results.

A standard multiplication algorithm for c = a*b would look like

void multiply(double[,] a, double[,] b, double[,] c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i, j] += a[i, k] * b[k, j]; 
}

Basically, navigating the memory fastly in large steps is detrimental to performance. The access pattern for k in B[k, j] is doing exactly that. So instead of jumping around in the memory, we may rearrange the operations such that the most inner loops operate only on the second access index of the matrices:

void multiply(double[,] a, double[,] B, double[,] c)
{  
   for (i = 0; i < n; i++)
   {  
      double t = a[i, 0];
      for (int j = 0; j < n; j++)
         c[i, j] = t * b[0, j];

      for (int k = 1; k < n; k++)
      {
         double s = 0;
         for (int j = 0; j < n; j++ )
            s += a[i, k] * b[k, j];
         c[i, j] = s;
      }
   }
}

This was the example given on that page. However, another option is to copy the contents of B[k, *] into an array beforehand and use this array in the inner loop calculations. This approach is usually much faster than the alternatives, even if it involves copying data around. Even if this might seem counter-intuitive, please feel free to try for yourself.

void multiply(double[,] a, double[,] b, double[,] c)
{
    double[] Bcolj = new double[n];
    for (int j = 0; j < n; j++)
    {
        for (int k = 0; k < n; k++)
            Bcolj[k] = b[k, j];

        for (int i = 0; i < n; i++)
        {
            double s = 0;
            for (int k = 0; k < n; k++)
                s += a[i,k] * Bcolj[k];
            c[j, i] = s;
        }
   }
}



回答2:


@Cesar's answer is not correct. For example, the inner loop

for (int k = 0; k < n; k++)
   s += a[i,k] * Bcolj[k];

goes through the i-th column of a.

The following code should ensure we always visit data row by row.

void multiply(const double (&a)[I][K], 
              const double (&b)[K][J], 
              double (&c)[I][J]) 
{
    for (int j=0; j<J; ++j) {
       // iterates the j-th row of c
       for (int i=0; i<I; ++i) {
         c[i][j] = 0;
       } 

       // iterates the j-th row of b
       for (int k=0; k<K; ++k) {
          double t = b[k][j];
          // iterates the j-th row of c
          // iterates the k-th row of a
          for (int i=0; i<I; ++i) {
            c[i][j] += a[i][k] * t;
          } 
       }
    }
}


来源:https://stackoverflow.com/questions/13312625/cache-friendly-method-to-multiply-two-matrices

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!